Skip to content

Suggestion: Include the possible environment score ranges in the main descriptions #79

@nhansendev

Description

@nhansendev

I found the table with the range of possible scores in the appendix of the paper and thought it could be a useful reference to include in a more visible location, such as on the main github page alongside the environment descriptions:

"
C. Normalization Constants
Rmin is computed by training a policy with masked out observations. This demonstrates what score is trivially achievable in each environment. Rmax is computed in several different ways. For CoinRun, Dodgeball, Miner, Jumper, Leaper, Maze, BigFish, Heist, Plunder, Ninja, and Bossfight, the maximal theoretical and practical reward is trivial to compute.

For CaveFlyer, Chaser, and Climber, we empirically determine Rmax by generating many levels and computing the average max achievable reward.

For StarPilot and FruitBot, the max practical reward is not obvious, even though it is easy to establish a theoretical bound. We choose to define Rmax in these environments as the score PPO achieves after 8 billion timesteps when trained at an 8x larger batch size than our default hyperparameters. On observing these policies, we find them very close to optimal.
"
scores

What do you think?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions