Hello! I am reproducing your paper's results (training PPO + self-imitation with the MineCLIP reward), but I am missing some details:
- How are the agent's 89 discrete actions, described in the paper, implemented? Currently your MineAgent uses a multi-discrete output of 3*3*4*25*25*3, which is much larger. Did you remove some action choices?
- For computing the DIRECT reward with the MineCLIP model, how are the negative texts sampled, and how many did you sample?
- I find that one step in the MineDojo simulation covers a much shorter timescale than one second in the YouTube videos. Did you use the last 16 consecutive RGB observations to compute the reward?
Thank you!
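For context on the last question, here is a minimal sketch of how I am currently windowing observations before calling MineCLIP. The `FrameBuffer` class and the window size of 16 are my own assumptions, not taken from your code, so please correct me if the windowing should work differently:

```python
from collections import deque

import numpy as np

WINDOW = 16  # assumed number of consecutive RGB frames fed to MineCLIP


class FrameBuffer:
    """Keeps the last WINDOW RGB observations from the MineDojo env."""

    def __init__(self, window=WINDOW):
        self.frames = deque(maxlen=window)

    def push(self, rgb):
        # rgb: one (H, W, 3) observation from env.step()
        self.frames.append(rgb)

    def ready(self):
        # True once we have a full window of consecutive frames
        return len(self.frames) == self.frames.maxlen

    def stack(self):
        # (WINDOW, H, W, 3) array, oldest frame first
        return np.stack(self.frames, axis=0)


buf = FrameBuffer()
for t in range(20):
    buf.push(np.zeros((160, 256, 3), dtype=np.uint8))  # dummy frame
print(buf.ready(), buf.stack().shape)  # True (16, 160, 256, 3)
```

The stacked array would then be embedded by the MineCLIP video encoder each step once the buffer is full.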
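And for the negative-text question, my current guess at how the positive prompt is scored against the sampled negatives is the softmax below. `prompt_probability` is a name I made up, and the exact baseline subtraction or clipping in the DIRECT reward may differ from your implementation:

```python
import numpy as np


def prompt_probability(sim_pos, sim_negs):
    """Softmax probability of the task prompt among negatives.

    sim_pos:  scalar cosine similarity between the video embedding
              and the task (positive) prompt.
    sim_negs: array of similarities to the sampled negative prompts.

    NOTE: this is my guess at the DIRECT formulation; temperature,
    baseline subtraction, and clipping may differ in your code.
    """
    sims = np.concatenate([[sim_pos], np.asarray(sim_negs, dtype=float)])
    exp = np.exp(sims - sims.max())  # subtract max for numerical stability
    return exp[0] / exp.sum()


# With 3 negatives and equal similarities, the positive gets 1/4.
print(prompt_probability(0.0, [0.0, 0.0, 0.0]))  # 0.25
```

Is the reward then derived from this probability (e.g. relative to the 1/(N+1) chance level), and if so, what value of N did you use?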