Hi, I’m confused by inconsistent action-space definitions across the paper, dataset, and code:
- Paper body (“Unified observation and action space”): 16 binary buttons + 4 joystick axes = 20 dims
- Paper appendix (A.1): action chunk a ∈ R^(16×24) (24 dims per step)
- Dataset parquet: 17 boolean button columns (including guide) + j_left and j_right, each with (x, y) = 21 dims
- Code/checkpoint: model uses 25 dims = 21 button tokens + 4 joystick axes, where the 4 tokens beyond the 17 dataset buttons are: RIGHT_UP, RIGHT_BOTTOM, RIGHT_LEFT, RIGHT_RIGHT
Questions:
- What is the canonical per-step action dim used by NitroGen (20/21/24/25)?
- How is the dataset 21-dim action mapped to the model 25-dim action (especially how the right stick is discretized into RIGHT_* tokens: thresholds and dead-zone)?
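To make the second question concrete, here is the kind of mapping I'd naively assume. It is a hypothetical sketch, not taken from the NitroGen code: the 0.5 threshold, the "y > 0 means up" sign convention, the token order, the helper names `right_stick_tokens`/`to_model_action`, and keeping the continuous right-stick axes alongside the RIGHT_* tokens are all my own assumptions.

```python
from typing import Sequence

# Hypothetical per-axis threshold (acts as a dead-zone); the real value is unknown.
RIGHT_STICK_THRESHOLD = 0.5

# Hypothetical token order; the actual order in the checkpoint may differ.
RIGHT_TOKENS = ("RIGHT_UP", "RIGHT_BOTTOM", "RIGHT_LEFT", "RIGHT_RIGHT")


def right_stick_tokens(x: float, y: float,
                       threshold: float = RIGHT_STICK_THRESHOLD) -> list[bool]:
    """Discretize right-stick (x, y) into 4 directional booleans.

    Assumes y > 0 means "up"; some gamepad APIs use the opposite sign.
    """
    return [
        y > threshold,   # RIGHT_UP
        y < -threshold,  # RIGHT_BOTTOM
        x < -threshold,  # RIGHT_LEFT
        x > threshold,   # RIGHT_RIGHT
    ]


def to_model_action(buttons: Sequence[bool],
                    j_left: Sequence[float],
                    j_right: Sequence[float]) -> list[float]:
    """Map one dataset row (17 buttons + 2 sticks = 21 dims) to 25 dims.

    Guessed layout: 17 buttons + 4 RIGHT_* tokens + 4 continuous axes.
    """
    if len(buttons) != 17:
        raise ValueError(f"expected 17 buttons, got {len(buttons)}")
    tokens = list(buttons) + right_stick_tokens(j_right[0], j_right[1])
    axes = [j_left[0], j_left[1], j_right[0], j_right[1]]
    return [float(t) for t in tokens] + [float(a) for a in axes]


action = to_model_action([False] * 17, (0.0, 0.0), (0.8, -0.1))
print(len(action), action[17:21])  # 25 [0.0, 0.0, 0.0, 1.0]
```

If the real mapping differs (e.g. a radial dead-zone on the stick magnitude instead of per-axis thresholds, or the RIGHT_* tokens replacing the right-stick axes rather than being added to them), that's exactly what I'd like to know.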