This repository was archived by the owner on Aug 26, 2025. It is now read-only.
Hello, I tried to train the MLSH policies in the MovementBandits environment,
but the outputs of the master policy still seem to be random after training.
The command I ran was:
mpirun -np 120 python3 main.py --task MovementBandits-v0 --num_subs 2 --macro_duration 10 --num_rollouts 2000 --warmup_time 9 --train_time 1 --replay False MovementBandits
My guess is that the master policy needs an observation that identifies the correct goal in order to select sub-policies, but the current implementation provides no information about the correct goal.
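To illustrate the guess, here is a minimal toy sketch. It is not the MLSH or MovementBandits code: the observation layout, `make_episode`, and `best_fixed_accuracy` are hypothetical names made up for this example. It assumes a master policy that is a fixed mapping from observation to sub-policy, evaluated across randomly sampled correct goals. Under that assumption, if the observation is identical for both goals, even the best mapping can only pick the right sub-policy about half the time.

```python
import random


def make_episode(rng, include_goal):
    """Toy two-goal task: return (observation, index of the rewarded goal)."""
    correct = rng.randrange(2)
    # Hypothetical observation layout: agent position plus both goal positions.
    obs = [0.0, 0.0, -1.0, 0.0, 1.0, 0.0]
    if include_goal:
        # Extra feature that identifies which goal is rewarded.
        obs.append(float(correct))
    return obs, correct


def best_fixed_accuracy(include_goal, episodes=10000, seed=0):
    """Accuracy of the best deterministic observation -> sub-policy mapping."""
    rng = random.Random(seed)
    # observation -> [times goal 0 was correct, times goal 1 was correct]
    counts = {}
    for _ in range(episodes):
        obs, correct = make_episode(rng, include_goal)
        counts.setdefault(tuple(obs), [0, 0])[correct] += 1
    # For each distinct observation, the best mapping picks the majority goal.
    return sum(max(c) for c in counts.values()) / episodes


if __name__ == "__main__":
    print("without goal in observation:", best_fixed_accuracy(False))
    print("with goal in observation:   ", best_fixed_accuracy(True))
```

In this toy setting the accuracy stays close to chance without the goal feature and reaches 1.0 with it, which matches the random-looking master outputs I am seeing.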
Do you have any updates on MovementBandits?