Observation of MovementBandits env #8

Hello, I have tried to train the MLSH policies in the MovementBandits environment,
but the master policy's outputs seem to be random even after training.

The command I tried is here:
`mpirun -np 120 python3 main.py --task MovementBandits-v0 --num_subs 2 --macro_duration 10 --num_rollouts 2000 --warmup_time 9 --train_time 1 --replay False MovementBandits`

I suspect the master policy needs to observe which goal is the correct one in order to select the right sub-policy, but in the current implementation the observation contains no information about the correct goal.
Are there any updates on MovementBandits?
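Here is a minimal toy sketch of the concern. It is plain Python and not the actual MovementBandits code. The functions `make_episode`, `observe`, and `success_rate` are hypothetical, and it assumes one of two goals is marked correct uniformly at random each episode, with nothing else revealing it (for example, no learning across episodes of a fixed task). Under those assumptions, if the master's observation is the same whichever goal is correct, any policy choosing from that observation succeeds only about half the time. Putting the goal index into the observation lets a trivial policy succeed every time.

```python
import random

# Hypothetical toy stand-in; NOT the MovementBandits implementation.
# Assumption: the correct goal is resampled uniformly every episode and
# nothing other than the observation could reveal it.
NUM_GOALS = 2

def make_episode(rng):
    """Sample which goal is 'correct' for this episode."""
    return rng.randrange(NUM_GOALS)

def observe(correct_goal, include_goal):
    """Observation handed to the master policy.

    With include_goal=False the observation is identical for every episode,
    mirroring an observation that carries nothing about the correct goal.
    """
    if include_goal:
        return (correct_goal,)
    return (0,)

def success_rate(policy, include_goal, episodes=10_000, seed=0):
    """Fraction of episodes in which the policy picks the correct goal."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        goal = make_episode(rng)
        choice = policy(observe(goal, include_goal))
        wins += choice == goal
    return wins / episodes

# Policy that picks whatever goal index the observation reports.
informed = lambda obs: obs[0]

if __name__ == "__main__":
    print(success_rate(informed, include_goal=True))   # 1.0
    print(success_rate(informed, include_goal=False))  # roughly 0.5
```

When the observation is constant, every deterministic policy reduces to always picking the same goal, so none can do meaningfully better than chance in this toy setting.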
