Skip to content

Use data acquired by users #701

@AdrianPrados

Description

@AdrianPrados

I am testing 'imitation' to work with proprietary environments for the robot in our Mujoco-based lab. For testing I am generating Pygame environments based on Gym. I am creating user generated data. When working with BC it is clear that those demos have to be in Trajectory format (act,obs,infos,terminal) and inside bc.BC in demonstrations (something like demosntration = exppert_trajs). When I work with Dagger, it is not very clear in the documentation how to do this process. The bc.BC is the same as a normal BC process, but in SimpleDAggerTrainer, it is not clear which expert_policy should be added. I have tried using data generated by the PPO and there it is clear that the expert_policy = expert (where expert = PPO(...), expert.learn(...)), but if I want to use human data I am not sure what I should do. Do I need to use an expert that is for example a PPO? Do I need to do a bc_trainer.learn(...) and use that result as expert_policy inside SimpleDAggerTrainer, do I need to get the expert_policy from the acquired human data, if so, how can I generate the policy from this human data?

DaggerQuestion

I have uploaded an example of a code where Arm2d is my own environment (a robotic arm in 2 dimensions), expert_trajs are 30 action state pairs data taken using PyGame and expert_policy in SimpleDAggerTrainer in this case has been tested with expert (a trained PPO with 10000 iterations). I don't know if this configuration is correct, or if as I have commented previously, it is necessary to do a BC training and use its policy as expert_policy or if I need to get the Policy from the user data (and if this is the option, how it would be done).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions