Option share_weight does not work #4

@tristan-ka

Description

Hi!

Thank you for sharing the code of your amazing paper.

I managed to run your code with the BabyAI environment in the setup where weights are not shared between the value and policy nets.

I am now trying to run the experiment in the share_weights configuration, but I run into an assertion error:

File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/models.py", line 234, in get_value
    assert self.share_weights, "Must be sharing weights to use get_value"
AssertionError: Must be sharing weights to use get_value

If I manually set the share_weights parameter of the policy model to True, I run into the following error:

Traceback (most recent call last):
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 255, in <module>
    main(params)
  File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 124, in main
    mean_reward = p.train_step()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 473, in train_step
    surr_loss, val_loss = self.take_steps(saps)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 436, in take_steps
    surr_loss = self.policy_step(*args).mean()
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 289, in ppo_step
    store, old_vs=batch_old_vs, opt_step=opt_step)
  File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 193, in value_step
    val_opt.zero_grad()
AttributeError: 'NoneType' object has no attribute 'zero_grad'

It looks like the value optimizer (val_opt) is set to None when value_step is called from inside ppo_step.
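
For concreteness, here is a minimal, self-contained sketch of the initialization pattern that would explain this crash. This is my guess, not your actual code: only the names val_opt, value_step, and share_weights come from the tracebacks above; everything else is assumed.

```python
import torch

class AgentSketch:
    """Hypothetical reproduction: the value optimizer is only built
    when the value net is separate from the policy net."""
    def __init__(self, share_weights):
        self.share_weights = share_weights
        self.policy = torch.nn.Linear(4, 2)  # stand-in policy net
        if not share_weights:
            self.val_model = torch.nn.Linear(4, 1)  # separate value net
            self.val_opt = torch.optim.Adam(self.val_model.parameters(), lr=1e-3)
        else:
            # Shared trunk: the value head lives inside the policy net,
            # and no separate optimizer is ever created.
            self.val_opt = None

    def value_step(self):
        # With share_weights=True this reproduces:
        # AttributeError: 'NoneType' object has no attribute 'zero_grad'
        self.val_opt.zero_grad()

AgentSketch(share_weights=True).value_step()
```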

Moreover, when I look at the value_step function in weight-sharing mode: ppo_step already computes the value loss on minibatches, yet when value_step is called from ppo_step, it splits the minibatch again into further minibatches, does the value-loss computation on the first split only, and then returns.

It also seems that value_step processes the data the same way regardless of the value of share_weights. This is a bit strange, since when share_weights is False, value_step operates on a whole trajectory, whereas when share_weights is True it receives only a minibatch. A sketch of the control flow I mean follows.
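
Here is a toy sketch of the double split and the early return (all names here are mine for illustration, not the repo's exact functions):

```python
def minibatches(xs, size):
    # Yield consecutive chunks of xs.
    for i in range(0, len(xs), size):
        yield xs[i:i + size]

def value_step(states):
    # Splits its input again, but returns from inside the loop,
    # so only the first inner chunk contributes to the value update.
    for mb in minibatches(states, size=2):
        print("  value loss computed on:", mb)
        return

def ppo_step(all_states):
    for batch in minibatches(all_states, size=4):  # first split
        print("outer minibatch:", batch)
        value_step(batch)  # the minibatch gets split a second time

ppo_step(list(range(8)))
```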

Can you confirm that the weight-sharing option is supported and works?

Thank you again for sharing!
