I found a bug in epsilon_greedy() in policy_ops.py when a legal_actions_mask is supplied. If the action with the highest action value is masked out as illegal, the returned policy still assigns that action most of the probability mass.
For example:
```python
action_values = [2.0, 1.0, 1.0]
legal_actions_mask = [0., 1., 1.]  # action 0 is illegal
epsilon = 0.1
result = policy_ops.epsilon_greedy(action_values, epsilon, legal_actions_mask).probs
```
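Output:
```
[0.9 0.05 0.05]
```
Action 0 is illegal, yet it receives probability 0.9. The output suggests that the exploration mass (epsilon = 0.1) is correctly spread over the two legal actions (0.05 each), but the greedy mass (1 - epsilon = 0.9) still goes to the masked action 0 because it has the highest value. Expected: action 0 should get probability 0, with all of the mass on actions 1 and 2 (e.g. [0. 0.5 0.5] if tied greedy actions share the greedy mass equally).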
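For comparison, here is a minimal NumPy sketch of a masked epsilon-greedy policy that behaves as expected on the example above. It is not the library's implementation: masked_epsilon_greedy is a hypothetical helper, and it assumes that at least one action is legal and that tied maximal legal actions split the greedy mass evenly. The key point is that the mask is applied to the greedy part as well as to the exploration part, by setting illegal action values to -inf before taking the maximum.

```python
import numpy as np

def masked_epsilon_greedy(action_values, epsilon, legal_actions_mask):
    """Hypothetical reference: epsilon-greedy probabilities over legal actions only.

    Assumes at least one legal action per row; ties among the best legal
    actions share the greedy mass equally.
    """
    q = np.asarray(action_values, dtype=np.float64)
    mask = np.asarray(legal_actions_mask, dtype=np.float64)
    legal = mask > 0
    # Exploration part: uniform over legal actions only.
    random_policy = mask / mask.sum(axis=-1, keepdims=True)
    # Greedy part: illegal actions get -inf so they can never be the maximum.
    masked_q = np.where(legal, q, -np.inf)
    max_q = masked_q.max(axis=-1, keepdims=True)
    greedy = (masked_q == max_q).astype(np.float64)
    # Split the greedy mass evenly among tied best legal actions.
    greedy /= greedy.sum(axis=-1, keepdims=True)
    return epsilon * random_policy + (1.0 - epsilon) * greedy

probs = masked_epsilon_greedy([2.0, 1.0, 1.0], 0.1, [0., 1., 1.])
print(probs)  # action 0 gets probability 0
```

Because both the exploration and greedy terms are zero on illegal actions, their epsilon-weighted mixture is zero there too, regardless of how large the illegal action's value is.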