Legal actions mask bug

Found a bug in `epsilon_greedy()` in `policy_ops.py` when applying `legal_actions_mask`: if the action with the highest action value is masked out as illegal, it still receives the greedy probability mass.

For example:
```
action_values = [2.0, 1.0, 1.0]
legal_actions_mask = [0., 1., 1.]  # action 0 (the highest-valued action) is illegal
epsilon = 0.1
result = policy_ops.epsilon_greedy(action_values, epsilon, legal_actions_mask).probs
```
Output (action 0 is illegal, so its probability should be 0):
`[0.9 0.05 0.05]`
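
For comparison, here is a minimal NumPy sketch of the behaviour one would expect from a masked epsilon-greedy policy. It is not the library's implementation: the helper name `masked_epsilon_greedy_probs` is hypothetical, and it assumes that ties among the best legal actions share the greedy mass equally and that at least one action is legal.

```python
import numpy as np


def masked_epsilon_greedy_probs(action_values, epsilon, legal_actions_mask):
    """Epsilon-greedy probabilities restricted to legal actions (sketch)."""
    values = np.asarray(action_values, dtype=np.float64)
    mask = np.asarray(legal_actions_mask, dtype=np.float64)

    # Set illegal actions to -inf so they can never be picked as greedy.
    masked_values = np.where(mask > 0, values, -np.inf)

    # Greedy actions: legal actions attaining the maximum legal value.
    # Assumption: ties split the greedy mass equally.
    greedy = (masked_values == masked_values.max()).astype(np.float64)
    greedy_probs = greedy / greedy.sum()

    # The exploration mass is spread uniformly over legal actions only.
    uniform_probs = mask / mask.sum()

    return (1.0 - epsilon) * greedy_probs + epsilon * uniform_probs


probs = masked_epsilon_greedy_probs([2.0, 1.0, 1.0], 0.1, [0., 1., 1.])
print(probs)
```

With the inputs from the example above, this assigns probability 0 to action 0 and 0.5 to each of the two legal actions, rather than leaving 0.9 on the illegal action.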