Description
Hi,
I have a question about the clamped_actions returned from HRLPlayer.get_action() in hrl_players.py. I believe this is the latent vector z that is passed into the LLC (low-level controller or low-level policy). I checked its shape, and it was [num_env, 64], which aligns with my assumption.
However, one thing is unclear to me. According to the ASE paper, z is sampled from an n-dimensional hypersphere, which implies that its norm should be 1. But in the implementation, z is not normalized; instead, each component is clamped to the range [-1, 1]. As a result, the norm of the clamped z is generally not 1.
Wouldn’t clamping break the original nature of z as a unit vector? It seems like this could distort its meaning or behavior. Is there a specific reason why clamping was chosen over normalization?
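To make the difference concrete, here is a minimal sketch in plain Python. It is not the repository's code: the helper names (clamp, l2_norm, normalize) and the Gaussian stand-in for the raw network output are my own assumptions, used only for illustration. It compares the norm of a clamped 64-dimensional vector with the norm of the same vector projected onto the unit hypersphere.

```python
import math
import random


def clamp(z, lo=-1.0, hi=1.0):
    """Clamp each component independently to [lo, hi]."""
    return [min(max(x, lo), hi) for x in z]


def l2_norm(z):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in z))


def normalize(z, eps=1e-8):
    """Project z onto the unit hypersphere (eps guards against division by zero)."""
    n = max(l2_norm(z), eps)
    return [x / n for x in z]


random.seed(0)
latent_dim = 64  # matches the [num_env, 64] shape I observed

# Hypothetical stand-in for the raw action output; the real values come from the network.
raw = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]

clamped = clamp(raw)        # components in [-1, 1], norm unconstrained
projected = normalize(raw)  # norm is 1 up to floating-point error

print("norm after clamping:   ", l2_norm(clamped))
print("norm after normalizing:", l2_norm(projected))
```

Clamping only bounds each component, so the norm of the clamped vector can be anywhere between 0 and sqrt(64) = 8, whereas normalization always gives a norm of 1.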
Thanks in advance!