Why clamp the z from high-level policy? #82

Hi,
I have a question about the clamped_actions returned from HRLPlayer.get_action() in hrl_players.py. I believe this is the latent vector z that is passed into the LLC (the low-level controller, i.e. the low-level policy). I checked its shape, and it was [num_env, 64], which is consistent with my assumption.

However, one thing is unclear to me. According to the ASE paper, z is sampled from an n-dimensional hypersphere, which implies that its norm should be 1. But in the implementation, z is not normalized; instead, each component is clamped to [-1, 1]. As a result, the norm of the clamped z is generally not 1.
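To make the concern concrete, here is a minimal sketch of the difference between the two operations. It is not the repository's code: NumPy stands in for the PyTorch tensors, and z_raw, clamp_latent, and normalize_latent are hypothetical names I made up for illustration. Only the shape [num_env, 64] and the clamp range [-1, 1] come from what I observed.

```python
import numpy as np

def clamp_latent(z, low=-1.0, high=1.0):
    # Element-wise clamp: each component is limited to [low, high],
    # but nothing constrains the norm of the whole vector.
    return np.clip(z, low, high)

def normalize_latent(z, eps=1e-8):
    # Project each row onto the unit hypersphere (L2 norm of 1).
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    return z / np.maximum(norm, eps)

rng = np.random.default_rng(0)
# Hypothetical raw high-level policy output, shape [num_env, 64].
z_raw = rng.normal(size=(4, 64))

z_clamped = clamp_latent(z_raw)
z_unit = normalize_latent(z_raw)

clamped_norms = np.linalg.norm(z_clamped, axis=-1)
unit_norms = np.linalg.norm(z_unit, axis=-1)

print("norms after clamping:   ", clamped_norms)
print("norms after normalizing:", unit_norms)
```

With 64 dimensions, a vector clamped to [-1, 1] can have a norm anywhere from 0 up to sqrt(64) = 8, so clamping alone does not put z on the unit hypersphere. This sketch says nothing about whether z is renormalized somewhere else before it reaches the LLC.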

Wouldn’t clamping break the original nature of z as a unit vector? It seems like this could distort its meaning or behavior. Is there a specific reason why clamping was chosen over normalization?

Thanks in advance!
