This repository was archived by the owner on Jul 22, 2025. It is now read-only.
In hidden networks, Ramanujan et al. develop a method (called edge-popup) that finds masks via optimization. The algorithm is extremely similar to movement pruning: the mask scores are part of the computational graph, receive gradients, and are updated with a gradient-descent step. The main difference is that edge-popup freezes the weights and trains only the scores (the mask), so it can find well-performing subnetworks inside randomly initialized models.
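For concreteness, here is a minimal NumPy sketch of the edge-popup idea as I understand it: frozen random weights, one trainable score per weight, a top-k mask on the scores, and a straight-through-style gradient step on the scores only. This is not the authors' implementation; the layer sizes, sparsity level `k`, and learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, randomly initialized weights -- edge-popup never updates these.
W = rng.standard_normal((4, 8))
# One trainable score per weight (shapes and init scale are illustrative).
S = rng.standard_normal((4, 8)) * 0.01
k = 16  # number of edges to keep

def mask(scores, k):
    """Binary mask keeping the k highest-magnitude scores."""
    thresh = np.sort(np.abs(scores).ravel())[-k]
    return (np.abs(scores) >= thresh).astype(W.dtype)

def forward(x):
    # Forward pass uses only the selected subnetwork: y = (W * m) @ x
    return (W * mask(S, k)) @ x

# One illustrative score update on a toy squared-error target.
x = rng.standard_normal(8)
t = rng.standard_normal(4)
y = forward(x)
dL_dy = 2 * (y - t)
# Straight-through-style estimator: gradients flow to the scores as if
# the mask were differentiable, giving dL/dS = outer(dL/dy, x) * W.
dS = np.outer(dL_dy, x) * W
S = S - 0.1 * dS  # gradient step on the scores; W stays frozen
```

Note that the update rule for `S` has exactly the shape of a movement-pruning score update; only the treatment of `W` differs.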
If I freeze the weights and apply movement pruning, is it the same as the above method? If not, what would be the difference?
From a theoretical standpoint, movement pruning prunes the weights that are moving toward zero, as indicated by the gradients. The edge-popup paper never mentions such behavior, but I assume it would be the same if both methods apply the same operations. Given that the scores track the tendency of weights to move toward zero, it sounds counterintuitive to freeze the weights, since there is no movement anymore. However, that is exactly what edge-popup does, and it works surprisingly well. Any thoughts on this?
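One way to see why freezing may not matter: the "movement" signal is the quantity W_ij * dL/dW_ij, i.e. the direction the weight *would* move, which is computable whether or not the weight is ever updated. A toy NumPy check (purely illustrative, not either paper's code; sizes and targets are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))   # frozen weights
x = rng.standard_normal(5)
t = np.zeros(3)                   # arbitrary toy target

y = W @ x                         # dense forward (mask omitted for clarity)
dL_dy = 2 * (y - t)
dL_dW = np.outer(dL_dy, x)        # the gradient a weight update *would* use

# Score gradient shared by movement pruning and edge-popup (up to masking):
dS = dL_dW * W
# Entries with W_ij * dL/dW_ij > 0 are being pushed toward zero, so a
# descent step S -= lr * dS shrinks their scores -- the "movement toward
# zero" signal exists even though W itself never changes.
```

So freezing the weights removes the movement but not the movement *tendency*, which is all the score update ever used.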