This repository was archived by the owner on Jul 22, 2025. It is now read-only.
In hidden networks, Ramanujan et al. develop a method (called edge-popup) that finds masks via optimization. The algorithm is extremely similar to movement pruning: the mask scores are part of the computational graph, receive gradients, and are updated with a gradient-descent step. The main difference is that edge-popup freezes the weights and trains only the scores (the mask), so it can find well-performing subnetworks inside randomly initialized models.
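For concreteness, here is a minimal NumPy sketch of the edge-popup idea as I understand it: frozen random weights, one trainable score per weight, a top-k mask on the scores, and a straight-through-style gradient step on the scores only. This is not the authors' implementation; the layer sizes, sparsity level `k`, and learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, randomly initialized weights -- edge-popup never updates these.
W = rng.standard_normal((4, 8))
# One trainable score per weight (shapes and init scale are illustrative).
S = rng.standard_normal((4, 8)) * 0.01
k = 16  # number of edges to keep

def mask(scores, k):
    """Binary mask keeping the k highest-magnitude scores."""
    thresh = np.sort(np.abs(scores).ravel())[-k]
    return (np.abs(scores) >= thresh).astype(W.dtype)

def forward(x):
    # Forward pass uses only the selected subnetwork: y = (W * m) @ x
    return (W * mask(S, k)) @ x

# One illustrative score update on a toy squared-error target.
x = rng.standard_normal(8)
t = rng.standard_normal(4)
y = forward(x)
dL_dy = 2 * (y - t)
# Straight-through-style estimator: gradients flow to the scores as if
# the mask were differentiable, giving dL/dS = outer(dL/dy, x) * W.
dS = np.outer(dL_dy, x) * W
S = S - 0.1 * dS  # gradient step on the scores; W stays frozen
```

Note that the update rule for `S` has exactly the shape of a movement-pruning score update; only the treatment of `W` differs.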
If I freeze the weights and apply movement pruning, is it the same as the above method? If not, what would be the difference?
From a theoretical standpoint, movement pruning prunes the weights that are moving toward zero, as indicated by the gradients. The edge-popup paper never mentions such behavior, but I assume it would be the same if both methods apply the same operations. Given that the scores track the tendency of weights to move toward zero, it sounds counterintuitive to freeze the weights, since there is no movement anymore. However, that is exactly what edge-popup does, and it works surprisingly well. Any thoughts on this?
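One way to see why freezing may not matter: the "movement" signal is the quantity W_ij * dL/dW_ij, i.e. the direction the weight *would* move, which is computable whether or not the weight is ever updated. A toy NumPy check (purely illustrative, not either paper's code; sizes and targets are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))   # frozen weights
x = rng.standard_normal(5)
t = np.zeros(3)                   # arbitrary toy target

y = W @ x                         # dense forward (mask omitted for clarity)
dL_dy = 2 * (y - t)
dL_dW = np.outer(dL_dy, x)        # the gradient a weight update *would* use

# Score gradient shared by movement pruning and edge-popup (up to masking):
dS = dL_dW * W
# Entries with W_ij * dL/dW_ij > 0 are being pushed toward zero, so a
# descent step S -= lr * dS shrinks their scores -- the "movement toward
# zero" signal exists even though W itself never changes.
```

So freezing the weights removes the movement but not the movement *tendency*, which is all the score update ever used.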