Information theoretic analysis of sparse auto-encoders

Abzinger/sae_analysis

sae_analysis

This toolbox provides an information-theoretic analysis (IT-analysis) of sparse autoencoders (SAEs). The current aim is to explore whether information-theoretic signatures of the dictionary size can be found. We plan to extend this toolbox with further information-theoretic analyses of SAEs.

Toolbox Backbone

The toolbox uses a modified version of the SAE training package sparsify by EleutherAI as a backend for training SAEs. Additionally, it uses delphi by EleutherAI as a backend for caching activations for IT-analysis and for generating and scoring explanations of SAE latents.

The toolbox implements IT-analysis measures such as the degree of redundancy and the degree of vulnerability, both introduced in this preprint:

  • Shannon invariants: A scalable approach to information decomposition by Aaron J. Gutknecht, Fernando E. Rosas, David A. Ehrlich, Abdullah Makkeh, Pedro A. M. Mediano, Michael Wibral
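To give a feel for the two measures, here is a minimal sketch on discrete toy data. It is not the toolbox's implementation. It assumes the definitions as we read them from the preprint: the average degree of redundancy is the sum of the marginal mutual informations I(S_i; T) divided by the joint mutual information I(S; T), and the average degree of vulnerability is the sum of the conditional mutual informations I(S_i; T | S_-i) divided by I(S; T). The function names (`entropy`, `mutual_info`, `cond_mutual_info`, `redundancy_and_vulnerability`) are hypothetical, and the plug-in estimator is only suitable for small discrete examples, not for high-dimensional SAE activations.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def cond_mutual_info(xs, ys, zs):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (
        entropy(list(zip(xs, zs)))
        + entropy(list(zip(ys, zs)))
        - entropy(list(zip(xs, ys, zs)))
        - entropy(zs)
    )


def redundancy_and_vulnerability(sources, target):
    """Return (average degree of redundancy, average degree of vulnerability).

    sources: list of at least two equally long sample lists, one per source.
    target:  sample list of the same length.
    Assumes I(S;T) > 0, otherwise the ratios are undefined.
    """
    joint = list(zip(*sources))
    total = mutual_info(joint, target)
    redundancy = sum(mutual_info(s, target) for s in sources) / total
    vulnerability = 0.0
    for i, s in enumerate(sources):
        # All sources except the i-th one, combined into one joint variable.
        rest = list(zip(*(sources[:i] + sources[i + 1:])))
        vulnerability += cond_mutual_info(s, target, rest)
    return redundancy, vulnerability / total


# Toy cases: T = S1 XOR S2 (purely synergistic) and S1 = S2 = T (purely redundant).
xor_case = redundancy_and_vulnerability([[0, 0, 1, 1], [0, 1, 0, 1]], [0, 1, 1, 0])
copy_case = redundancy_and_vulnerability([[0, 1, 0, 1], [0, 1, 0, 1]], [0, 1, 0, 1])
print(xor_case, copy_case)
```

In the XOR case neither source alone carries information about the target, so the redundancy degree is 0 while the vulnerability degree is 2: removing either source destroys all the information. In the copy case the picture flips, with redundancy degree 2 and vulnerability degree 0.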
