Pinned
- prompt_rover (Public)
  Prompt Rover is an interactive tool for visualizing the latent space of an LLM. Given a prompt and its response, it builds a minimum path graph between embeddings and projects it with t-SNE to trac…
- attribution-graph-probing (Public)
  Automates attribution-graph analysis via probe prompting: circuit-trace a prompt, auto-generate concept probes, profile feature activations, cluster supernodes.
  Python 1
- neuron-signatures (Public)
  Neuron-signatures is a mechanistic interpretability pipeline for profiling MLP neurons via cross-prompt activation signatures and causal interventions. The method identifies stable functional roles…
  Python
- ARENA_2.0 (Public, forked from callummcdougall/ARENA_2.0)
  Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
  HTML
- TransformerLens (Public, forked from TransformerLensOrg/TransformerLens)
  A library for mechanistic interpretability of GPT-style language models
  Python


