peppinob-ol

prompt_rover (Public)

Prompt Rover is an interactive tool for visualizing the latent space of an LLM. Given a prompt and its response, it builds a minimum path graph between embeddings and projects it with t-SNE to trac…

Python · 7 stars · 1 fork
attribution-graph-probing (Public)

Automates attribution-graph analysis via probe prompting: circuit-trace a prompt, auto-generate concept probes, profile feature activations, cluster supernodes.

Python 1
neuron-signatures (Public)

Neuron-signatures is a mechanistic interpretability pipeline for profiling MLP neurons via cross-prompt activation signatures and causal interventions. The method identifies stable functional roles…

Python
ARENA_2.0 (Public)

Forked from callummcdougall/ARENA_2.0

Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.

HTML
circuit-tracer (Public)

Forked from safety-research/circuit-tracer

Python
TransformerLens (Public)

Forked from TransformerLensOrg/TransformerLens

A library for mechanistic interpretability of GPT-style language models

Python