Lobopy is a lightweight PyTorch/HuggingFace library for analyzing and steering the activations of causal language models. It provides an intuitive interface for computing and applying contrastive activation pathways during text generation.
With Lobopy, you can analyze how a model represents concepts, sentiments, or any other abstract idea, and then use this information to steer the model towards or away from those concepts without fine-tuning.
- Lobopy: Name of the module.
- Patient: The wrapped language model that is being analyzed and manipulated.
- Ambale: The steered model, or the act of applying steering functions to the model.
- Content: The input provided to the model (
ContentType). Lobopy handles raw strings, dictionaries, and full conversation histories. - Dataset: An iterable (e.g. list) of
ContentTypeobjects. Functions likeanalysetake adatasetto process multiple concepts at once, whereas functions likestimulateoperate on a singleContent.
You can install Lobopy directly from the repository:
git clone https://github.com/OzelTam/lobopy.git
cd lobopy
pip install -e .Or from PyPI:
pip install lobopy(Note: To stick to the project's ecosystem, you can also use uv instead of pip if you prefer!)
Here is a quick example of how you can extract a concept (like "calmness") and steer a model toward it.
from lobopy.patient import Patient, PatientConfig
from lobopy.aggregators import mean_aggregator, difference_aggregator
from lobopy.ambalefiers import (
safe_scale_activation,
top_k_layers,
normalize_path,
)
# 1. Initialize the Patient with your HuggingFace model
model = Patient(
pretrained_model_name_or_path="HUGGINGFACE_OR_LOCAL_MODEL_PATH",
config=PatientConfig(batch_size=1, device="cuda"),
)
# 2. Define the concept datasets you want to contrast.
# These can be strings, chat dictionaries, or lists of dictionaries.
calm_contents = ["I feel peaceful and relaxed.", "Taking a deep breath by the ocean.", "Quiet and serene."]
anxious_contents = ["I feel incredibly anxious.", "My heart is racing and I'm stressed.", "Everything is overwhelming."]
# 3. Analyze the concepts to extract activations
calm_reaction = model.analyse(
dataset=calm_contents,
aggregator=mean_aggregator(),
label="calm",
parallel=True,
max_workers=3,
save_checkpoint_every=2,
checkpoint_dir="checkpoint"
)
# Or simply without parallel processing
anxious_reaction = model.analyse(
dataset=anxious_contents,
aggregator=mean_aggregator(),
label="anxious"
)
# 4. Find the neutral middle-ground of the conflicting sentiments
mean_reaction = mean_aggregator()(calm_reaction.activations, anxious_reaction.activations)
# 5. Isolate the "calm" semantic pathway
calm_path = difference_aggregator()(calm_reaction.activations, mean_reaction)
# 6. Normalize the pathway and select the most impactful layers
calm_path = normalize_path(calm_path)
# k=3 selects the top 3 layers. layer_range limits the search to the middle layers
# (15% to 75% depth) to avoid lobotomizing core syntax or final output layers.
calm_path = top_k_layers(calm_path, k=3, layer_range=(0.15, 0.75))
# 7. Steer the model! Create a new context with the steered activations applied.
calm_model = model.ambale(calm_path, safe_scale_activation(factor=3.0))
# Generate text using the newly steered model
output = calm_model.generate("How are you feeling today?", max_new_tokens=50)
print(output)
# 8. Save and Load steered models for future use
calm_model.save("calm_model.lobo")
loaded_calm_model = model.load_ambale("calm_model.lobo")When creating a dataset of conversations, be mindful of list nesting to avoid ambiguity between a "dataset of single messages" and a "single conversation."
If each item in your dataset is an independent conversation, wrap each sequence of dictionaries in an outer list:
dataset = [
[{"role": "user", "content": "Hello!"}],
[{"role": "assistant", "content": "Hi!"}]
]If your dataset contains multiple conversations with multiple turns, it looks like this:
dataset = [
[{"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "How are you?"}],
[{"role": "assistant", "content": "Hi!"}, {"role": "user", "content": "hello"}]
]If the model does not support templated chat, you can use raw strings to define a dataset:
dataset = [
"Hello!",
"How are you?",
"Hi!",
"hello"
]We provide ready-to-use examples in the examples/ directory:
- Tiny Sample: Uses TinyLlama-1.1B-Chat-v1.0 for a quick sentiment steering test.
- Iterative Sample: Demonstrates how to use Iterative Analysis on TinyLlama-1.1B-Chat-v1.0 to steer generation step-by-step.
- Mid Sample: Uses Nanbeige4.1-3B to build a contrastive refusal vector from harmful and harmless instruction datasets.
Here are some of the main sources that inspired the creation of this module:
- Maxime Labonne - Uncensor any LLM with abliteration
- Huggingface (YouTube) - Steering LLM Behavior Without Fine-Tuning
- Welch Labs (YouTube) - The most complex model we actually understand
With iterative_analysis.py, Lobopy now supports capturing and analyzing the model's activations specifically when a certain phrase, token, or concept is generated (rather than just prompted). See the Iterative Sample for a demonstration.
lobopy is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).
Copyright (C) 2026 OzelTam
See the LICENSE file for full license text.
