CREST is a training-free test-time steering framework that discovers cognitive heads via simple offline calibration and then rotates activations during decoding to guide the model’s reasoning—preserving norms to avoid per-model hyperparameter tuning. This improves accuracy and reduces tokens across models and datasets.
CREST (Cognitive REasoning Steering at Test-time) identifies attention heads whose activations are predictive of different reasoning modes (“cognitive heads”), then steers those heads at inference to suppress inefficient trajectories and encourage effective reasoning—without further training.
- Token savings with accuracy gains, e.g., R1-7B on MATH500: 92.4% accuracy with 34% fewer tokens; R1-1.5B on AMC23: 37.6% token reduction at higher accuracy.
- Generalizes across models/datasets (DeepSeek-R1 1.5B/7B/32B, Qwen3-4B/30B, GPT-OSS-20B; MATH500, AIME, AMC23, GPQA-D, LiveCodeBench, Calendar Planning).
- Head-ratio "gold default." Steering roughly the top ~38% of heads (ranked by linear-probe accuracy) balances accuracy and token reduction; this is adopted as the default.
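For intuition, here is a minimal, hypothetical sketch of the calibration step: train a linear probe per attention head on labeled head activations and keep the most accurate heads. The data layout and the scikit-learn probe are assumptions for illustration, not the repository's exact pipeline:

```python
# Hypothetical calibration sketch: rank attention heads by linear-probe accuracy.
# Assumes `acts` maps (layer, head) -> array of shape (n_examples, head_dim)
# and `labels` holds one reasoning-mode label per example.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def rank_cognitive_heads(acts, labels, top_ratio=0.38):
    scores = {}
    for (layer, head), X in acts.items():
        X_tr, X_va, y_tr, y_va = train_test_split(X, labels, test_size=0.2, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores[(layer, head)] = probe.score(X_va, y_va)    # validation accuracy
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[: int(len(ranked) * top_ratio)]          # ~top 38% "gold default"
```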
We recommend using the vLLM Docker image:
vllm/vllm-openai@sha256:d731ee65c044ae0977421eed3d93f931d4b7d79614394184c939db35b8f28fc2 (see docker_info.txt)
Inside the Docker container:
cd CREST/probing/omni_math_rule/evaluation
pip install evaluate
cd latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
cd ../../
pip install lighteval
pip install datasets==3.5.0
pip install emoji
Otherwise (without the Docker image):
cd CREST
pip install -r requirements.txt
cd probing/omni_math_rule/evaluation
pip install evaluate
cd latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
cd ../../
pip install lighteval
pip install datasets==3.5.0
pip install emoji

# Run baseline without steering
bash script/baseline.sh

This script:
- Sets STEERING=False
- Evaluates models across multiple datasets
- Saves results in the results/SteeringFalse/ directory
# Run with steering enabled
bash script/ours.sh

This script:
- Sets STEERING=True
- Configures steering parameters:
  - steering_number=512: Number of top steering vectors to use
  - steering_coef=-4: Steering coefficient (negative for inhibition)
  - steering_mode=after_o_proj_norm_threshold: Steering application mode
- Saves results in the results/SteeringTrue_numb512_coef-4_mode[mode]/ directory

Note: steering currently requires --enforce_eager; CUDA graphs and torch.compile are not yet supported.
The Steering Vector Zoo contains pre-trained steering vectors that can be applied to modify model behavior during inference. These vectors are learned through probing techniques and stored as PyTorch (.pt) files.
probing/results/
├── [DATASET]/                                    # Training dataset (e.g., MATH_train)
│   └── [MODEL]/                                  # Model name (e.g., Qwen3-30B-A3B-Thinking-2507)
│       └── template-t0-n1-[SIZE]/                # Template and size configuration
│           └── hidden[MODE]/                     # Steering mode directory
│               └── mix_others_low_rank_1000/     # Training method
│                   └── probe_best.pt             # Steering vector file
The system automatically selects the top-k most accurate steering vectors based on:
- Probe Accuracy: Classification performance on the validation set
- Layer-Head Coverage: Distribution across different attention layers and heads
- Steering Coefficient: Magnitude of influence (configurable via --steering_coef)
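As an illustration of this selection, here is a hedged sketch that ranks (layer, head) entries by probe accuracy, assuming the probe_best.pt layout documented later in this README (the function name is hypothetical):

```python
# Illustrative sketch: pick the k most accurate steering vectors from a
# probe_best.pt file in the nested {layer_idx: {head_idx: {...}}} format
# documented below.
import torch

def load_top_k_vectors(path, k=512):
    probes = torch.load(path, map_location="cpu")
    entries = []
    for layer_idx, heads in probes.items():
        for head_idx, info in heads.items():
            entries.append((info["accuracy"], layer_idx, head_idx, info["steering_vector"]))
    entries.sort(key=lambda e: e[0], reverse=True)  # rank by probe accuracy
    return entries[:k]                              # e.g., k = steering_number = 512
```

The main_vllm.py invocation below shows the corresponding command-line interface.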
python main_vllm.py \
--model_name_or_path "Qwen/Qwen3-30B-A3B-Thinking-2507" \
--dataset "aime25" \
--save_dir "results/test/" \
--use_chat_format \
--temperature 0.6 \
--max_tokens 32768 \
--steering True \
--steering_vector_path "/path/to/steering/vectors/" \
--steering_number 512 \
--steering_coef -4 \
--steering_mode "after_o_proj_norm_threshold"

export STEERING=True
export STEERING_VECTOR_PATH="/path/to/steering/vectors/"
export STEERING_NUMBER=512
export STEERING_COEF=-4
export STEERING_MODE="after_o_proj_norm_threshold"
export MODEL_NAME_OR_PATH="Qwen/Qwen3-30B-A3B-Thinking-2507"
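For illustration only, a sketch of how a launcher script might read this configuration from the environment (the defaults mirror the values shown above; this is not the repository's exact parsing code):

```python
# Hypothetical sketch: read the steering configuration from environment variables.
import os

steering = os.environ.get("STEERING", "False") == "True"
steering_vector_path = os.environ.get("STEERING_VECTOR_PATH", "")
steering_number = int(os.environ.get("STEERING_NUMBER", "512"))
steering_coef = float(os.environ.get("STEERING_COEF", "-4"))
steering_mode = os.environ.get("STEERING_MODE", "after_o_proj_norm_threshold")
model_name_or_path = os.environ.get("MODEL_NAME_OR_PATH", "Qwen/Qwen3-30B-A3B-Thinking-2507")
```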
The implementation consists of three main components:
- Purpose: Core inference script (main_vllm.py) supporting multiple datasets and models
- Features:
- Support for multiple datasets (MATH, GSM, AIME, GPQA, LiveCodeBench, etc.)
- Batch processing with vLLM backend
- Configurable steering parameters
- Multi-model support with automatic model detection
Contains model-specific monkey patches for different architectures:
- qwen2/monkey_patch.py: DeepSeek-R1-Distill-Qwen models
- qwen3/monkey_patch.py: Qwen3-4B-Thinking models
- qwen3_moe/monkey_patch.py: Qwen3-30B-A3B-Thinking models
- gpt_oss/monkey_patch.py: GPT-OSS models
Each monkey patch implements:
- Attention mechanism modifications
- Layer-wise steering vector application
- Dynamic steering flag management
- Multiple steering modes
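The gist of such a patch can be sketched as follows; the attribute name (steering_enabled) and the way the steering vector is injected are assumptions for illustration, not the actual monkey_patch.py code:

```python
# Illustrative sketch: wrap an attention module's forward() so that a steering
# vector is added to its output whenever a steering flag is set on the module.
import types
import torch

def patch_attention(attn_module, steering_vector, coef=-4.0):
    original_forward = attn_module.forward  # keep the unpatched bound method

    def steered_forward(self, *args, **kwargs):
        output = original_forward(*args, **kwargs)
        hidden = output[0] if isinstance(output, tuple) else output
        if getattr(self, "steering_enabled", False):
            hidden = hidden + coef * steering_vector.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    # Rebind forward() on this instance only (runtime monkey patching).
    attn_module.forward = types.MethodType(steered_forward, attn_module)
```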
- Purpose: Mathematical reasoning evaluation using OmniMath rules
- Features:
- Parallel evaluation with timeout handling
- Multiple prediction aggregation
- Accuracy computation and result saving
The system supports four distinct steering modes, each applying steering vectors at a different point in the attention mechanism:

Mode 1
- Application Point: Before the output projection in attention
- Mechanism: Modifies the attention output before the linear transformation
- Use Case: Early intervention in the attention computation

Mode 2
- Application Point: After the output projection in attention
- Mechanism: Direct modification of the projected attention output
- Use Case: Standard steering application

Mode 3
- Application Point: After the output projection, with norm preservation
- Mechanism: Applies steering while preserving vector norms
- Use Case: Maintaining activation-magnitude consistency

Mode 4 (after_o_proj_norm_threshold)
- Application Point: After the output projection, with thresholded norm preservation
- Mechanism: Applies steering with awareness thresholding and norm preservation
- Use Case: Selective steering based on activation patterns (see the sketch below)
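A minimal sketch of the norm-preserving (and thresholded) application; tensor names and the score argument are assumptions:

```python
# Illustrative sketch of the "norm" family of modes: add the steering vector,
# then rescale so the activation keeps its original norm; optionally skip
# steering when an awareness score falls below a threshold.
import torch

def apply_steering_norm_preserving(head_output, steering_vector, coef=-4.0,
                                   score=None, threshold=None):
    if threshold is not None and score is not None and score < threshold:
        return head_output                       # thresholded mode: leave untouched
    original_norm = head_output.norm(dim=-1, keepdim=True)
    steered = head_output + coef * steering_vector
    new_norm = steered.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    return steered * (original_norm / new_norm)  # preserve activation magnitude
```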
- "aime25"
- "AIME"
- "amc23"
- "cp"
- "lcb"
- "MATH500"
- "GSM"
- "gpqa"
The system automatically detects and applies appropriate steering based on model names:
- DeepSeek-R1-Distill-Qwen: Uses the qwen2 monkey patch
- Qwen3-4B: Uses the qwen3 monkey patch
- Qwen3-30B: Uses the qwen3_moe monkey patch
- gpt-oss: Uses the gpt_oss monkey patch
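A hedged sketch of this dispatch (substring matching is an assumption; the real logic may differ):

```python
# Illustrative sketch: pick a monkey patch module from the model name.
def select_monkey_patch(model_name: str) -> str:
    name = model_name.lower()
    if "deepseek-r1-distill-qwen" in name:
        return "qwen2/monkey_patch.py"
    if "qwen3-30b" in name:          # check the MoE variant before plain Qwen3
        return "qwen3_moe/monkey_patch.py"
    if "qwen3" in name:
        return "qwen3/monkey_patch.py"
    if "gpt-oss" in name:
        return "gpt_oss/monkey_patch.py"
    raise ValueError(f"No steering patch registered for {model_name}")
```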
Steering vectors should be saved as PyTorch files with the following structure:
{
    layer_idx: {
        head_idx: {
            'accuracy': float,               # Probe accuracy
            'model_dict': {
                'weight': torch.Tensor       # Hyperplane weights
            },
            'steering_vector': torch.Tensor  # Steering direction
        }
    }
}
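For example, a file in this format could be produced with a plain torch.save of nested dictionaries (dimensions and values below are placeholders):

```python
# Illustrative sketch: write a steering-vector file in the documented format.
import torch

head_dim = 128  # placeholder head dimension
probes = {
    0: {                                    # layer_idx
        3: {                                # head_idx
            "accuracy": 0.91,               # probe accuracy
            "model_dict": {"weight": torch.randn(head_dim)},   # hyperplane weights
            "steering_vector": torch.randn(head_dim),          # steering direction
        }
    }
}
torch.save(probes, "probe_best.pt")
```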
Results are saved in the following structure:
results/
├── SteeringFalse/                                # Baseline results
│   └── [model]/[dataset]/seed[X]/
└── SteeringTrue_numb[N]_coef[C]_mode[M]/         # Steering results
    └── [model]/[dataset]/
        ├── predictions.jsonl                     # Detailed predictions
        └── metrics.json                          # Evaluation metrics
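A small sketch of consuming these outputs; the run directory is an example, and no particular keys inside metrics.json are assumed:

```python
# Illustrative sketch: load the metrics and detailed predictions of one run.
import json

run_dir = "results/SteeringTrue_numb512_coef-4_modeafter_o_proj_norm_threshold/Qwen3-30B-A3B-Thinking-2507/aime25"

with open(f"{run_dir}/metrics.json") as f:
    print(json.load(f))                            # evaluation metrics

with open(f"{run_dir}/predictions.jsonl") as f:
    predictions = [json.loads(line) for line in f]
print(f"loaded {len(predictions)} predictions")
```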
- Dynamic Steering: Steering is applied based on token patterns (e.g., double newlines)
- Multi-GPU Support: Tensor parallelism with proper rank handling
- Batch Processing: Efficient batch inference with vLLM
- Comprehensive Evaluation: Multiple evaluation metrics per dataset
- Flexible Configuration: Easy parameter tuning through environment variables
- Vector Loading: Top-k steering vectors are loaded based on probe accuracy
- Flag Detection: Steering flags are set based on specific token patterns
- Awareness Computation: Hyperplane-based awareness scores determine steering strength
- Vector Application: Steering vectors are applied with configurable coefficients
- Norm Preservation: Optional norm preservation maintains activation magnitudes
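To make the flag-detection and awareness steps above concrete, a hedged sketch (the double-newline trigger follows the description above; the sigmoid squashing is an assumption):

```python
# Illustrative sketch: dynamic steering flag plus hyperplane-based awareness.
import torch

def should_steer(recent_text: str) -> bool:
    # Dynamic steering: trigger on paragraph breaks in the generated text.
    return "\n\n" in recent_text

def awareness_score(head_output: torch.Tensor, hyperplane_weight: torch.Tensor) -> torch.Tensor:
    # Signed distance to the probe hyperplane, squashed to (0, 1); used to
    # gate or scale the steering strength for this head.
    return torch.sigmoid(head_output @ hyperplane_weight)
```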
The system uses runtime monkey patching to modify:
- forward() methods in attention layers
- forward() methods in decoder layers
- forward() methods in the main model
- Token-based steering flag management
- Memory: Steering vectors are loaded onto GPU memory
- Scalability: Supports tensor parallelism for large models
- Efficiency: Conditional application reduces unnecessary computation
This implementation supports research in:
- Mechanistic interpretability of reasoning
- Activation steering and control
- Cognitive enhancement in language models
- Mathematical reasoning improvement
- Code generation optimization
If you use this code or ideas, please cite the CREST paper:
Understanding and Steering The Cognitive Behaviors of Reasoning Models At Test-Time (CREST)