Learning Safety Constraints for LLMs

This repository contains the implementation of the paper "Learning Safety Constraints for Large Language Models" (ICML 2025 Spotlight).

Installation

Prerequisites

  • Conda (Miniconda or Anaconda)
  • Git

Setup Instructions

  1. Clone the repository:
     git clone git@github.com:lasgroup/SafetyPolytope.git
     cd SafetyPolytope
  2. Create and activate a new conda environment:
     conda create -n sap python=3.10 -y
     conda activate sap
  3. Install the package in development mode:
     pip install -e .
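
After the setup steps, a quick sanity check can confirm that the environment looks as expected. The snippet below is a minimal sketch and not part of the repository. It assumes the development install exposes a top-level package named safety_polytope (inferred from the src/safety_polytope path, not confirmed by the source), and the helper name check_environment is hypothetical.

```python
import importlib.util
import sys


def check_environment(package: str = "safety_polytope") -> dict:
    """Report the running Python version and whether `package` can be found.

    The default package name is an assumption inferred from the src/ layout.
    """
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # find_spec returns None when a top-level package is not installed
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    info = check_environment()
    print(f"Python {info['python']} (the setup steps create a 3.10 environment)")
    print(f"safety_polytope found: {info['package_found']}")
```

Run it inside the activated sap environment. If the package is reported as not found, the actual import name may differ from the one assumed here.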

Quick Start

To run the BeaverTails pipeline with default settings:

python src/safety_polytope/polytope/run_beaver_pipeline.py \
    --model_path=Qwen/Qwen2-1.5B-Instruct \
    --mode=local \
    --reduced_data

The --reduced_data flag runs the pipeline on a reduced subset of the data. Omit it to train on the full dataset.
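
When switching between reduced and full runs from a script, it can help to assemble the command programmatically. The following is a hedged sketch: it uses only the script path and flags shown in the command above, while the helper name build_command and its keyword arguments are hypothetical and not part of the repository.

```python
import sys

# Script path as given in the Quick Start command (relative to the repo root).
SCRIPT = "src/safety_polytope/polytope/run_beaver_pipeline.py"


def build_command(
    model_path: str = "Qwen/Qwen2-1.5B-Instruct",
    mode: str = "local",
    reduced_data: bool = True,
) -> list:
    """Build the argument list for the BeaverTails pipeline (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT, f"--model_path={model_path}", f"--mode={mode}"]
    if reduced_data:
        # Leave this flag out to train on the full dataset.
        cmd.append("--reduced_data")
    return cmd


if __name__ == "__main__":
    # Only prints the full-dataset command. To launch it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(" ".join(build_command(reduced_data=False)))
```

Building the command as a list rather than a single string avoids shell-quoting issues if the model path is later replaced with a local directory containing spaces.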

HarmBench Experiments

For instructions on replicating the HarmBench experiments from the paper, please see src/safety_polytope/harmbench/README.md.

License

MIT License.
