GitHub - Chelsi-create/RLHF_Poison


data_prep
evaluation
recipes
rso
scripts
utils
.gitignore
dpo.py
dpo_datapoint.py
dpo_modified_v1.py
dpo_training.py
sft.py
sim_po.py
simpo_config.py
simpo_trainer.py

About

This project investigates the robustness of Direct Preference Optimization (DPO) in the presence of data poisoning attacks, focusing on how corrupted preference data impacts reward learning and policy behavior. It explores both attack strategies and potential defenses to enhance corruption tolerance in RLHF pipelines.
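
To make the idea of corrupted preference data concrete, here is a minimal sketch in plain Python. It is not taken from this repository's code: the function names `dpo_loss` and `flip_labels`, the `chosen`/`rejected` dictionary keys, and the choice of label flipping as the attack are all assumptions made for illustration. It computes the standard per-pair DPO loss from log-probabilities and shows one simple poisoning strategy, swapping the chosen and rejected responses for a fraction of the pairs.

```python
import math
import random


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    The margin is the policy's log-ratio advantage of the chosen
    response over the rejected one, relative to the reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def flip_labels(pairs, fraction, seed=0):
    """Hypothetical poisoning step: swap chosen/rejected for a random
    `fraction` of the preference pairs. Returns a new list."""
    rng = random.Random(seed)
    n_flip = int(round(fraction * len(pairs)))
    flip_idx = set(rng.sample(range(len(pairs)), n_flip))
    poisoned = []
    for i, pair in enumerate(pairs):
        if i in flip_idx:
            poisoned.append({**pair,
                             "chosen": pair["rejected"],
                             "rejected": pair["chosen"]})
        else:
            poisoned.append(dict(pair))
    return poisoned


# Toy preference data (illustrative strings only).
pairs = [{"prompt": f"q{i}", "chosen": f"good{i}", "rejected": f"bad{i}"}
         for i in range(4)]
poisoned = flip_labels(pairs, fraction=0.5)
num_flipped = sum(p["chosen"] != q["chosen"] for p, q in zip(pairs, poisoned))

# Flipping a pair negates the margin, so a policy that fits the clean
# label well is penalized on the poisoned copy of the same pair.
clean = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0)    # margin = +2
flipped = dpo_loss(-3.0, -1.0, -2.0, -2.0, beta=1.0)  # margin = -2
```

In this sketch a flipped pair pushes the policy toward the response that was originally rejected, which is the basic mechanism by which corrupted preferences can shift policy behavior. Real attacks and defenses, including whatever this project implements, may differ substantially.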