This repository contains the official implementation of the paper Language Model Personalization via Reward Factorization.
Modern large language models (LLMs) are optimized for human-aligned responses using Reinforcement Learning from Human Feedback (RLHF). However, existing RLHF approaches assume a universal preference model and fail to account for individual user preferences, limiting their effectiveness in personalized applications. We introduce a framework that extends RLHF to enable user personalization by leveraging the assumption that user preferences lie in a low-dimensional space. Instead of training a separate model per user, we represent user-specific rewards as a linear combination of base reward functions. Using only ~10 user responses, our method can infer user-specific rewards and align LLM outputs accordingly. We validate our approach through experiments with both synthetic and real users, demonstrating that our method achieves significant personalization. In human evaluations, it achieves a 67% win rate over default GPT-4o responses.
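To make the reward factorization idea concrete, here is a minimal sketch (not the repository's actual API): a user's reward is a weighted sum of fixed base reward functions, and the weights are fit from a handful of pairwise preferences with a Bradley-Terry likelihood. The function names `fit_user_weights` and `user_reward`, and the regularization strength `l2`, are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sketch: the user-specific reward is modeled as
#   r_user(x, y) = sum_k w_k * r_k(x, y)
# where r_1..r_K are base reward functions. The weights w are inferred
# from ~10 pairwise preferences (chosen vs. rejected response).

def fit_user_weights(base_rewards_chosen, base_rewards_rejected, l2=0.1):
    """base_rewards_*: arrays of shape (n_pairs, K) holding the base reward
    scores of the chosen / rejected response for each preference pair."""
    n_pairs, k = base_rewards_chosen.shape

    def neg_log_likelihood(w):
        # Bradley-Terry: P(chosen > rejected) = sigmoid(w . (r_chosen - r_rejected))
        margin = (base_rewards_chosen - base_rewards_rejected) @ w
        nll = np.sum(np.log1p(np.exp(-margin)))
        return nll + l2 * np.dot(w, w)  # L2 penalty, since only ~10 samples are available

    w0 = np.zeros(k)
    result = minimize(neg_log_likelihood, w0, method="L-BFGS-B")
    return result.x

def user_reward(weights, base_reward_scores):
    """Score candidate responses as a linear combination of their base reward scores."""
    return base_reward_scores @ weights

# Toy usage with K=3 base rewards and 10 preference pairs (random placeholders)
rng = np.random.default_rng(0)
chosen, rejected = rng.normal(size=(10, 3)), rng.normal(size=(10, 3))
w = fit_user_weights(chosen, rejected)
print(user_reward(w, rng.normal(size=(5, 3))))  # scores for 5 candidate responses
```

The inferred weights can then be used to rank or steer LLM outputs for that user; the actual pipeline in the paper may differ in how the base rewards are obtained and how alignment is performed.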
Create a conda environment:
```bash
conda create -n pref python=3.10
```

Activate the environment:

```bash
conda activate pref
```

Download the data from here and extract it into the data/ directory.
If you find this work useful, please cite our paper:
```bibtex
@article{pref2025,
  title={Language Model Personalization via Reward Factorization},
  author={[Authors]},
  journal={arXiv preprint arXiv:2503.06358},
  year={2025}
}
```