RhizoNymph/quTrainer

quTrainer

This is an experiment in improving how well LLMs understand queries, so that they write better-formed queries in agentic search.

It begins with a reward model trained on Google's query-wellformedness dataset, which consists of annotated queries from the Paralex corpus, itself scraped from WikiAnswers.

https://github.com/google-research-datasets/query-wellformedness https://knowitall.cs.washington.edu/paralex/
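As a rough illustration of the training data, the sketch below parses rows into (query, rating) pairs. It assumes each line is a query and a well-formedness rating between 0 and 1 separated by a tab; the function name and the two sample rows are hypothetical and not taken from the dataset or from this repo.

```python
import csv
import io
from typing import List, Tuple


def parse_wellformedness_tsv(text: str) -> List[Tuple[str, float]]:
    """Parse tab-separated `query<TAB>rating` lines into (query, rating) pairs."""
    rows = []
    # QUOTE_NONE keeps any quotation marks inside a query as literal characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        query, rating = fields
        rows.append((query.strip(), float(rating)))
    return rows


# Hypothetical rows in the assumed format, for illustration only.
SAMPLE = "What is the capital of France ?\t1.0\ncapital france\t0.2\n"

if __name__ == "__main__":
    for query, rating in parse_wellformedness_tsv(SAMPLE):
        print(f"{rating:.1f}  {query}")
```

The rating is the regression target for the reward model, so it stays a float rather than being thresholded into a class label.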

This reward model is a fine-tune of ModernBERT from Answer.AI.

https://github.com/AnswerDotAI/ModernBERT

It uses PyTorch for training and Optuna to run Bayesian hyperparameter optimization.

The goal is to use this reward model in RL-based post-training and to measure performance on an agentic search benchmark before and after. I haven't yet decided which benchmark(s) to use, but I'll update this when I do.
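One way a scalar well-formedness score could feed into RL is as a per-sample reward with a batch-mean baseline subtracted. The sketch below only illustrates that idea; the RL method for this project is not decided here, and `toy_wellformedness_score` is a hypothetical heuristic standing in for the trained reward model.

```python
from statistics import mean
from typing import Callable, List

QUESTION_WORDS = {"what", "who", "when", "where", "why", "how", "which"}


def toy_wellformedness_score(query: str) -> float:
    """Hypothetical stand-in for the reward model; returns a score in [0, 1]."""
    words = query.strip().split()
    if not words:
        return 0.0
    score = 0.0
    if words[0].lower() in QUESTION_WORDS:
        score += 0.5  # starts like a question
    if query.strip().endswith("?"):
        score += 0.5  # ends like a question
    return score


def centered_rewards(queries: List[str], score_fn: Callable[[str], float]) -> List[float]:
    """Score each query, then subtract the batch mean as a simple baseline."""
    scores = [score_fn(q) for q in queries]
    baseline = mean(scores)
    return [s - baseline for s in scores]


if __name__ == "__main__":
    batch = ["what is rust ?", "rust language", "how rust"]
    print(centered_rewards(batch, toy_wellformedness_score))
```

Swapping in the real model would mean replacing `score_fn` with a function that runs the fine-tuned ModernBERT on the query text and returns its predicted rating.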

Currently the only thing to do here is run:

uv run reward_model.py

This starts the Optuna study and trains the reward model, saving the optimization history and the best model. With default settings, this takes a few hours on an RTX 3090.
