If I want to use an LLM to score responses as a reward model, how should I configure this via `custom-rm-path`? Should I refer to the implementation in `examples/on_policy_distillation`? And in that setup, does the LLM that produces the reward score act as the teacher model?