Is it possible to conduct RLHF from env? #51

Thanks for open-sourcing the AgentTuning code. I am quite interested in training the model, but I see that the training framework is not open-sourced: https://github.com/THUDM/AgentTuning/issues/1

The discussion there mentioned that it could support P-Tuning or LoRA. I am wondering whether it could also support RLHF.

Recently I read a paper, https://arxiv.org/abs/2312.14878, and I am curious how AgentLM would perform if we let it learn from interacting with environments (see Finetune type II in that paper).
