Yulun Jiang*, Liangze Jiang*, Damien Teney, Michael Moor**, Maria Brbić**
This repo contains the source code of 🌊LaMer, a Meta-RL framework for training LLM agents to actively explore and adapt to their environment at test time.
To train an LLM agent with LaMer:
bash examples/minesweeper/lamer_minesweeper_qwen3_4b.sh
To train an LLM agent with an RL baseline (here, GiGPO):
bash examples/minesweeper/gigpo_minesweeper_qwen3_4b.sh
See the examples folder for more training scripts.
This work builds upon verl, verl-agent, Reflexion, and RAGEN. We thank the authors and contributors of these projects for sharing their valuable work.
If you find our code useful, please consider citing:
@article{jiang2025metarl,
title={Meta-RL Induces Exploration in Language Agents},
author={Yulun Jiang and Liangze Jiang and Damien Teney and Michael Moor and Maria Brbic},
journal={arXiv preprint arXiv:2512.16848},
year={2025}
}

