Inspired by how the brain learns. Meta-learn and evolve your 🦞 from every conversation in the wild. No GPU required. Works with Kimi, Qwen, Claude, MiniMax, and more.
Kimi · Qwen · Claude · MiniMax · OpenAI · Gemini · + Much More
🇨🇳 中文 • 🇯🇵 日本語 • 🇰🇷 한국어 • 🇫🇷 Français • 🇩🇪 Deutsch • 🇪🇸 Español
```bash
metaclaw setup                      # one-time config wizard
metaclaw start                      # default: madmax mode (skills + scheduled RL training)
metaclaw start --mode rl            # RL without scheduler (trains immediately on full batch)
metaclaw start --mode skills_only   # skills only, no RL (no Tinker needed)
```

- [03/13/2026] v0.3: Continual meta-learning support. Slow RL updates now run only during sleep hours, idle time, or Google Calendar meetings. Added support/query set separation to prevent stale reward signals from polluting model updates.
- [03/11/2026] v0.2: One-click deployment via the `metaclaw` CLI. Skills are enabled by default; RL is now opt-in.
- [03/09/2026] We release MetaClaw. Just talk to your agent and let it evolve automatically. No GPU deployment required; just plug into the API.
MetaClaw is an agent that meta-learns and evolves in the wild. Just talk to your agent as you normally would: MetaClaw turns every live conversation into a learning signal, so the agent keeps improving through real-world deployment instead of relying on offline training alone.
Under the hood, it places your model behind an OpenAI-compatible proxy that intercepts interactions from OpenClaw, injects relevant skills at each turn, and meta-learns from accumulated experience. Skills are summarized automatically after each session; with RL enabled, a meta-learning scheduler defers weight updates to idle windows so the agent is never interrupted during active use.
No GPU cluster required. MetaClaw works with any OpenAI-compatible LLM API out of the box, and optionally integrates Kimi-K2.5 (1T MoE) via Tinker for cloud-based LoRA training.
Configure once with `metaclaw setup`; then `metaclaw start` brings up the proxy, injects skills, and wires up OpenClaw automatically. No manual shell scripts are needed.
| Mode | Default | What it does |
|---|---|---|
| `madmax` | ✅ | RL + smart scheduler. Skills are always on; RL weight updates run only during sleep/idle/meeting windows. |
| `rl` | | RL without scheduler. Trains immediately when a batch is full (original v0.2 behavior). |
| `skills_only` | | Proxy → your LLM API. Skills are injected and auto-summarized after each session. No GPU/Tinker required. |
At every turn, MetaClaw retrieves the most relevant skill instructions and injects them into the agent's system prompt, improving behavior immediately without any retraining.
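To make the retrieve-and-inject step concrete, here is a minimal sketch. It assumes plain word-overlap scoring as a stand-in for MetaClaw's `template` and `embedding` retrieval modes; `Skill`, `retrieve_skills`, and `build_system_prompt` are hypothetical names, not MetaClaw's API. The `top_k` argument mirrors the `skills.top_k` config key.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    instructions: str


def _tokens(text: str) -> set:
    # Lowercase word tokens; a deliberately crude stand-in for real retrieval.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_skills(query: str, skills: list, top_k: int = 6) -> list:
    """Rank skills by word overlap with the user turn and keep the top_k."""
    q = _tokens(query)
    scored = [(len(q & _tokens(s.name + " " + s.instructions)), s) for s in skills]
    # Drop skills with no overlap, then sort by score (Python's sort is stable for ties).
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


def build_system_prompt(base_prompt: str, selected: list) -> str:
    """Append the selected skills to the agent's existing system prompt."""
    if not selected:
        return base_prompt
    blocks = "\n\n".join(f"## {s.name}\n{s.instructions}" for s in selected)
    return f"{base_prompt}\n\n# Relevant skills\n\n{blocks}"


skills = [
    Skill("git-rebase", "Prefer git rebase -i to squash fixup commits."),
    Skill("sql-safety", "Always use parameterized SQL queries."),
]
picked = retrieve_skills("how do I squash commits with git", skills, top_k=1)
print([s.name for s in picked])  # ['git-rebase']
```

Because the skills only change the prompt, swapping in a better scorer (such as embeddings) changes which instructions are injected, not how they are injected.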
After each conversation, the same LLM you're already using analyzes the session and distills new skills automatically. With RL enabled, a dedicated evolver model (`rl.evolver_model`) extracts skills from failed episodes.
In `skills_only` mode, only a network connection is needed. When RL is enabled, training is offloaded to Tinker cloud, so no local GPU is required either way.
MetaClaw supports two training methods:
- RL (GRPO) for learning from implicit feedback signals
- On-Policy Distillation (OPD) for distilling a larger teacher model into the student on-policy
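For readers new to GRPO, the core idea is a group-relative advantage: several responses to the same prompt are scored, and each response is compared against its group's average rather than against a learned value function. The sketch below shows that standard formulation with population standard deviation (implementations differ on this detail); it is not taken from MetaClaw's code, and `grpo_advantages` is a hypothetical name.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to one prompt, scored 1 (good) or 0 (bad) by a judge model.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, 1.0, -1.0]
```

When every response in a group gets the same score, all advantages are zero and that group contributes no gradient signal.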
In OPD mode, the student generates responses as usual, and a teacher model provides per-token log-probabilities on those same responses. The teacher logprobs are passed to the loss function (e.g., cispo) so the student learns to match the teacher's distribution. The teacher must be served behind an OpenAI-compatible /v1/completions endpoint (e.g., vLLM, SGLang).
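One common way to turn the teacher's per-token log-probabilities into a training signal is a per-token reverse-KL penalty: for each token the student sampled, `student_logprob - teacher_logprob` is a single-sample estimate of the KL divergence, and its negation (scaled by `kl_penalty_coef`) serves as that token's advantage. The sketch below assumes this formulation; how MetaClaw actually combines it with its loss function (e.g., `cispo`) is not shown, and `opd_token_advantages` is a hypothetical helper.

```python
def opd_token_advantages(
    student_logprobs: list,
    teacher_logprobs: list,
    kl_penalty_coef: float = 1.0,
) -> list:
    """Per-token advantages that push the student toward the teacher.

    student_lp - teacher_lp estimates the reverse KL at each sampled token;
    negating it rewards tokens the teacher finds more likely than the student.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("student and teacher must score the same tokens")
    return [
        -kl_penalty_coef * (s - t)
        for s, t in zip(student_logprobs, teacher_logprobs)
    ]


adv = opd_token_advantages([-0.1, -2.0], [-0.5, -0.2], kl_penalty_coef=1.0)
print([round(a, 2) for a in adv])  # [-0.4, 1.8]
```

The first token is penalized because the student is more confident than the teacher; the second is reinforced because the teacher strongly prefers it.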
Serving, reward modeling, and training are fully decoupled. The agent continues responding while scoring and optimization run in parallel.
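The decoupling can be pictured as three concurrent stages connected by queues. This is a minimal `asyncio` sketch under stated assumptions: the reply text, the fixed reward of `1.0`, and the function names are placeholders, not MetaClaw's internals. The point is that `serve` never waits on scoring or training.

```python
import asyncio


async def serve(requests: list, to_score: asyncio.Queue, replies: list) -> None:
    """Answer each request immediately, then hand the turn off for scoring."""
    for req in requests:
        replies.append(f"reply to {req}")   # the user sees this without waiting
        await to_score.put(req)
    await to_score.put(None)                # sentinel: no more turns


async def score(to_score: asyncio.Queue, to_train: asyncio.Queue) -> None:
    """Stand-in for the judge (PRM) call; runs concurrently with serving."""
    while (turn := await to_score.get()) is not None:
        await asyncio.sleep(0)              # placeholder for a network round-trip
        await to_train.put((turn, 1.0))     # hypothetical fixed reward
    await to_train.put(None)


async def train(to_train: asyncio.Queue, trained: list) -> None:
    """Consume scored samples; a real trainer would batch them for an update."""
    while (sample := await to_train.get()) is not None:
        trained.append(sample)


async def main() -> tuple:
    to_score: asyncio.Queue = asyncio.Queue()
    to_train: asyncio.Queue = asyncio.Queue()
    replies: list = []
    trained: list = []
    await asyncio.gather(
        serve(["q1", "q2"], to_score, replies),
        score(to_score, to_train),
        train(to_train, trained),
    )
    return replies, trained


replies, trained = asyncio.run(main())
print(replies)  # ['reply to q1', 'reply to q2']
print(trained)  # [('q1', 1.0), ('q2', 1.0)]
```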
```bash
pip install -e .                          # skills_only mode (lightweight)
pip install -e ".[rl]"                    # + RL training support (torch, transformers, tinker)
pip install -e ".[evolve]"                # + skill evolution via OpenAI-compatible LLM
pip install -e ".[scheduler]"             # + Google Calendar integration for scheduler
pip install -e ".[rl,evolve,scheduler]"   # recommended for full RL + scheduler setup
```

```bash
metaclaw setup
```

The interactive wizard asks you to choose your LLM provider (Kimi, Qwen, MiniMax, or custom), enter your API key, and optionally enable RL training.

```bash
metaclaw start
```

That's it. MetaClaw starts the proxy, automatically configures OpenClaw to use it, and restarts the gateway. Open OpenClaw and start chatting: skills are injected at every turn, and when you're done the session is automatically summarized into new skills.
```bash
metaclaw setup                       # Interactive first-time configuration wizard
metaclaw start                       # Start MetaClaw (default: madmax mode)
metaclaw start --mode rl             # Force RL mode (no scheduler) for this session
metaclaw start --mode skills_only    # Force skills-only mode for this session
metaclaw stop                        # Stop a running MetaClaw instance
metaclaw status                      # Check proxy health, running mode, and scheduler state
metaclaw config show                 # View current configuration
metaclaw config KEY VALUE            # Set a config value
```
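The `metaclaw config KEY VALUE` keys (for example `scheduler.calendar.enabled`) line up with the nesting of the YAML config file, which suggests that each dot selects one level of a nested section. The sketch below shows one way such a setter could behave. `set_dotted` and `coerce` are hypothetical helpers, and the rule that turns `"true"` into a boolean and `"31000"` into an integer is an assumption, not documented MetaClaw behavior.

```python
def coerce(value: str):
    """Turn a CLI string into a YAML-like scalar: bool, int, float, or str."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # e.g. "23:00" or "sk-..." stay strings


def set_dotted(config: dict, key: str, value: str) -> dict:
    """Set config['a']['b']['c'] for key 'a.b.c', creating sections as needed."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = coerce(value)
    return config


cfg = {"rl": {"enabled": False}}
set_dotted(cfg, "rl.enabled", "true")
set_dotted(cfg, "scheduler.calendar.enabled", "true")
set_dotted(cfg, "proxy.port", "31000")
print(cfg)
# {'rl': {'enabled': True}, 'scheduler': {'calendar': {'enabled': True}}, 'proxy': {'port': 31000}}
```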
Common config keys:

```bash
metaclaw config rl.enabled true                  # Enable RL training
metaclaw config rl.tinker_api_key sk-...         # Set Tinker key
metaclaw config skills.auto_evolve false         # Disable auto skill summarization
metaclaw config proxy.port 31000                 # Change proxy port
```

Configuration lives in `~/.metaclaw/config.yaml`, created by `metaclaw setup`.
```yaml
mode: madmax                 # "madmax" | "rl" | "skills_only"

llm:
  provider: kimi             # kimi | qwen | openai | minimax | custom
  model_id: moonshotai/Kimi-K2.5
  api_base: https://api.moonshot.cn/v1
  api_key: sk-...

proxy:
  port: 30000

skills:
  enabled: true
  dir: ~/.metaclaw/skills    # your skill library
  retrieval_mode: template   # template | embedding
  top_k: 6
  task_specific_top_k: 10    # cap task-specific skills (default 10)
  auto_evolve: true          # auto-summarize skills after each session

rl:
  enabled: false             # set to true to enable RL training
  model: moonshotai/Kimi-K2.5
  tinker_api_key: ""
  prm_url: https://api.openai.com/v1
  prm_model: gpt-5.2
  prm_api_key: ""
  lora_rank: 32
  batch_size: 4
  resume_from_ckpt: ""       # optional checkpoint path to resume training
  evolver_api_base: ""       # leave empty to reuse llm.api_base
  evolver_api_key: ""
  evolver_model: gpt-5.2

opd:
  enabled: false             # set to true to enable OPD (teacher distillation)
  teacher_url: ""            # teacher model base URL (OpenAI-compatible /v1/completions)
  teacher_model: ""          # teacher model name (e.g., Qwen/Qwen3-32B)
  teacher_api_key: ""        # teacher model API key
  kl_penalty_coef: 1.0       # KL penalty coefficient for OPD

max_context_tokens: 20000    # prompt token cap before truncation

scheduler:                   # v0.3: meta-learning scheduler (auto-enabled in madmax mode)
  enabled: false             # madmax mode enables this automatically; set manually for rl mode
  sleep_start: "23:00"       # HH:MM local time, start of sleep window
  sleep_end: "07:00"         # HH:MM local time, end of sleep window
  idle_threshold_minutes: 30 # trigger RL after N minutes of keyboard inactivity
  min_window_minutes: 15     # minimum window length required to start an RL step
  calendar:
    enabled: false           # use Google Calendar to detect meeting slots
    credentials_path: ""     # path to client_secrets.json from Google Cloud Console
    token_path: ""           # saved OAuth token (default: ~/.metaclaw/calendar_token.json)
```

Skills are short Markdown instructions injected into the agent's system prompt at each turn. They live in your skills directory (`~/.metaclaw/skills/` by default), organized as individual `SKILL.md` files.
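A skill library laid out this way can be loaded with a directory walk. This sketch assumes each skill sits in its own folder as `<skill-name>/SKILL.md`, which is an assumption about the layout rather than something this README specifies; `load_skills` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def load_skills(skills_dir: str) -> dict:
    """Map each skill's folder name to the text of its SKILL.md file."""
    root = Path(skills_dir).expanduser()
    return {
        path.parent.name: path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("SKILL.md"))
    }


# Demo against a throwaway directory laid out as <skill-name>/SKILL.md.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "debug-flaky-tests"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "Re-run the failing test in isolation first.\n", encoding="utf-8"
    )
    skills = load_skills(tmp)

print(list(skills))  # ['debug-flaky-tests']
```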
Skill auto-summarization runs after each conversation. The LLM you configured analyzes what happened and generates new skills automatically. No manual curation is needed; the library grows as you use it.
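The summarize-then-save loop can be sketched with the LLM abstracted as a plain callable, so the example runs without an API key. The prompt wording, the "name on the first line, instructions after it" reply format, the `NONE` opt-out, and the `<slug>/SKILL.md` output path are all assumptions for illustration; `evolve_skill` is a hypothetical helper, not MetaClaw's summarizer.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, Optional

SUMMARIZE_PROMPT = (
    "Review the conversation below. If it reveals a reusable lesson, reply with "
    "a short skill name on the first line and the instructions after it. "
    "Reply NONE otherwise.\n\n{transcript}"
)


def evolve_skill(transcript: str, llm: Callable[[str], str], skills_dir: str) -> Optional[Path]:
    """Ask the LLM for a new skill and write it to <skills_dir>/<slug>/SKILL.md."""
    reply = llm(SUMMARIZE_PROMPT.format(transcript=transcript)).strip()
    if not reply or reply == "NONE":
        return None
    name, _, body = reply.partition("\n")
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "untitled-skill"
    path = Path(skills_dir).expanduser() / slug / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body.strip() + "\n", encoding="utf-8")
    return path


def fake_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call.
    return "Check API pagination\nFollow `next` links until they run out."


with tempfile.TemporaryDirectory() as tmp:
    written = evolve_skill("user: the export missed rows...", fake_llm, tmp)
    written_name = written.parent.name
    content = written.read_text(encoding="utf-8")

print(written_name)  # check-api-pagination
```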
To pre-load the built-in skill bank (40+ skills across coding, security, agentic tasks, and more):

```bash
mkdir -p ~/.metaclaw/skills
cp -r memory_data/skills/* ~/.metaclaw/skills/
```

Enable RL training to continuously fine-tune the model from live conversations:
```bash
metaclaw config rl.enabled true
metaclaw config rl.tinker_api_key sk-...
metaclaw config rl.prm_url https://api.openai.com/v1
metaclaw config rl.prm_api_key sk-...
metaclaw start
```

In RL mode:
- Each conversation turn is tokenized and submitted as a training sample
- A judge LLM (PRM) scores responses asynchronously
- Tinker cloud runs LoRA fine-tuning; updated weights are hot-swapped every `batch_size` samples
- A dedicated evolver LLM extracts new skills from failed episodes
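The "every `batch_size` samples" cadence amounts to a buffer that releases a full batch and keeps any remainder for the next update. A minimal sketch follows; `TrainingBuffer` is a hypothetical name, and the comment marks where a real system would train and hot-swap weights.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainingBuffer:
    """Collect scored turns and release one batch every batch_size samples."""
    batch_size: int = 4
    pending: list = field(default_factory=list)
    updates: int = 0

    def add(self, sample) -> Optional[list]:
        self.pending.append(sample)
        if len(self.pending) < self.batch_size:
            return None
        batch = self.pending[: self.batch_size]
        self.pending = self.pending[self.batch_size:]
        self.updates += 1  # a real system would run LoRA training and hot-swap weights here
        return batch


buf = TrainingBuffer(batch_size=4)
batches = []
for i in range(10):
    batch = buf.add({"turn": i, "reward": 1.0})
    if batch is not None:
        batches.append(batch)

print(len(batches), buf.updates, len(buf.pending))  # 2 2 2
```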
Programmatic rollout (no OpenClaw TUI needed): set `openclaw_env_data_dir` to a directory of JSONL task files (one JSON object per line):

```json
{"task_id": "task_1", "instruction": "Register the webhook at https://example.com/hook"}
```

On-Policy Distillation (OPD) lets you distill a larger teacher model into the student while it trains on-policy. The student generates responses as usual; the teacher provides per-token log-probabilities on those same responses. A KL penalty steers the student toward the teacher's distribution.
```bash
metaclaw config opd.enabled true
metaclaw config opd.teacher_url http://localhost:8082/v1
metaclaw config opd.teacher_model Qwen/Qwen3-32B
metaclaw config opd.kl_penalty_coef 1.0
metaclaw start --mode rl
```

The teacher must be served behind an OpenAI-compatible `/v1/completions` endpoint (e.g., vLLM, SGLang). OPD can be combined with PRM scoring; both run asynchronously.
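One way to obtain teacher log-probabilities for an already-generated response from a `/v1/completions` endpoint is to send prompt + response with the legacy `echo` and `logprobs` options and keep only the tokens that fall inside the response. This sketch assumes the server honors those options and returns the legacy `text_offset`/`token_logprobs` fields. It also assumes teacher and student tokenize the text compatibly. MetaClaw's own client may work differently (for example, by sending token IDs), and all function names here are hypothetical. Only the offline parsing step is demonstrated.

```python
import json
import urllib.request
from typing import Optional


def build_teacher_payload(model: str, prompt: str, response: str) -> dict:
    """Ask the teacher to echo prompt+response back with per-token logprobs."""
    return {
        "model": model,
        "prompt": prompt + response,
        "echo": True,       # return logprobs for the input tokens, not just new ones
        "logprobs": 1,
        "max_tokens": 1,    # generation is not needed; the extra token is discarded
        "temperature": 0.0,
    }


def post_completions(base_url: str, payload: dict, api_key: Optional[str] = None) -> dict:
    """POST to {base_url}/completions, e.g. http://localhost:8082/v1/completions."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        base_url.rstrip("/") + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


def response_logprobs(completion: dict, prompt_len: int, full_len: int) -> list:
    """Keep logprobs of tokens whose character offset falls inside the response."""
    lp = completion["choices"][0]["logprobs"]
    return [
        value
        for offset, value in zip(lp["text_offset"], lp["token_logprobs"])
        if prompt_len <= offset < full_len and value is not None
    ]


# Offline demo with a hand-written response in the legacy completions shape.
prompt, response = "Q: 2+2?\nA:", " 4"
fake = {"choices": [{"logprobs": {
    "text_offset": [0, 4, 10, 12],                 # illustrative token start offsets
    "token_logprobs": [None, -1.5, -0.05, -3.0],
}}]}
print(response_logprobs(fake, len(prompt), len(prompt + response)))  # [-0.05]
```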
See `examples/run_conversation_opd.py` for a programmatic example and `scripts/run_openclaw_tinker_opd.sh` for a ready-made launch script.
In RL mode, the weight hot-swap step pauses the agent for several minutes. The scheduler (enabled by default in madmax mode) defers RL updates to user-inactive windows so the agent is never interrupted during active use.
```bash
metaclaw config scheduler.sleep_start "23:00"
metaclaw config scheduler.sleep_end "07:00"
metaclaw config scheduler.idle_threshold_minutes 30

# Optional: Google Calendar integration
pip install -e ".[scheduler]"
metaclaw config scheduler.calendar.enabled true
metaclaw config scheduler.calendar.credentials_path ~/.metaclaw/client_secrets.json
```

Three conditions trigger an update window (any one is sufficient): configured sleep hours, system keyboard inactivity, or an active Google Calendar event. If the user returns mid-update, the partial batch is saved and resumed at the next window.
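The "any one is sufficient" rule can be sketched as an OR over three checks. The one subtle part is a sleep window such as 23:00 to 07:00 that crosses midnight. The defaults below mirror the config keys; `in_sleep_window` and `update_window_open` are hypothetical names, and the idle-time and meeting inputs are passed in rather than detected.

```python
from datetime import time


def in_sleep_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def update_window_open(
    now: time,
    idle_minutes: float,
    in_meeting: bool,
    sleep_start: str = "23:00",
    sleep_end: str = "07:00",
    idle_threshold_minutes: int = 30,
) -> bool:
    """Any one condition opens a window: sleep hours, idleness, or a meeting."""
    start = time.fromisoformat(sleep_start)
    end = time.fromisoformat(sleep_end)
    return (
        in_sleep_window(now, start, end)
        or idle_minutes >= idle_threshold_minutes
        or in_meeting
    )


print(update_window_open(time(2, 0), idle_minutes=0, in_meeting=False))   # True
print(update_window_open(time(14, 0), idle_minutes=5, in_meeting=False))  # False
```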
Each `ConversationSample` is tagged with a `skill_generation` version. When skill evolution bumps the generation, the RL buffer is flushed so only post-evolution samples are used for gradient updates (MAML support/query set separation).
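The flush-on-bump behavior can be sketched with a buffer that tracks the current generation. The `ConversationSample` below is a simplified stand-in with hypothetical `text` and `reward` fields, not the project's actual class, and dropping late-arriving samples from an older generation is an assumption consistent with "only post-evolution samples are used".

```python
from dataclasses import dataclass, field


@dataclass
class ConversationSample:
    text: str
    reward: float
    skill_generation: int


@dataclass
class GenerationAwareBuffer:
    current_generation: int = 0
    samples: list = field(default_factory=list)

    def add(self, sample: ConversationSample) -> None:
        # Samples collected under an older skill library are stale: drop them.
        if sample.skill_generation == self.current_generation:
            self.samples.append(sample)

    def bump_generation(self) -> None:
        """Called when skill evolution changes the library: flush the buffer."""
        self.current_generation += 1
        self.samples.clear()


buf = GenerationAwareBuffer()
buf.add(ConversationSample("turn A", 1.0, skill_generation=0))
buf.bump_generation()
buf.add(ConversationSample("turn B", 0.0, skill_generation=0))  # arrives late, stale
buf.add(ConversationSample("turn C", 1.0, skill_generation=1))
print([s.text for s in buf.samples])  # ['turn C']
```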
```bibtex
@misc{xia2026metaclaw,
  author       = {Xia, Peng and Chen, Jianwen and Yang, Xinyu and Tu, Haoqin and Han, Siwei and Qiu, Shi and Zheng, Zeyu and Xie, Cihang and Yao, Huaxiu},
  title        = {MetaClaw: Just Talk --- An Agent That Meta-Learns and Evolves in the Wild},
  year         = {2026},
  organization = {GitHub},
  url          = {https://github.com/aiming-lab/MetaClaw},
}
```

MetaClaw builds on top of the following open-source projects:
- OpenClaw – the core agent framework.
- SkillRL – our skill-augmented RL framework.
- Tinker – used for online RL training.
- OpenClaw-RL – inspiration for our RL design.
- awesome-openclaw-skills – provides the foundation for our skill bank.
This project is licensed under the MIT License.
