This is the official repository for our paper "A2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning". A2FM is an adaptive branch of AFM: different problems are solved in different modes (agentic, reasoning, or instant) so that every token spent is put to its best use. Our approach leverages accuracy rates from inter-group rollouts in Reinforcement Learning (RL), using problem difficulty to train the model's adaptive capability.
A2FM presents a unified framework that bridges the gap between reasoning-centric and agentic LLMs through adaptive mode selection, achieving superior performance while dramatically reducing costs.
🧠 Route-then-Align Principle: A unified framework that bridges reasoning-centric and agentic LLMs through adaptive mode selection, eliminating the inefficiency gap where both families tend to overthink or over-call tools.
⚡ Three-Mode Architecture:
- Instant Mode: Direct reasoning for simple tasks (no tool calls)
- Agentic Mode: Tool-augmented reasoning for complex problems
- Reasoning Mode: Deep chain-of-thought for analytical tasks
🎯 Adaptive Policy Optimization (APO): The key to training efficient models - enforces adaptive sampling across modes with cost-regularized rewards.
Method: APO applies cost-regularized rewards and adaptive sampling to optimize mode selection, ensuring every token spent delivers maximum value. A simplified sketch of a cost-regularized reward follows.
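The sketch below is illustrative only and not the paper's exact formulation: a correctness reward minus a penalty proportional to normalized token cost. The `token_budget` and `cost_weight` hyperparameters are assumptions for the example.

```python
def cost_regularized_reward(
    is_correct: bool,
    tokens_used: int,
    token_budget: int = 8192,   # assumed normalization constant
    cost_weight: float = 0.1,   # assumed penalty strength
) -> float:
    """Illustrative reward: task accuracy minus a normalized token-cost penalty."""
    accuracy_reward = 1.0 if is_correct else 0.0
    cost_penalty = cost_weight * min(tokens_used / token_budget, 1.0)
    return accuracy_reward - cost_penalty
```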
Results:
- New SOTA: 13.4% on BrowseComp, 70.4% on AIME25, 16.7% on HLE
- 45.2% cost reduction relative to reasoning models, 33.5% relative to agentic models
- $0.00487 per correct answer - substantially higher cost efficiency while maintaining comparable accuracy
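For context, we read "cost per correct answer" as total inference cost divided by the number of correctly answered queries (see the paper for the exact accounting):

$$\text{cost per correct answer} = \frac{\text{total inference cost}}{\text{number of correct answers}}$$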
Our tool server infrastructure provides highly stable and fast tool execution, which is crucial for both RL training and inference. The system features the following (a simplified sketch of the cache-and-retry pattern appears after this list):
- 🔄 Cache Hit Functionality: Intelligent caching mechanism reduces redundant API calls and improves response times
- 🛡️ Error Handling & Retry Mechanisms: Robust error capture and automatic retry logic ensure reliable tool execution
- ⚡ Asynchronous Acceleration: Multi-threaded and async processing for concurrent tool operations
- 🔧 Multi-API Support: Fallback mechanisms across multiple API providers for enhanced reliability
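For intuition, here is a minimal sketch of the cache-and-retry pattern, not the repository's actual implementation; the endpoint shape and backoff schedule are assumptions:

```python
import time
from functools import lru_cache

import requests


@lru_cache(maxsize=1024)
def cached_tool_call(url: str, query: str, max_retries: int = 3) -> str:
    """Call a tool endpoint with retries; identical (url, query) pairs hit the cache."""
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, json={"query": query}, timeout=30)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
```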
To start the tool servers, refer to ./server/SERVER_README.md.

Available Tool Servers (an example client call follows this list):
- Web Search Server: Multi-API Google search with intelligent caching
- Page Crawler Server: Concurrent page crawling with AI-powered summarization
- Code Executor Server: Secure Python code execution in nsjail sandbox
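As an illustration, a client call to the web search server might look like the sketch below; the /search path and payload fields are assumptions, so consult ./server/SERVER_README.md for the actual API:

```python
import os

import requests

# Hypothetical request shape; see ./server/SERVER_README.md for the real API.
search_url = os.environ.get("WEBSEARCH_URL", "http://localhost:9002")
resp = requests.post(
    f"{search_url}/search",
    json={"query": "adaptive agent foundation models"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```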
First, install the required dependencies listed in requirements.txt:

pip install -r requirements.txt

You can download the model directly via the links below.
| Model | Download Links | Model Size | Context Length |
|---|---|---|---|
| A2FM-32B-rl | 🤗 HuggingFace | 32B | 128K |
Alternative Download Methods:
- Direct from HuggingFace: Click the 🤗 HuggingFace link above
- Script Download:
cd ./model
python download.py
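Alternatively, a minimal sketch using huggingface_hub; the repo id below is a placeholder, so substitute the actual id from the 🤗 HuggingFace link above:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- replace with the id from the download table above.
snapshot_download(
    repo_id="<org>/A2FM-32B-rl",
    local_dir="./model/A2FM-32B-rl",
)
```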
Deploy A2FM using vLLM for high-performance inference:
cd ./deploy
bash ./deploy.sh
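Once the server is up, you can sanity-check it through the OpenAI-compatible API that vLLM exposes. The model name and port below mirror the environment variables in the next step; adjust them if your deploy.sh uses different values:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; the api_key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="A2FM-32B-rl",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(resp.choices[0].message.content)
```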
1. Set Environment Variables

Before running inference, you must set the following required environment variables:
# Model Configuration
export MODEL_NAME="A2FM-32B-rl"
export MODEL_URL="http://localhost:8000/v1"
# OpenAI API Configuration (for judge and summary models)
export OPENAI_API_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="your-openai-api-key-here"
# Tool Server URLs
export WEBSEARCH_URL="http://localhost:9002"
export CRAWL_PAGE_URL="http://localhost:9000"
export CODE_EXEC_URL="http://localhost:9003"
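For reference, client code can pick these up via os.environ; a minimal sketch using the variable names exported above (the fallback defaults are illustrative):

```python
import os

# Fail fast if a required variable is missing.
MODEL_URL = os.environ["MODEL_URL"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# Tool servers, with illustrative local defaults.
WEBSEARCH_URL = os.environ.get("WEBSEARCH_URL", "http://localhost:9002")
CRAWL_PAGE_URL = os.environ.get("CRAWL_PAGE_URL", "http://localhost:9000")
CODE_EXEC_URL = os.environ.get("CODE_EXEC_URL", "http://localhost:9003")
```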
2. Run Inference

Prepare your test dataset (refer to the /data/example.json format) and run inference. The input is a .json or .jsonl file and the output is a .jsonl file.
cd ./infer
python infer_main.py --input_file ../data/example.json --output_file ../results/output.jsonl

Quick Start with Example Script:
cd ./infer
# Edit example_infer_main.sh to set your actual API keys and URLs
bash example_infer_main.sh

Adaptive Mode Selection (--adaptive):
- auto: Automatic mode selection based on task complexity (recommended)
- toolcalling_agent: Force agentic mode with tool usage for complex tasks
- reasoning_agent: Force reasoning mode for analytical tasks
- instant: Force instant mode for simple tasks (no tool calls)
Max Steps Configuration (renamed from retry_attempts for clarity):
--max_steps_agent: Maximum execution steps for agentic mode (default: 60)
Example Usage:
cd ./infer
# Auto mode with custom parameters
python infer_main.py \
--input_file ../data/example.json \
--output_file ../results/output.jsonl \
--adaptive auto \
--max_steps_agent 60 \
--temperature 1.0 \
--parallel_per_dataset 5
# Force agentic mode
python infer_main.py \
--input_file ../data/example.json \
--output_file ../results/agentic_output.jsonl \
--adaptive toolcalling_agent \
--max_steps_agent 100
# Force instant mode
python infer_main.py \
--input_file ../data/example.json \
--output_file ../results/instant_output.jsonl \
--adaptive instant

Help: Run python infer_main.py --help for the complete parameter list.
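Each line of the output .jsonl is a standalone JSON record; a minimal sketch for loading results (the fields inside each record depend on infer_main.py and are not assumed here):

```python
import json

# Read one JSON object per non-empty line of the inference output.
with open("../results/output.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]
print(f"Loaded {len(records)} records")
```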
Listed below are friendly links to related agent works from the OPPO PersonalAI Lab:
- Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
- Agent Foundation Models: Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
- TaskCraft: Automated Generation of Agentic Tasks
- OAgents: An Empirical Study of Building Effective Agents
- Agent-KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
- MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants
We would like to express our sincere gratitude to the original authors and contributors of LLaMA-Factory and verl, two excellent open-source projects that provided a solid foundation for our work. Our implementation is adapted from LLaMA-Factory and verl.
If you find A2FM useful in your research or applications, we would appreciate it if you could cite our work:
@article{chen2025a2fm,
title={A\textsuperscript{2}FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning},
author={Chen, Qianben and Cao, Jingyi and Zhang, Jiayu and Qin, Tianrui and Li, Xiaowan and Zhu, King and Shi, Dingfeng and Zhu, He and Liu, Minghao and Liang, Xiaobo and others},
journal={arXiv preprint arXiv:2510.12838},
year={2025}
}


