This is the official repository for our paper "A2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning". A2FM is an adaptive branch of AFM: different problems are solved in different modes (agentic, reasoning, or instant) so that every token spent is put to its best use. Our approach leverages accuracy rates from inter-group rollouts in Reinforcement Learning (RL), using problem difficulty to train the model's adaptive capability.
A2FM presents a unified framework that bridges the gap between reasoning-centric and agentic LLMs through adaptive mode selection, achieving superior performance while dramatically reducing costs.
🧠 Route-then-Align Principle: A unified framework that bridges reasoning-centric and agentic LLMs through adaptive mode selection, eliminating the inefficiency gap where both families tend to overthink or over-call tools.
⚡ Three-Mode Architecture:
- Instant Mode: Direct reasoning for simple tasks (no tool calls)
- Agentic Mode: Tool-augmented reasoning for complex problems
- Reasoning Mode: Deep chain-of-thought for analytical tasks
🎯 Adaptive Policy Optimization (APO): The key to training efficient models - enforces adaptive sampling across modes with cost-regularized rewards.
Method: APO applies cost-regularized rewards and adaptive sampling to optimize mode selection, ensuring every token spent delivers maximum value. A simplified sketch of a cost-regularized reward follows.
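The sketch below is illustrative only and not the paper's exact formulation: a correctness reward minus a penalty proportional to normalized token cost. The `token_budget` and `cost_weight` hyperparameters are assumptions for the example.

```python
def cost_regularized_reward(
    is_correct: bool,
    tokens_used: int,
    token_budget: int = 8192,   # assumed normalization constant
    cost_weight: float = 0.1,   # assumed penalty strength
) -> float:
    """Illustrative reward: task accuracy minus a normalized token-cost penalty."""
    accuracy_reward = 1.0 if is_correct else 0.0
    cost_penalty = cost_weight * min(tokens_used / token_budget, 1.0)
    return accuracy_reward - cost_penalty
```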
Results:
- New SOTA: 13.4% on BrowseComp, 70.4% on AIME25, 16.7% on HLE
- 45.2% cost reduction relative to reasoning models, 33.5% relative to agentic models
- $0.00487 per correct answer - substantially higher cost efficiency while maintaining comparable accuracy
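For context, we read "cost per correct answer" as total inference cost divided by the number of correctly answered queries (see the paper for the exact accounting):

$$\text{cost per correct answer} = \frac{\text{total inference cost}}{\text{number of correct answers}}$$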
Our tool server infrastructure provides highly stable and fast tool execution, which is crucial for both RL training and inference. The system features the following (a simplified sketch of the cache-and-retry pattern appears after this list):
- 🔄 Cache Hit Functionality: Intelligent caching mechanism reduces redundant API calls and improves response times
- 🛡️ Error Handling & Retry Mechanisms: Robust error capture and automatic retry logic ensure reliable tool execution
- ⚡ Asynchronous Acceleration: Multi-threaded and async processing for concurrent tool operations
- 🔧 Multi-API Support: Fallback mechanisms across multiple API providers for enhanced reliability
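For intuition, here is a minimal sketch of the cache-and-retry pattern, not the repository's actual implementation; the endpoint shape and backoff schedule are assumptions:

```python
import time
from functools import lru_cache

import requests


@lru_cache(maxsize=1024)
def cached_tool_call(url: str, query: str, max_retries: int = 3) -> str:
    """Call a tool endpoint with retries; identical (url, query) pairs hit the cache."""
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, json={"query": query}, timeout=30)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
```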
To start the tool servers, refer to ./server/SERVER_README.md.

Available Tool Servers (an example client call follows this list):
- Web Search Server: Multi-API Google search with intelligent caching
- Page Crawler Server: Concurrent page crawling with AI-powered summarization
- Code Executor Server: Secure Python code execution in nsjail sandbox
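As an illustration, a client call to the web search server might look like the sketch below; the /search path and payload fields are assumptions, so consult ./server/SERVER_README.md for the actual API:

```python
import os

import requests

# Hypothetical request shape; see ./server/SERVER_README.md for the real API.
search_url = os.environ.get("WEBSEARCH_URL", "http://localhost:9002")
resp = requests.post(
    f"{search_url}/search",
    json={"query": "adaptive agent foundation models"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```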
First, install the required dependencies listed in requirements.txt:

pip install -r requirements.txt

You can download the model directly via the links below.
| Model | Download Links | Model Size | Context Length |
|---|---|---|---|
| A2FM-32B-rl | 🤗 HuggingFace | 32B | 128K |
Alternative Download Methods:
- Direct from HuggingFace: Click the 🤗 HuggingFace link above
- Script Download:
cd ./model
python download.py
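Alternatively, a minimal sketch using huggingface_hub; the repo id below is a placeholder, so substitute the actual id from the 🤗 HuggingFace link above:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- replace with the id from the download table above.
snapshot_download(
    repo_id="<org>/A2FM-32B-rl",
    local_dir="./model/A2FM-32B-rl",
)
```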
Deploy A2FM using vLLM for high-performance inference:
cd ./deploy
bash ./deploy.sh
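Once the server is up, you can sanity-check it through the OpenAI-compatible API that vLLM exposes. The model name and port below mirror the environment variables in the next step; adjust them if your deploy.sh uses different values:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; the api_key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="A2FM-32B-rl",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(resp.choices[0].message.content)
```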
1. Set Environment Variables

Before running inference, you must set the following required environment variables:
# Model Configuration
export MODEL_NAME="A2FM-32B-rl"
export MODEL_URL="http://localhost:8000/v1"
# OpenAI API Configuration (for judge and summary models)
export OPENAI_API_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="your-openai-api-key-here"
# Tool Server URLs
export WEBSEARCH_URL="http://localhost:9002"
export CRAWL_PAGE_URL="http://localhost:9000"
export CODE_EXEC_URL="http://localhost:9003"
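For reference, client code can pick these up via os.environ; a minimal sketch using the variable names exported above (the fallback defaults are illustrative):

```python
import os

# Fail fast if a required variable is missing.
MODEL_URL = os.environ["MODEL_URL"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# Tool servers, with illustrative local defaults.
WEBSEARCH_URL = os.environ.get("WEBSEARCH_URL", "http://localhost:9002")
CRAWL_PAGE_URL = os.environ.get("CRAWL_PAGE_URL", "http://localhost:9000")
CODE_EXEC_URL = os.environ.get("CODE_EXEC_URL", "http://localhost:9003")
```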
2. Run Inference

Prepare your test dataset (refer to the /data/example.json format) and run inference. The input is a .json or .jsonl file and the output is a .jsonl file.
cd ./infer
python infer_main.py --input_file ../data/example.json --output_file ../results/output.jsonl

Quick Start with Example Script:
cd ./infer
# Edit example_infer_main.sh to set your actual API keys and URLs
bash example_infer_main.sh

Adaptive Mode Selection (--adaptive):
- auto: Automatic mode selection based on task complexity (recommended)
- toolcalling_agent: Force agentic mode with tool usage for complex tasks
- reasoning_agent: Force reasoning mode for analytical tasks
- instant: Force instant mode for simple tasks (no tool calls)
Max Steps Configuration (renamed from retry_attempts for clarity):
--max_steps_agent: Maximum execution steps for agentic mode (default: 60)
Example Usage:
cd ./infer
# Auto mode with custom parameters
python infer_main.py \
--input_file ../data/example.json \
--output_file ../results/output.jsonl \
--adaptive auto \
--max_steps_agent 60 \
--temperature 1.0 \
--parallel_per_dataset 5
# Force agentic mode
python infer_main.py \
--input_file ../data/example.json \
--output_file ../results/agentic_output.jsonl \
--adaptive toolcalling_agent \
--max_steps_agent 100
# Force instant mode
python infer_main.py \
--input_file ../data/example.json \
--output_file ../results/instant_output.jsonl \
--adaptive instant

Help: Run python infer_main.py --help for the complete parameter list.
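Each line of the output .jsonl is a standalone JSON record; a minimal sketch for loading results (the fields inside each record depend on infer_main.py and are not assumed here):

```python
import json

# Read one JSON object per non-empty line of the inference output.
with open("../results/output.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]
print(f"Loaded {len(records)} records")
```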
Listed below are friendly links to related agent works from the OPPO PersonalAI Lab:
- Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
- Agent Foundation Models: Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
- TaskCraft: Automated Generation of Agentic Tasks
- OAgents: An Empirical Study of Building Effective Agents
- Agent-KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
- MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants
We would like to express our sincere gratitude to the original authors and contributors of LLaMA-Factory and verl, two excellent open-source projects that provided a solid foundation for our work. Our implementation is adapted from LLaMA-Factory and verl.
If you find A2FM useful in your research or applications, we would appreciate it if you could cite our work:
@article{chen2025a2fm,
title={A\textsuperscript{2}FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning},
author={Chen, Qianben and Cao, Jingyi and Zhang, Jiayu and Qin, Tianrui and Li, Xiaowan and Zhu, King and Shi, Dingfeng and Zhu, He and Liu, Minghao and Liang, Xiaobo and others},
journal={arXiv preprint arXiv:2510.12838},
year={2025}
}


