📄 Paper • 🤗 Model (coming soon)
SuperWriter-Agent is an intelligent writing-agent framework for long-form text generation, inspired by the human workflow of thinking first and writing later.
Through a three-stage Plan → Write → Refine process and Hierarchical DPO training, the resulting SuperWriter-LM (7B parameters) matches or surpasses substantially larger models.
- SuperWriter-Agent explicitly injects Thinking and Reflection signals into generation.
- Hierarchical DPO with MCTS back-propagates quality signals from final outputs to intermediate steps.
- Scores 8.51 on WritingBench, ranking 2nd overall, behind only DeepSeek-R1 (671B).
| Stage | Roles / Sub-steps | Goal | Key Mechanisms |
|---|---|---|---|
| 1️⃣ Plan | AI Commentators ↔ Writer, Plan Checker | • Distill topic & structure<br>• Produce paragraph-level outline | • Story-Workshop dialogue<br>• Word-budgeting<br>• Consistency check |
| 2️⃣ Write | Thinker → Writer | • Draft each section<br>• Preserve chapter coherence | • Thinker step: bullet ideas & logic<br>• Writer step: write with previous context |
| 3️⃣ Refine | Checker → Editor | • Polish draft<br>• Improve language & logic | • Checker: locate weak paragraphs<br>• Editor: targeted rewrite / merge |
A Monte-Carlo Tree Search builds a three-layer tree over (Plan_i, Draft_j, Refine_k) combinations.
Leaf scores are discretized to +2 … −2 and averaged bottom-up to create preference pairs, which are then trained with a single DPO loss.
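A minimal sketch of this bottom-up credit assignment, assuming every leaf already carries a judged score; the names (`Node`, `backup`, `preference_pairs`) and the pairing rule are illustrative, not the repo's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One node in the three-layer tree: a Plan, Draft, or Refine step."""
    score: Optional[float] = None       # leaves carry a judged score in {+2, +1, 0, -1, -2}
    children: list["Node"] = field(default_factory=list)

def backup(node: Node) -> float:
    """Average leaf scores bottom-up so every intermediate step gets a value."""
    if not node.children:               # leaf: keep its judged score
        return node.score
    node.score = sum(backup(c) for c in node.children) / len(node.children)
    return node.score

def preference_pairs(siblings: list[Node], margin: float = 1.0) -> list[tuple[Node, Node]]:
    """Pair sibling steps whose backed-up scores differ by at least `margin`;
    the higher-scoring step becomes the chosen side of a DPO pair."""
    return [(a, b) for a in siblings for b in siblings
            if a is not b and a.score - b.score >= margin]
```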
SuperWriter-LM tops the Academic & Engineering, Finance & Business, Politics & Law, and Education domains, and ranks #1 among models of its size.
Scoring: Win = 1, Tie = 0.5, Loss = 0. Eight donut charts (the 8th is human evaluation) show SuperWriter-LM dominating the 7B class and remaining competitive against larger models.
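The win rate behind each chart reduces to simple averaging; a tiny sketch with illustrative outcome labels:

```python
SCORES = {"win": 1.0, "tie": 0.5, "loss": 0.0}

def win_rate(outcomes: list[str]) -> float:
    """Average the Win = 1 / Tie = 0.5 / Loss = 0 scores over all comparisons."""
    return sum(SCORES[o] for o in outcomes) / len(outcomes)

print(win_rate(["win", "tie", "loss", "win"]))  # 0.625
```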
`Agent/Super_write_agent.py` and `Agent/Super_write_agent_cn.py` generate three-stage (Plan / Write / Refine) SFT data for English and Chinese queries.
`Agent/SFT-Process.py` cleans raw agent outputs and produces unified JSONL files.
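A rough sketch of the kind of cleanup such a script might perform, assuming one raw JSON object per line; the field names (`query`, `plan`, `draft`, `refined`) are illustrative, not the script's actual schema:

```python
import json

def unify(raw_path: str, out_path: str) -> None:
    """Merge the three stage outputs for each query into one SFT record."""
    with open(raw_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            rec = json.loads(line)
            # keep only complete Plan / Write / Refine triples
            if not all(rec.get(k) for k in ("plan", "draft", "refined")):
                continue
            fout.write(json.dumps({
                "instruction": rec["query"],
                "output": rec["refined"],   # the refined text is the SFT target
                "stages": {"plan": rec["plan"], "draft": rec["draft"]},
            }, ensure_ascii=False) + "\n")
```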
- Deploy an evaluation service for SFT models (e.g., SGLang or a custom HTTP API).
- `DPO/MCTS_inference.py` explores Plan → Write → Refine combinations via MCTS.
- `DPO/Step_1_query_evaluation_stand.py` creates per-query evaluation rubrics.
- `DPO/Step_2_LLM_judge.py` scores all MCTS leaves through the evaluation service (see the sketch below).
- `DPO/create_dpo_pair.ipynb` selects good vs. bad samples to form the final DPO pairs.
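As a rough illustration of the scoring step, each leaf could be graded against its rubric through an OpenAI-compatible endpoint; the URL, model name, and reply format below are assumptions, not `Step_2_LLM_judge.py`'s real interface:

```python
import requests

JUDGE_URL = "http://localhost:30000/v1/chat/completions"  # assumed SGLang-style endpoint

def judge_leaf(text: str, rubric: str) -> int:
    """Grade one MCTS leaf on the -2 ... +2 scale using a per-query rubric."""
    resp = requests.post(JUDGE_URL, json={
        "model": "judge",
        "messages": [
            {"role": "system",
             "content": f"Score the text from -2 to 2 using this rubric:\n{rubric}"},
            {"role": "user", "content": text},
        ],
    }, timeout=120)
    resp.raise_for_status()
    # assumes the judge replies with a bare integer
    return int(resp.json()["choices"][0]["message"]["content"].strip())
```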
Run three successive prompts (Plan, Write, Refine) using the templates in `Inference/superwrite_gen.py` to obtain the final output.
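Conceptually, the chained inference looks like the sketch below; the prompt wording and the `generate` helper are placeholders for the actual templates in `Inference/superwrite_gen.py`:

```python
def superwrite(query: str, generate) -> str:
    """Chain Plan -> Write -> Refine, feeding each stage's output into the
    next prompt. `generate(prompt) -> str` wraps whatever backend serves
    SuperWriter-LM."""
    plan = generate(f"[Plan] Produce a paragraph-level outline for: {query}")
    draft = generate(f"[Write] Follow the plan and write the full text.\n"
                     f"Plan:\n{plan}\nQuery: {query}")
    return generate(f"[Refine] Find weak paragraphs in the draft and rewrite "
                    f"them.\nDraft:\n{draft}")
```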
Fine-tune with LLaMA-Factory and 360-LLaMA-Factory.
Many thanks to these projects!
@misc{wu2025superwriterreflectiondrivenlongformgeneration,
title={SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models},
author={Yuhao Wu and Yushi Bai and Zhiqiang Hu and Juanzi Li and Roy Ka-Wei Lee},
year={2025},
eprint={2506.04180},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.04180},
}