📄 Paper • 🤗 Model (coming soon)
SuperWriter-Agent is an intelligent writing-agent framework for long-form text generation, inspired by the human workflow of thinking first and writing later.
Through a three-stage Plan → Write → Refine process and Hierarchical DPO training, the resulting SuperWriter-LM (7B parameters) matches or surpasses substantially larger models.
- SuperWriter-Agent explicitly injects Thinking and Reflection signals into generation.
- Hierarchical DPO with MCTS back-propagates quality signals from final outputs to intermediate steps.
- Scores 8.51 on WritingBench, ranking 2nd overall, behind only DeepSeek-R1 (671B).
| Stage | Roles / Sub-steps | Goal | Key Mechanisms |
|---|---|---|---|
| 1️⃣ Plan | AI Commentators ↔ Writer, Plan Checker | • Distill topic & structure<br>• Produce paragraph-level outline | • Story-Workshop dialogue<br>• Word-budgeting<br>• Consistency check |
| 2️⃣ Write | Thinker → Writer | • Draft each section<br>• Preserve chapter coherence | • Thinker step: bullet ideas & logic<br>• Writer step: write with previous context |
| 3️⃣ Refine | Checker → Editor | • Polish draft<br>• Improve language & logic | • Checker: locate weak paragraphs<br>• Editor: targeted rewrite / merge |
A Monte-Carlo Tree Search builds a three-layer tree over (Plan_i, Draft_j, Refine_k) combinations.
Leaf scores are discretized to +2 … −2 and averaged bottom-up to create preference pairs, which are then trained with a single DPO loss.
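A minimal sketch of this bottom-up credit assignment, assuming every leaf already carries a judged score; the names (`Node`, `backup`, `preference_pairs`) and the pairing rule are illustrative, not the repo's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One node in the three-layer tree: a Plan, Draft, or Refine step."""
    score: Optional[float] = None       # leaves carry a judged score in {+2, +1, 0, -1, -2}
    children: list["Node"] = field(default_factory=list)

def backup(node: Node) -> float:
    """Average leaf scores bottom-up so every intermediate step gets a value."""
    if not node.children:               # leaf: keep its judged score
        return node.score
    node.score = sum(backup(c) for c in node.children) / len(node.children)
    return node.score

def preference_pairs(siblings: list[Node], margin: float = 1.0) -> list[tuple[Node, Node]]:
    """Pair sibling steps whose backed-up scores differ by at least `margin`;
    the higher-scoring step becomes the chosen side of a DPO pair."""
    return [(a, b) for a in siblings for b in siblings
            if a is not b and a.score - b.score >= margin]
```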
SuperWriter-LM tops the Academic & Engineering, Finance & Business, Politics & Law, and Education domains, and ranks #1 among models of its size.
Scoring: Win = 1, Tie = 0.5, Loss = 0. Eight donut charts (the 8th is human evaluation) show SuperWriter-LM dominating the 7B class and remaining competitive against larger models.
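The win rate behind each chart reduces to simple averaging; a tiny sketch with illustrative outcome labels:

```python
SCORES = {"win": 1.0, "tie": 0.5, "loss": 0.0}

def win_rate(outcomes: list[str]) -> float:
    """Average the Win = 1 / Tie = 0.5 / Loss = 0 scores over all comparisons."""
    return sum(SCORES[o] for o in outcomes) / len(outcomes)

print(win_rate(["win", "tie", "loss", "win"]))  # 0.625
```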
`Agent/Super_write_agent.py` and `Agent/Super_write_agent_cn.py` generate three-stage (Plan / Write / Refine) SFT data for English and Chinese queries.
`Agent/SFT-Process.py` cleans raw agent outputs and produces unified JSONL files.
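A rough sketch of the kind of cleanup such a script might perform, assuming one raw JSON object per line; the field names (`query`, `plan`, `draft`, `refined`) are illustrative, not the script's actual schema:

```python
import json

def unify(raw_path: str, out_path: str) -> None:
    """Merge the three stage outputs for each query into one SFT record."""
    with open(raw_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            rec = json.loads(line)
            # keep only complete Plan / Write / Refine triples
            if not all(rec.get(k) for k in ("plan", "draft", "refined")):
                continue
            fout.write(json.dumps({
                "instruction": rec["query"],
                "output": rec["refined"],   # the refined text is the SFT target
                "stages": {"plan": rec["plan"], "draft": rec["draft"]},
            }, ensure_ascii=False) + "\n")
```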
- Deploy an evaluation service for SFT models (e.g., SGLang or a custom HTTP API).
- `DPO/MCTS_inference.py` explores Plan → Write → Refine combinations via MCTS.
- `DPO/Step_1_query_evaluation_stand.py` creates per-query evaluation rubrics.
- `DPO/Step_2_LLM_judge.py` scores all MCTS leaves through the evaluation service (see the sketch below).
- `DPO/create_dpo_pair.ipynb` selects good vs. bad samples to form the final DPO pairs.
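As a rough illustration of the scoring step, each leaf could be graded against its rubric through an OpenAI-compatible endpoint; the URL, model name, and reply format below are assumptions, not `Step_2_LLM_judge.py`'s real interface:

```python
import requests

JUDGE_URL = "http://localhost:30000/v1/chat/completions"  # assumed SGLang-style endpoint

def judge_leaf(text: str, rubric: str) -> int:
    """Grade one MCTS leaf on the -2 ... +2 scale using a per-query rubric."""
    resp = requests.post(JUDGE_URL, json={
        "model": "judge",
        "messages": [
            {"role": "system",
             "content": f"Score the text from -2 to 2 using this rubric:\n{rubric}"},
            {"role": "user", "content": text},
        ],
    }, timeout=120)
    resp.raise_for_status()
    # assumes the judge replies with a bare integer
    return int(resp.json()["choices"][0]["message"]["content"].strip())
```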
Run three successive prompts (Plan, Write, Refine) using the templates in `Inference/superwrite_gen.py` to obtain the final output.
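Conceptually, the chained inference looks like the sketch below; the prompt wording and the `generate` helper are placeholders for the actual templates in `Inference/superwrite_gen.py`:

```python
def superwrite(query: str, generate) -> str:
    """Chain Plan -> Write -> Refine, feeding each stage's output into the
    next prompt. `generate(prompt) -> str` wraps whatever backend serves
    SuperWriter-LM."""
    plan = generate(f"[Plan] Produce a paragraph-level outline for: {query}")
    draft = generate(f"[Write] Follow the plan and write the full text.\n"
                     f"Plan:\n{plan}\nQuery: {query}")
    return generate(f"[Refine] Find weak paragraphs in the draft and rewrite "
                    f"them.\nDraft:\n{draft}")
```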
Fine-tune with LLaMA-Factory and 360-LLaMA-Factory.
Many thanks to these projects!
@misc{wu2025superwriterreflectiondrivenlongformgeneration,
title={SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models},
author={Yuhao Wu and Yushi Bai and Zhiqiang Hu and Juanzi Li and Roy Ka-Wei Lee},
year={2025},
eprint={2506.04180},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.04180},
}