
SuperWriter Logo

📚 SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models

Language: English | 中文

📄 Paper • 🤗 Model (coming soon)

SuperWriter-Agent is an intelligent writing agent framework for long-form text generation, inspired by the human workflow of "think first, write later".
Through a three-stage Plan → Write → Refine process and Hierarchical DPO training, the resulting SuperWriter-LM (7B parameters) matches or surpasses substantially larger models.


🚀 Paper at a Glance

  1. SuperWriter-Agent explicitly injects Thinking and Reflection signals into the generation process.
  2. Hierarchical DPO + MCTS back-propagates final-output quality to the earlier planning and drafting steps.
  3. Scores 8.51 on WritingBench, ranking 2nd overall, behind only DeepSeek-R1 (671B).

🛠️ Method Overview

SuperWriter Agent Framework

| Stage | Roles / Sub-steps | Goal | Key Mechanisms |
| --- | --- | --- | --- |
| 1️⃣ Plan | AI Commentators ↔ Writer<br>Plan Checker | • Distill topic & structure<br>• Produce paragraph-level outline | Story-Workshop dialogue • Word-budgeting • Consistency check |
| 2️⃣ Write | Thinker → Writer | • Draft each section<br>• Preserve chapter coherence | Thinker Step: bullet ideas & logic • Writer Step: write with previous context |
| 3️⃣ Refine | Checker → Editor | • Polish draft<br>• Improve language & logic | Checker: locate weak paragraphs • Editor: targeted rewrite / merge |

Hierarchical DPO


A Monte-Carlo Tree Search builds a three-layer tree (Plan_i, Draft_j, Refine_k).
Leaf scores are discretized to +2 … −2 and averaged bottom-up to create preference pairs, which are trained with a single DPO loss.
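
A minimal sketch of that bottom-up aggregation (not the repository's code; the tree layout and scores below are made up for illustration):

```python
# Leaf scores in {-2, ..., +2} are averaged upward through the
# Plan -> Draft -> Refine tree; siblings with the highest / lowest averaged
# score become a (chosen, rejected) preference pair for DPO.
from statistics import mean

# Hypothetical tree: plan -> draft -> refine -> discretized leaf score.
tree = {
    "plan_a": {"draft_a1": {"refine_a1x": 2, "refine_a1y": 1},
               "draft_a2": {"refine_a2x": -1}},
    "plan_b": {"draft_b1": {"refine_b1x": 0, "refine_b1y": -2}},
}

def draft_score(refines):                      # average over Refine leaves
    return mean(refines.values())

def plan_score(drafts):                        # average over Draft children
    return mean(draft_score(r) for r in drafts.values())

def preference_pair(scored):
    """Return (chosen, rejected) among sibling nodes by averaged score."""
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[0], ranked[-1]

plan_scores = {p: plan_score(d) for p, d in tree.items()}
print(preference_pair(plan_scores))            # ('plan_a', 'plan_b')
```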


📈 Experimental Results

1. WritingBench Overall

WritingBench Results

SuperWriter-LM tops the Academic & Engineering, Finance & Business, Politics & Law, and Education domains, and ranks #1 among models of the same size.

2. User Query Win-Rate

Win-Rate Donuts

Scoring: Win = 1, Tie = 0.5, Loss = 0. Eight donut charts (the 8th is a human evaluation) show SuperWriter-LM dominating the 7B class and remaining competitive against larger models.
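
Under this scheme the win-rate is simply (wins + 0.5 × ties) / total comparisons. A toy calculation (the counts are made up, not results from the paper):

```python
# Illustrative only: win-rate under Win = 1, Tie = 0.5, Loss = 0.
wins, ties, losses = 62, 10, 28          # hypothetical head-to-head counts
win_rate = (wins + 0.5 * ties) / (wins + ties + losses)
print(f"{win_rate:.1%}")                 # 67.0%
```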


🧑‍💻 Code Guide

1. Agent Data Generation

  • Agent/Super_write_agent.py and Agent/Super_write_agent_cn.py generate three-stage (Plan / Write / Refine) SFT data for English and Chinese queries.

2. SFT Post-processing

  • Agent/SFT-Process.py cleans raw agent outputs and produces unified JSONL files.
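
A minimal sketch of what a unified record could look like (the schema below is an assumption for illustration, not necessarily what Agent/SFT-Process.py emits):

```python
# Hypothetical post-processing step: collapse one query's three-stage agent
# trace into a single JSONL line suitable for SFT.
import json

raw = {
    "query": "Write a 2,000-word essay on renewable energy policy.",
    "plan": "...outline from the Plan stage...",
    "draft": "...section drafts from the Write stage...",
    "final": "...polished text from the Refine stage...",
}

record = {"instruction": raw["query"], "output": raw["final"]}  # unified SFT pair
with open("sft_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```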

🔄 Hierarchical DPO Data Construction

  1. Deploy an evaluation service for SFT models (e.g., SGLang or a custom HTTP API).
  2. DPO/MCTS_inference.py explores Plan → Write → Refine combinations via MCTS.
  3. DPO/Step_1_query_evaluation_stand.py creates per-query evaluation rubrics.
  4. DPO/Step_2_LLM_judge.py scores all MCTS leaves through the evaluation service.
  5. DPO/create_dpo_pair.ipynb selects good vs. bad samples to form the final DPO pairs (see the sketch below).
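
A hypothetical sketch of that final pairing step (the field names follow a common preference-data layout and may differ from what create_dpo_pair.ipynb produces):

```python
# Pair the best- and worst-scored completions for the same query into one
# DPO preference record.
import json

def to_dpo_pair(query, scored_samples):
    """scored_samples: list of (completion_text, judge_score) for one query."""
    ranked = sorted(scored_samples, key=lambda s: s[1], reverse=True)
    chosen, rejected = ranked[0][0], ranked[-1][0]
    return {"instruction": query, "chosen": chosen, "rejected": rejected}

samples = [("high-scoring final text ...", 8.7), ("low-scoring final text ...", 5.2)]
print(json.dumps(to_dpo_pair("Write a short story about ...", samples), indent=2))
```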

▶️ Inference Pipeline

Run three successive prompts—Plan, Write, Refine—using the templates in Inference/superwrite_gen.py to obtain the final output.
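
A minimal sketch of that chain, assuming the model is served behind an OpenAI-compatible endpoint (e.g. via SGLang or vLLM); the prompt strings, endpoint, and model name below are placeholders, and the real templates live in Inference/superwrite_gen.py:

```python
# Plan -> Write -> Refine with three successive chat completions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "SuperWriter-LM"  # placeholder model name

def generate(prompt):
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

query = "Write a 3,000-word report on urban flood resilience."
plan = generate(f"[Plan template] Outline a response to: {query}")
draft = generate(f"[Write template] Query: {query}\nPlan:\n{plan}\nWrite the full text.")
final = generate(f"[Refine template] Draft:\n{draft}\nPolish the language and logic.")
print(final)
```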


🏋️‍♂️ Training

Fine-tune with LLaMA-Factory and 360-LLaMA-Factory.
Many thanks to these projects!


🤝 Citation

@misc{wu2025superwriterreflectiondrivenlongformgeneration,
      title={SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models}, 
      author={Yuhao Wu and Yushi Bai and Zhiqiang Hu and Juanzi Li and Roy Ka-Wei Lee},
      year={2025},
      eprint={2506.04180},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.04180}, 
}
