Add complete TOC/chapter extraction and XTC concatenation support by chazeon · Pull Request #1 · chazeon/xtctool

chazeon · 2025-12-13T05:21:47Z

This commit implements a full pipeline for extracting table of contents from Typst, Markdown, and PDF files, converting them to XTC chapters, and preserving chapters when concatenating XTC files.

Changes

Pipeline Refactoring (Typst/Markdown → PDF → Images)

Switch TypstFileAsset to compile to PDF instead of PNG
Switch MarkdownFileAsset to compile to PDF instead of PNG
Both now delegate to PDFAsset for rendering and TOC extraction
Enables consistent TOC extraction across all document types
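
As a rough illustration of the Typst/Markdown → PDF → Images flow, the sketch below chains stages where each one converts to the next. It is not xtctool's code: `MarkdownDoc`, `PdfDoc`, `ImagePage`, `convert()` and `run_pipeline()` are hypothetical stand-ins, and the "one heading per page" rule is an assumption made only to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class ImagePage:
    index: int


@dataclass
class PdfDoc:
    page_count: int
    toc: list  # [(title, page_index)], hypothetical TOC shape

    def convert(self):
        # PDF stage: render every page to an image.
        return [ImagePage(i) for i in range(self.page_count)]


@dataclass
class MarkdownDoc:
    headings: list

    def convert(self):
        # Source stage: compile to a PDF only (assumption: one heading per page).
        toc = [(h, i) for i, h in enumerate(self.headings)]
        return PdfDoc(page_count=len(self.headings), toc=toc)


def run_pipeline(asset):
    """Keep converting until the result has no further stage."""
    toc = []
    while hasattr(asset, "convert"):
        if isinstance(asset, PdfDoc):
            toc = asset.toc  # the TOC is collected at the single PDF stage
        asset = asset.convert()
    return asset, toc


pages, toc = run_pipeline(MarkdownDoc(["Intro", "Usage"]))
```

Because every source format passes through the same PDF stage, the TOC only needs to be read in one place.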

Chapter Extraction and Propagation

Extract chapters from XTC files when loading (XTContainerAsset)
Attach chapter metadata to frames for downstream processing
Update extract_chapters_from_toc() to handle both:
- TOC metadata (from PDF/Typst/Markdown headings)
- Chapter metadata (from existing XTC files)
Automatically adjust page numbers when concatenating
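
The page-number adjustment can be pictured as a running offset: each source's chapters are shifted by the number of pages that precede it. The following is a minimal sketch under assumptions, not the project's implementation. The `Chapter` dataclass, its 0-based `start_page` field, and `concat_chapters()` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Chapter:
    title: str
    start_page: int  # assumed 0-based, relative to the chapter's own source


def concat_chapters(sources):
    """sources: list of (page_count, [Chapter, ...]) in concatenation order."""
    merged = []
    offset = 0
    for page_count, chapters in sources:
        for ch in chapters:
            # Shift each chapter by the pages contributed by earlier sources.
            merged.append(replace(ch, start_page=ch.start_page + offset))
        offset += page_count
    return merged


book_a = (3, [Chapter("Intro", 0), Chapter("Setup", 2)])
book_b = (2, [Chapter("Usage", 0)])
merged = concat_chapters([book_a, book_b])
```

Here "Usage" starts at page 0 of its own file and ends up at page 3 of the combined output, because three pages come before it.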

XTC Format Fix

Ensure metadata section is written when chapters are present
Chapter count is stored in metadata, so metadata must exist
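
The dependency between the two sections can be sketched as a small planning step. This is an illustration only: `plan_sections()` and the dictionary layout are hypothetical and say nothing about the actual XTC byte format.

```python
def plan_sections(title, chapters):
    """Decide which sections to write (hypothetical helper, not the XTC writer)."""
    sections = {}
    # Metadata is needed for a title OR for chapters, since the chapter
    # count is stored in the metadata section.
    if title or chapters:
        sections["metadata"] = {"title": title, "chapter_count": len(chapters)}
    if chapters:
        sections["chapters"] = list(chapters)
    return sections


untitled_with_chapters = plan_sections("", [("Intro", 0), ("Usage", 3)])
empty = plan_sections("", [])
```

The case the fix targets is the first one: a file with chapters but no other metadata still gets a metadata section.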

Tests

Add tests/test_toc_to_xtc.py: End-to-end TOC → XTC pipeline tests
Add tests/test_xtc_concat_chapters.py: Chapter preservation tests
All 79 tests pass

Features

✅ TOC extraction from Typst/Markdown/PDF to XTC chapters
✅ Chapter preservation when concatenating XTC files
✅ Page number adjustment for concatenated chapters
✅ Mixed source support (XTC + PDF/Typst/Markdown)
✅ Page subsetting still works with new PDF pipeline

🤖 Generated with Claude Code


- TypstFileAsset now returns PDFAsset instead of list[ImageAsset]
- MarkdownFileAsset now returns PDFAsset instead of list[ImageAsset]
- Assets are now truly atomic: each converts to exactly one next stage
- CLI stack automatically chains conversions for complete pipeline flow
- Enables future parallelization of asset conversions

Fix XTC chapter reading bug

- XTCReader._read_chapters was called with has_chapters (0/1) instead of chapter_count
- Now correctly reads chapter_count from metadata section
- Fixes bug where only 1 chapter was read regardless of actual count

Update tests for atomic pipeline

- Updated 19 tests across 4 test files to follow atomic pipeline pattern
- Tests now explicitly chain: asset → PDF → Images → Frames
- All 79 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
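
The chapter-reading bug is easy to reproduce in miniature: passing a 0/1 presence flag where a count is expected silently truncates the result to one entry. The sketch below uses a toy payload of little-endian 32-bit page indices; that layout and the `read_chapters()` helper are assumptions for illustration, not the real XTC chapter record or `XTCReader._read_chapters`.

```python
import struct


def read_chapters(buf: bytes, count: int) -> list[int]:
    """Read `count` little-endian uint32 page indices (toy layout, not XTC)."""
    return list(struct.unpack_from(f"<{count}I", buf, 0))


chapter_pages = [0, 4, 9]
payload = struct.pack("<3I", *chapter_pages)

has_chapters = 1   # presence flag: 0 or 1
chapter_count = 3  # actual number of chapters, as stored in metadata

buggy = read_chapters(payload, has_chapters)   # flag misused as a count
fixed = read_chapters(payload, chapter_count)  # count taken from metadata
```

With the flag, the reader stops after the first chapter no matter how many were written. With the count, all three come back.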

chazeon and others added 2 commits December 13, 2025 00:20

chazeon wants to merge 2 commits into master from feature/toc-chapter-support-clean
