Add complete TOC/chapter extraction and XTC concatenation support #1
Open
This commit implements a full pipeline for extracting the table of contents from Typst, Markdown, and PDF files, converting it to XTC chapters, and preserving chapters when concatenating XTC files.

## Changes

### Pipeline Refactoring (Typst/Markdown → PDF → Images)
- Switch TypstFileAsset to compile to PDF instead of PNG
- Switch MarkdownFileAsset to compile to PDF instead of PNG
- Both now delegate to PDFAsset for rendering and TOC extraction
- Enables consistent TOC extraction across all document types

### Chapter Extraction and Propagation
- Extract chapters from XTC files when loading (XTContainerAsset)
- Attach chapter metadata to frames for downstream processing
- Update extract_chapters_from_toc() to handle both:
  - TOC metadata (from PDF/Typst/Markdown headings)
  - Chapter metadata (from existing XTC files)
- Automatically adjust page numbers when concatenating

### XTC Format Fix
- Ensure the metadata section is written when chapters are present
- The chapter count is stored in the metadata section, so that section must exist

### Tests
- Add tests/test_toc_to_xtc.py: end-to-end TOC → XTC pipeline tests
- Add tests/test_xtc_concat_chapters.py: chapter preservation tests
- All 79 tests pass

## Features
- ✅ TOC extraction from Typst/Markdown/PDF to XTC chapters
- ✅ Chapter preservation when concatenating XTC files
- ✅ Page number adjustment for concatenated chapters
- ✅ Mixed source support (XTC + PDF/Typst/Markdown)
- ✅ Page subsetting still works with the new PDF pipeline

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
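The page-number adjustment under "Chapter Extraction and Propagation" amounts to shifting each input's chapter start pages by the number of pages that precede that input in the concatenated output. The following is a minimal sketch under stated assumptions: `Chapter` and `concat_chapters` are hypothetical names, start pages are assumed to be 0-based, and this is not the project's `extract_chapters_from_toc()`, whose signature the PR does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    """Hypothetical chapter record: a title and a 0-based start page."""
    title: str
    start_page: int


def concat_chapters(documents: list[tuple[int, list[Chapter]]]) -> list[Chapter]:
    """Merge per-input chapter lists for a concatenated output.

    `documents` holds one (page_count, chapters) pair per input, in output
    order. Chapters read from an XTC file and TOC entries taken from a PDF
    are assumed to have been normalized to `Chapter` beforehand.
    """
    merged: list[Chapter] = []
    offset = 0  # pages already emitted by earlier inputs
    for page_count, chapters in documents:
        for ch in chapters:
            if not 0 <= ch.start_page < page_count:
                raise ValueError(f"chapter {ch.title!r} starts outside its input")
            merged.append(Chapter(ch.title, ch.start_page + offset))
        offset += page_count
    return merged


if __name__ == "__main__":
    book = concat_chapters([
        (10, [Chapter("Intro", 0), Chapter("Setup", 4)]),  # e.g. an existing XTC file
        (6, [Chapter("Appendix", 0)]),                     # e.g. a PDF with one heading
    ])
    print([(c.title, c.start_page) for c in book])
    # [('Intro', 0), ('Setup', 4), ('Appendix', 10)]
```

Validating `start_page` against `page_count` catches an entry that would otherwise point into a neighboring input after shifting; whether the real code raises, clamps, or drops such entries is not stated in the PR.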
- TypstFileAsset now returns PDFAsset instead of list[ImageAsset]
- MarkdownFileAsset now returns PDFAsset instead of list[ImageAsset]
- Assets are now truly atomic: each converts to exactly one next stage
- The CLI stack automatically chains conversions for the complete pipeline flow
- Enables future parallelization of asset conversions

Fix XTC chapter reading bug

- XTCReader._read_chapters was called with has_chapters (0/1) instead of chapter_count
- Now correctly reads chapter_count from the metadata section
- Fixes a bug where only 1 chapter was read regardless of the actual count

Update tests for atomic pipeline

- Updated 19 tests across 4 test files to follow the atomic pipeline pattern
- Tests now explicitly chain: asset → PDF → Images → Frames
- All 79 tests passing
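The reader bug and the earlier "XTC Format Fix" are two halves of one contract: the chapter count lives in the metadata section, so the writer must emit that section whenever chapters exist, and the reader must loop over the count rather than over the 0/1 presence flag. A round-trip sketch, assuming an invented byte layout (the real XTC header and metadata layouts are not described in this PR); `write_chapters` and `read_chapters` are hypothetical stand-ins, not `XTCReader` methods.

```python
import io
import struct
from typing import BinaryIO

# Hypothetical layout for illustration only; NOT the real XTC format:
#   flag:      has_chapters   uint8 (0 or 1)
#   metadata:  chapter_count  uint16 LE, written only when has_chapters == 1
#   chapters:  chapter_count records of start_page uint32 LE,
#              title_len uint16 LE, then title as UTF-8 bytes

Chapter = tuple[str, int]  # (title, start_page)


def write_chapters(out: BinaryIO, chapters: list[Chapter]) -> None:
    out.write(struct.pack("<B", 1 if chapters else 0))
    if not chapters:
        return
    # The count lives in the metadata section, so that section must be
    # written whenever chapters are present.
    out.write(struct.pack("<H", len(chapters)))
    for title, start_page in chapters:
        raw = title.encode("utf-8")
        out.write(struct.pack("<IH", start_page, len(raw)))
        out.write(raw)


def read_chapters(src: BinaryIO) -> list[Chapter]:
    (has_chapters,) = struct.unpack("<B", src.read(1))
    if not has_chapters:
        return []
    (chapter_count,) = struct.unpack("<H", src.read(2))
    chapters: list[Chapter] = []
    # Iterate chapter_count times. Looping over has_chapters instead would
    # cap the loop at one iteration, reading at most one chapter.
    for _ in range(chapter_count):
        start_page, title_len = struct.unpack("<IH", src.read(6))
        chapters.append((src.read(title_len).decode("utf-8"), start_page))
    return chapters


if __name__ == "__main__":
    buf = io.BytesIO()
    write_chapters(buf, [("Intro", 0), ("Setup", 4), ("Appendix", 10)])
    buf.seek(0)
    print(read_chapters(buf))  # [('Intro', 0), ('Setup', 4), ('Appendix', 10)]
```

A round-trip test with three or more chapters, as in the `__main__` block, is the kind of check that would catch the single-chapter bug, since a one-chapter fixture reads correctly either way.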