perf: Cache redundant work in transform phase #40

Open
danceratopz wants to merge 14 commits into SamWilsn:master from danceratopz:cache-optimizations

danceratopz commented Feb 24, 2026

Requires #37

Summary

Add module-level and instance-level caching to seven hot paths identified via profiling. Each optimization eliminates redundant work that was repeated per-document or per-node during the build and transform phases.

What changed

  1. Cache entry_points() in HTMLVisitor — Module-level _get_html_entry_points() replaces per-instance entry_points(group="docc.plugins.html") calls.
  2. Cache entry_points() in Loader — Module-level _get_plugin_entry_points() replaces per-instance entry_points(group="docc.plugins") calls.
  3. Cache dataclasses.fields() in PythonNode — Class-level _fields_cache ClassVar replaces per-call fields(self) in children and replace_child.
  4. Cache _BoundsVisitor results in VerbatimVisitor — Instance-level _bounds_cache eliminates double tree traversal in enter()/exit().
  5. Cache file lines in TextSource.line() — Lazy _lines_cache stores file content on first access, eliminating re-reads per line() call.
  6. Cache Jinja2 environments — Module-level _get_jinja_env() (HTML) and _get_listing_env() (listing) replace per-render Environment(...) construction.
  7. Share loaded renderers across HTMLVisitor instances — Module-level _LOADED_RENDERERS dict replaces per-instance self.renderers = {}, so EntryPoint.load() runs at most once per node type.

Why

The transform phase dominates runtime (~87% of wall time). Profiling showed that entry_points(), dataclasses.fields(), Jinja2 Environment construction, _BoundsVisitor traversal, file I/O in TextSource.line(), and EntryPoint.load() were all called thousands of times with identical arguments across documents. Caching these at the appropriate scope (module, class, or instance) eliminates the redundant work.
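The class-level scope mentioned above could look roughly like the following. `Node` and `Name` are hypothetical stand-ins for `PythonNode` and one of its subclasses, and keying the cache by `type(self)` is an assumption about how `_fields_cache` is organized, not a description of the actual diff.

```python
from dataclasses import Field, dataclass, fields
from typing import Any, ClassVar, Dict, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for PythonNode; not docc's actual class."""

    # ClassVar attributes are ignored by @dataclass, so this dict is shared
    # by Node and every subclass instead of becoming a per-instance field.
    _fields_cache: ClassVar[Dict[type, Tuple[Field, ...]]] = {}

    def _cached_fields(self) -> Tuple[Field, ...]:
        cls = type(self)
        cached = Node._fields_cache.get(cls)
        if cached is None:
            # fields() introspects the class; do it once per concrete type.
            cached = Node._fields_cache[cls] = fields(cls)
        return cached

    @property
    def children(self) -> Tuple[Any, ...]:
        return tuple(getattr(self, f.name) for f in self._cached_fields())


@dataclass
class Name(Node):
    identifier: str
    annotation: str


a = Name("x", "int")
b = Name("y", "str")
assert a.children == ("x", "int")
# Both instances reuse the tuple computed for the Name class.
assert a._cached_fields() is b._cached_fields()
```

Keying by concrete type matters here: a single unkeyed cache on the base class would hand one subclass's fields to another.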

Test coverage

Each optimization includes three categories of tests:

  • Behavioral tests — Verify the cached code path produces the same results as before.
  • Call-count / spy tests — Use unittest.mock.patch to verify the expensive call happens only once when multiple consumers access the cache.
  • Cache-keying tests — Verify the cache data structure contains entries with expected keys and value types.

Benchmark

Command:

hyperfine --warmup=1 --runs=5 --show-output 'taskset -c 0 uv run docc'

Individual optimizations

| # | Optimization | Mean | Range | Speedup |
|---|--------------|------|-------|---------|
|   | Baseline | 70.1s | 67.9–74.4s | |
| 1 | HTML entrypoints cache | 24.0s | 17.0–27.3s | 2.9x |
| 2 | Loader entrypoints cache | 48.0s | 39.3–51.0s | 1.5x |
| 3 | Dataclasses fields cache | 49.4s | 45.1–50.7s | 1.4x |
| 4 | BoundsVisitor cache | 50.2s | 49.4–50.8s | 1.4x |
| 5 | TextSource line cache | 48.9s | 48.3–49.4s | 1.4x |
| 6 | Jinja2 environment cache | 35.9s | 29.6–38.2s | 2.0x |
| 7 | Shared renderer cache | 50.3s | 49.5–51.4s | 1.4x |

Combined result

| Variant | Mean | σ | Range | Speedup |
|---------|------|---|-------|---------|
| Baseline | 70.1s | 2.6s | 67.9–74.4s | |
| All 7 cache opts | 15.6s | 0.4s | 15.2–16.2s | 4.5x |

Per-phase breakdown (combined)

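| Phase | Baseline (median) | Optimized (median) | Change |
|-------|-------------------|--------------------|--------|
| Built | 6.5s | 6.6s | |
| Transformed | 61s | 7s | -88% |
| Wrote | 0.6s | 0.6s | |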

- Use explicit patterns instead of recursive ** globs.
- Setuptools doesn't reliably include files with ** in sdist builds.
- Add dedicated test extras with pytest and pytest-cov.
- Add test extras to both test and type environments so
  pytest and pytest-cov are available alongside lint deps.
- Configure pytest to run with coverage enabled.
- Set minimum coverage threshold to 80%.
- Exclude common non-testable patterns from coverage.
- Add tests for CLI, settings, context, and document modules.
- Add tests for plugin loader and transform pipeline.
- Add tests for HTML, mistletoe, and verbatim plugins.
- Add end-to-end HTML rendering pipeline tests.
- Add tests for Python CST parsing and node types.
- Add tests for references, search, and resources plugins.
- Add integration tests for end-to-end workflows.
- Add behavior-level pipeline contract tests.
- Achieve 90% code coverage.

The test_enter_returning_tag_pushes_and_traverses test was directly
assigning a fake renderer into visitor.renderers, which with the shared
_LOADED_RENDERERS cache now persists across all subsequent HTMLVisitor
instances. This caused test_definition_to_html_output to use the fake
ListNode renderer instead of the real one, producing empty HTML output.

Wrap the fake renderer injection in patch.dict() so it is automatically
cleaned up after the test completes.
danceratopz (Author) commented

Ah nice, I didn't see that fc0d5eb had already landed, will rebase on main tomorrow 🥱
