
Performance plugin #9728

Draft
Jermolene wants to merge 5 commits into master from performance-plugin

Conversation

Member

@Jermolene Jermolene commented Mar 13, 2026

This PR was created with assistance from Claude (using Claude Opus 4.6). It was prompted by Shopify/liquid#2056, which improved Shopify Liquid template rendering: 53% faster parse+render and 61% fewer allocations.

The Performance Plugin provides a framework for measuring the performance of TiddlyWiki's refresh cycle — the process that updates the display when tiddlers are modified.

The idea is to capture a realistic workload by recording store modifications while a user interacts with a wiki in the browser, and then replaying those modifications under Node.js where the refresh cycle can be precisely measured in isolation.

Motivation

An important motivation for this framework is to enable LLMs to iteratively optimise TiddlyWiki's performance. The workflow is:

  1. An LLM makes a change to the TiddlyWiki codebase (e.g. optimising a filter operator, caching a computation, or restructuring a widget's refresh logic)
  2. The LLM runs --perf-replay against a recorded timeline to measure the impact
  3. The LLM reads the JSON results file to determine whether the change improved, regressed, or had no effect on performance
  4. The LLM iterates: tries another approach, measures again, and converges on the best solution

This tight edit-measure-iterate loop works because --perf-replay runs entirely under Node.js with no browser required, produces machine-readable JSON output, and completes in seconds.

Initial Success

I used Claude to optimise a timeline from a 150MB test app with 95K tiddlers and 20K tags. It came up with an improvement to the tag indexer that yields a substantial speedup:

| Metric | Before | After Tag Indexer | Improvement |
|--------|--------|-------------------|-------------|
| Initial render | 261.95ms | 83.51ms | 3.1x faster |
| Total refresh | 2,496.25ms | 1,338.84ms | 1.9x faster (46%) |
| Mean refresh | 416.04ms | 223.14ms | 1.9x faster |
| Max refresh | 480.76ms | 287.26ms | 1.7x faster |

The specific filter impact: [subfilter{$:/core/config/GlobalImportFilter}] went from 1,172ms (the #1 bottleneck) to effectively 0ms — it dropped out of the top 20 entirely. That single filter was consuming 47% of total refresh time.

The root cause was that TagSubIndexer.prototype.update() was setting this.index = null on every tiddler change, forcing a complete rebuild (iterating all 95k tiddlers) on the next tag lookup. The fix makes it incremental — just removing/adding the changed tiddler's tags — which is O(number of tags on the changed tiddler) instead of O(all tiddlers in the wiki).
4b04688
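The incremental strategy described above can be sketched like this. This is an illustrative model only; the class and method names are invented for the example and do not mirror the actual tag-indexer.js code:

```javascript
"use strict";

// Illustrative sketch of an incremental tag index. Instead of discarding the
// whole index on every change, only the changed tiddler's old and new tag
// entries are touched, which is O(tags on that tiddler).
class IncrementalTagIndex {
	constructor() {
		this.index = Object.create(null);        // tag -> Set of tiddler titles
		this.tagsByTitle = Object.create(null);  // title -> tags last seen
	}
	// Called on every store change for the affected title
	update(title, newTags) {
		const oldTags = this.tagsByTitle[title] || [];
		for(const tag of oldTags) {
			const titles = this.index[tag];
			if(titles) {
				titles.delete(title);
				if(titles.size === 0) {
					delete this.index[tag];
				}
			}
		}
		if(newTags) {
			for(const tag of newTags) {
				(this.index[tag] = this.index[tag] || new Set()).add(title);
			}
			this.tagsByTitle[title] = newTags;
		} else {
			delete this.tagsByTitle[title]; // tiddler was deleted
		}
	}
	lookup(tag) {
		return this.index[tag] ? Array.from(this.index[tag]) : [];
	}
}
```

The key point is that a tag lookup never triggers a rebuild that iterates every tiddler in the wiki.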

How the Performance Plugin Works

The framework has two parts:

1. Recording (Browser)

The plugin intercepts wiki.addTiddler() and wiki.deleteTiddler() to capture every store modification as it happens. Each operation is recorded with:

  • A sequence number and high-resolution timestamp
  • The full tiddler fields (so the exact state can be recreated)
  • A batch identifier that tracks TiddlyWiki's change batching via $tw.utils.nextTick()

The batch tracking is important because TiddlyWiki groups multiple store changes that occur in the same tick into a single refresh cycle. The recorder preserves these batch boundaries so that playback triggers the same pattern of refreshes.
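The recording mechanism can be sketched as follows. This is a simplified model: a plain object stands in for the real wiki, and a microtask approximates $tw.utils.nextTick(); the actual plugin also wraps wiki.deleteTiddler():

```javascript
"use strict";

// Simplified sketch of store-level recording: wrap addTiddler so every call
// is appended to a timeline with a sequence number and batch identifier.
// All operations recorded before the queued microtask runs share one batch,
// mirroring how changes in the same tick trigger a single refresh.
function attachRecorder(wiki) {
	const timeline = [];
	let seq = 0, batch = 0, batchOpen = false;
	const originalAdd = wiki.addTiddler.bind(wiki);
	wiki.addTiddler = function(fields) {
		timeline.push({
			seq: seq++,
			t: Date.now(),   // the plugin uses a high-resolution timestamp
			batch: batch,
			op: "add",
			title: fields.title,
			fields: fields
		});
		if(!batchOpen) {
			batchOpen = true;
			queueMicrotask(function() {
				// Close the batch at the end of the current tick
				batch++;
				batchOpen = false;
			});
		}
		return originalAdd(fields);
	};
	return timeline;
}
```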

2. Playback (Node.js)

The --perf-replay command loads a wiki and builds the full widget tree using TiddlyWiki's $tw.fakeDocument — the lightweight DOM implementation used for server-side rendering. It then replays the recorded timeline batch by batch, calling widgetNode.refresh(changedTiddlers) after each batch and measuring how long it takes.

This means we are measuring TiddlyWiki's own refresh logic (widget tree traversal, filter evaluation, DOM diffing) in isolation from browser layout and paint. This is intentional — it lets us identify performance bottlenecks within TiddlyWiki itself, independent of which browser is being used.

Why Store-Level Recording?

An alternative would be to record DOM events (clicks, keystrokes) and replay them in a headless browser. Store-level recording was chosen instead because:

  • The refresh cycle responds to store changes, not DOM events — store modifications are the natural input
  • Store changes are fully deterministic and reproducible
  • No DOM dependency means playback works in pure Node.js with no headless browser to install
  • A headless browser would add its own overhead, making measurements less precise

Recording

  1. Include this plugin in your wiki
  2. Open the Control Panel and find the "Performance Testing Recorder" tab
  3. Click "Start Recording"
  4. Interact with the wiki — open tiddlers, edit, type, navigate, switch tabs
  5. Click "Stop Recording"
  6. Download the timeline.json file

Draft Coalescing

When editing a tiddler, TiddlyWiki writes to draft tiddlers on every keystroke. By default, the recorder coalesces rapid draft updates within the same batch, keeping only the last update. This produces a more compact timeline that focuses on the refresh-relevant changes.

Uncheck "Coalesce rapid draft updates" to record every individual keystroke. This is useful when you specifically want to measure the performance impact of rapid typing.
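The coalescing rule can be sketched as follows. This is illustrative only: within each batch, only the last "add" for a given draft title survives:

```javascript
"use strict";

// Illustrative sketch of draft coalescing: rapid keystrokes produce many
// draft updates; within a batch, only the latest update per draft title is
// kept, so the replayed timeline stays compact.
function coalesceDrafts(timeline) {
	const keep = timeline.map(() => true);
	const lastDraftIndex = {}; // "batch|title" -> index of latest draft update
	timeline.forEach(function(op, i) {
		if(op.op === "add" && op.isDraft) {
			const key = op.batch + "|" + op.title;
			if(lastDraftIndex[key] !== undefined) {
				keep[lastDraftIndex[key]] = false; // superseded by a later update
			}
			lastDraftIndex[key] = i;
		}
	});
	return timeline.filter((op, i) => keep[i]);
}
```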

Playback

tiddlywiki editions/performance --load mywiki.html --perf-replay timeline.json

Or from any edition that includes this plugin:

tiddlywiki myedition --perf-replay timeline.json

Playback runs at full speed with no delays between batches. The recorded timestamps are preserved in the timeline for reference but are not used for pacing.

What Gets Measured

  • Initial render time — the time to build and render the full widget tree from scratch
  • Refresh time per batch — the time widgetNode.refresh(changedTiddlers) takes for each batch of store modifications
  • Filter execution — individual filter timings and invocation counts, showing which filters are the most expensive
  • Statistical summary — mean, P50, P95, P99, and maximum refresh times across all batches
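The statistical summary can be computed along these lines. This sketch uses the nearest-rank percentile definition; the plugin's exact percentile method may differ:

```javascript
"use strict";

// Sketch of the summary statistics over per-batch refresh times (ms),
// using the nearest-rank percentile definition.
function summarise(refreshTimes) {
	const sorted = refreshTimes.slice().sort((a, b) => a - b);
	const total = sorted.reduce((a, b) => a + b, 0);
	const percentile = p =>
		sorted[Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1))];
	return {
		totalRefreshTime: total,
		meanRefresh: total / sorted.length,
		p50: percentile(50),
		p95: percentile(95),
		p99: percentile(99),
		maxRefresh: sorted[sorted.length - 1]
	};
}
```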

Output

The command produces two forms of output:

Text Report (stdout)

A human-readable table printed to the console showing per-batch timings, a summary with percentile statistics, and a breakdown of the most expensive filter executions.

JSON Results File

A <timeline-name>-results.json file is written alongside the input timeline. This is the primary output for automated consumption. The file contains:

{
  "wiki": {
    "tiddlerCount": 2076
  },
  "timeline": {
    "operations": 156,
    "batches": 42
  },
  "initialRender": 55.46,
  "summary": {
    "totalRefreshTime": 234.5,
    "meanRefresh": 5.58,
    "p50": 4.12,
    "p95": 18.7,
    "p99": 31.2,
    "maxRefresh": 31.2,
    "totalFilterInvocations": 4821
  },
  "batches": [
    {
      "batch": 1,
      "ops": 1,
      "changed": 1,
      "refreshMs": 12.3,
      "filters": 293,
      "tiddlers": ["$:/StoryList"]
    }
  ],
  "topFilters": [
    {
      "name": "filter: [subfilter{$:/core/config/GlobalImportFilter}]",
      "time": 5.65,
      "invocations": 5
    }
  ]
}

All times are in milliseconds. The key fields for automated analysis:

  • summary.totalRefreshTime — the single most important number: total time spent in refresh across all batches
  • summary.meanRefresh — average refresh time per batch
  • summary.p95 / summary.p99 — tail latency indicators
  • initialRender — time to build the widget tree from scratch (measures startup cost)
  • batches[].refreshMs — per-batch breakdown, useful for identifying which user actions are expensive
  • topFilters[] — the most expensive filters by total execution time, useful for identifying optimisation targets

Example: LLM Optimisation Workflow

An LLM optimising TiddlyWiki performance would follow this pattern:

Step 1: Establish baseline

node ./tiddlywiki.js editions/performance --load mywiki.html --perf-replay timeline.json

Read timeline-results.json and note the baseline summary.totalRefreshTime.

Step 2: Make a change

Edit a source file (e.g. optimise a filter operator in core/modules/filters/).

Step 3: Measure impact

Run the same --perf-replay command again and read the new timeline-results.json.

Step 4: Compare

Compare summary.totalRefreshTime and summary.p95 between baseline and new results. If improved, keep the change. If regressed, revert and try a different approach.

Step 5: Iterate

Repeat steps 2-4 until the target metric is optimised.

The JSON results file makes step 4 straightforward — an LLM can read two JSON files and compare numeric fields directly without parsing tabular text output.

Timeline Format

The timeline is a JSON array of operations:

[
  {
    "seq": 0,
    "t": 123.45,
    "batch": 0,
    "op": "add",
    "title": "$:/StoryList",
    "isDraft": false,
    "fields": {
      "title": "$:/StoryList",
      "list": "GettingStarted",
      "text": ""
    }
  }
]
  • seq — sequential operation number
  • t — milliseconds since recording started
  • batch — batch identifier (operations in the same batch trigger a single refresh)
  • op — "add" or "delete"
  • isDraft — whether this is a draft tiddler (used for coalescing)
  • fields — complete tiddler fields (null for delete operations)
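A consumer of the format can check these invariants with a few lines. This is a sketch; the plugin itself may not ship such a validator:

```javascript
"use strict";

// Sketch of a timeline sanity check based on the invariants listed above:
// sequential seq numbers, known op values, null fields on deletes, and a
// title that matches the fields.
function validateTimeline(operations) {
	operations.forEach(function(op, i) {
		if(op.seq !== i) {
			throw new Error("Non-sequential seq at index " + i);
		}
		if(op.op !== "add" && op.op !== "delete") {
			throw new Error("Unknown op: " + op.op);
		}
		if(op.op === "delete" && op.fields !== null) {
			throw new Error("Delete operations must carry null fields");
		}
		if(op.op === "add" && op.fields.title !== op.title) {
			throw new Error("Title mismatch at seq " + op.seq);
		}
	});
	return operations.length;
}
```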


netlify bot commented Mar 13, 2026

Deploy Preview for tiddlywiki-previews ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 1e06098 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/tiddlywiki-previews/deploys/69b6bd72a39d980008f8201e |
| 😎 Deploy Preview | https://deploy-preview-9728--tiddlywiki-previews.netlify.app |


Confirmed: Jermolene has already signed the Contributor License Agreement (see contributing.md)


github-actions bot commented Mar 13, 2026

📊 Build Size Comparison: empty.html

| Branch | Size |
|--------|------|
| Base (master) | 2487.2 KB |
| PR | 2496.6 KB |

Diff: ⬆️ Increase: +9.3 KB


⚠️ Change Note Status

This PR appears to contain code changes but doesn't include a change note.

Please add a change note by creating a .tid file in editions/tw5.com/tiddlers/releasenotes/<version>/

📚 Documentation: Release Notes and Changes

💡 Note: If this is a documentation-only change, you can ignore this message.

Member

pmario commented Mar 14, 2026

That's very interesting. I did also let Claude play with performance for the following elements.

  • finding transclusion backlinks
  • findDraft tiddler
  • getMissingTitles
  • getTiddlerBacklinks
  • getOrphanTitles

Making it measurable was the hardest thing to do. I did create a little test runner that could be activated with tiddlywiki editions/test --test, and also by starting the test-index.html file in the browser. The main problem in the browser is that the max resolution is 1ms.

Before testing, it created 10000 tiddlers with elements that should be found.

I will use this plugin to see if it comes up with the same results and/or solutions.
I did not upload the drafts yet, since I need to check the auto-created test tiddlers. They were created with Claude and I did not manually check them. So I am not sure yet "what is measured" ;)

-- Cool stuff!!

Member

pmario commented Mar 14, 2026

For getTiddlerBacklinks I got a 6x improvement ... with 10000 tids: 10% of them link each other, 20% have no links at all, 10% of link targets are non-existent, 1-5 random links per tiddler, 2 warmup runs, 5 measurement runs ...

For getOrphans it was between 4x-19x .. maybe still a warm-up problem -- will test with this plugin again

@pmario added the ⟲ admin-review label ("A label for admins, to review the issue again") on Mar 14, 2026
These figures stem from a wiki with 80K tiddlers and 5K tags, and a total wiki size of 150MB.

> The biggest win is the P50 (median) refresh dropping from 124ms to 46ms — a 63% improvement
Member

pmario commented Mar 14, 2026

@Jermolene ... Do you have something like a summary report of what the different changes do? IMO it would be nice to know what was actually going on.

Member

pmario commented Mar 14, 2026

@Jermolene ... I let Claude find out what the different optimisations do.

But there is still one problem left. This PR does not contain any info on how to reproduce any test runs.

So IMO there needs to be some info on how to create your test wiki, and a replay recipe that you used for testing. IMO that info should be somewhere in the ./editions/test edition.

Member

pmario commented Mar 14, 2026

@Jermolene ... Do you think we can combine some concepts from my draft PR: FindDraft-performcance-improvement #9729 with this plugin?

E.g. some documentation on how to reproduce the results on a different machine in the test edition.

The main problem I have with my approach is that the benchmark is only valid for one TW version.

The Jasmine test $:/tags/test-spec in test-finddraft-benchmark.js

It contains the following code snippet, which is suboptimal:

// only run for v5.5.0 and v5.5.0-prerelease
// TODO: Adjust the version check! Currently for the draft it is v5.4.0-pre..

if($tw.version.indexOf("5.4.0") === 0) {
	// ... benchmark spec runs only for matching versions ...
}

@linonetwo
Contributor

Chrome MCP has https://github.com/ChromeDevTools/chrome-devtools-mcp/blob/main/docs/tool-reference.md#performance

Could you use that in the debug workflow instead of the performance plugin? One benefit of the performance plugin is that it produces less data, so fewer tokens are consumed, whereas the Chrome performance flamechart uses tons of tokens while giving the LLM more insight.

Member

pmario commented Mar 14, 2026

hmmm, @linonetwo ... IMO the advantage is that this plugin runs without a browser. On my system I have no Chrome installed. There is Edge, because it is there; I only use it for the free Copilot chat.

I am 100% Firefox ;)

Member

pmario commented Mar 14, 2026

@Jermolene .. I did update the test runner benchmark code at PR: GetOrphanTitles-performance-improvement #9730

Now it runs with Jasmine and from the command line on Windows, and -benchmark-core.js can now be copy/pasted into any wiki's browser console. So manual tests would be easy too.

Member

pmario commented Mar 14, 2026

@Jermolene ... There seems to be a bug in tag-indexer.js

To reproduce with VSCode / GitLens plugin

  • Check out master
  • GitLens: Open branch performance-plugin
  • Open commit named: Update tag-indexer.js
    • tag-indexer.js -> right click
  • Apply changes from tag-indexer.js from performance-plugin branch
    • Make sure they are applied.
  • node tiddlywiki.js editions/tw5.com-server --listen
  • open: http://localhost:8080/#Acknowledgements
  • Click About tag
  • Drag License tiddler to top -> Problem (screenshot)
  • About list field is updated -> OK
  • But UI still shows old list (screenshot)

@Jermolene
Member Author

Thanks @pmario, @linonetwo.

Making it measurable was the hardest thing to do. I did create a little test runner that could be activated with tiddlywiki editions/test --test, and also by starting the test-index.html file in the browser.

Making things measurable is the cornerstone of this technique. My focus so far has been measuring interactive performance, but I would see the scope of the plugin as being broad enough to accommodate any kind of performance measurement.

The main problem in the browser is that the max resolution is 1ms.

Can you not use performance.now() for higher resolution?

@Jermolene ... Do you have something like a summary report, what the different changes do? IMO it would be nice, to know what actually was going on.

I will update the OP but my plan is to remove the example optimisations from this PR. We need to make each optimisation a separate PR so that we can easily git bisect problems. This PR is about establishing the shared infrastructure we need for performance measurement.

But there is still one problem left. This PR does not contain any info, how to reproduce any test runs.

That is because the two wikis that I used for the experiments above are confidential to two of my clients who are working with enormous wikis. At this size performance is critical, but paradoxically easier to measure and therefore optimise.

The value of using client data is that it reflects real-world usage, complementing the synthetic test data that we will also need. It also benefits the clients while preserving their privacy.

So IMO there needs to be some info on how to create your test wiki, and a replay recipe that you used for testing. IMO that info should be somewhere in the ./editions/test edition.

We will certainly need to do that, but we also need to accommodate people who want to be able to optimise performance while using their private data.

Downstream a bit, we can imagine TiddlyWiki end users being able to optimise their copy of TiddlyWiki for the best possible performance with their own, real data.

@Jermolene ... Do you think we can combine some concepts from my draft PR: FindDraft-performcance-improvement #9729 with this plugin?

As noted above, each optimisation must be a separate PR. The scope of this PR is the infrastructure that we can share, which certainly includes the kind of test runner approach in #9729.

Chrome MCP has https://github.com/ChromeDevTools/chrome-devtools-mcp/blob/main/docs/tool-reference.md#performance

Could you use that in the debug workflow instead of the performance plugin? One benefit of the performance plugin is that it produces less data, so fewer tokens are consumed, whereas the Chrome performance flamechart uses tons of tokens while giving the LLM more insight.

There is certainly a role for that approach, but it is orthogonal to the problem addressed here of automating interactive performance testing.

@Jermolene ... There seems to be a bug in the tagIndexer.js

Perhaps it is more helpful to think of it as a bug in our test suite.

Member

pmario commented Mar 14, 2026

@Jermolene ... Do you think we can combine some concepts from my draft PR: FindDraft-performcance-improvement #9729 with this plugin?

As noted above, each optimisation must be a separate PR. ...

Sure, that's why I created several optimisation drafts which contain very similar info in the OP. But the test code and the test tiddlers it creates are different.

... The scope of this PR is the infrastructure that we can share, which certainly includes the kind of test runner approach in #9729.

OK - That's what I was thinking about. I do like the idea of having some documentation and "how to test" information in ./editions/test

The main difficulty I faced is that optimisation benchmarks are good for proving that a code change works.

But once it is merged, that test is "outdated" and only wastes time on Netlify's side.

In my code I included a version check -- but I think that's a bit hacky.

Just some thoughts.
Mario

