-
-
Notifications
You must be signed in to change notification settings - Fork 23
Description
Prerequisites
- I have written a descriptive issue title
- I have searched existing issues to ensure the feature has not already been requested
🚀 Feature Proposal
I recently ran a performance test with the original streamsearch implementation and an experimental change which makes heavy use of Buffer.indexOf (with custom code needed only for handling chunk boundaries), and found that (at least in more recent versions of Node.js), a Buffer.indexOf-based approach is up to 30× faster in some cases (and typically ~50–100% faster).
In real-world terms, this translates to handling multipart form data (excluding network access) ~2–3× faster in almost all cases (and I'm yet to find a situation where it's slower)
Motivation
Streamsearch was originally written to be fast, but it was implemented when Node.js' own Buffer.indexOf just did dumb character-by-character checks. Now, Buffer.indexOf is also fast (implementing MBH as well as various other algorithms internally), and being written in a lower-level language it has advantages which cannot be beaten in userspace. It even "wins" when there are lots of small chunks, which is potentially surprising because higher-level code can take advantage of sharing pre-calculated data, whereas indexOf must recalculate its references each time.
You can see more detail in this old thread: mscdex/streamsearch#2
Example
| Test Needle | adapted streamsearch (100kB data) | streamsearch@1.1.0 (100kB data) | Buffer.indexOf (100kB data) - for comparison | adapted streamsearch (2kB batches) | streamsearch@1.1.0 (2kB batches) |
|---|---|---|---|---|---|
| \r\n | 4.0µs | 169.3µs | * 3.9µs | * 6.8µs | 200.9µs |
| \r\n--sep | 4.0µs | 95.4µs | * 3.8µs | * 6.9µs | 97.8µs |
| realistic | 4.5µs | 15.7µs | * 4.5µs | * 8.0µs | 21.0µs |
| repetitive prefixed | 5.3µs | 13.8µs | * 5.3µs | * 8.3µs | 26.0µs |
| long repetitive prefixed | 6.5µs | 36.6µs | * 6.2µs | * 17.0µs | 157.4µs |
| repetitive suffixed | 4.6µs | 15.2µs | * 4.5µs | * 10.0µs | 24.9µs |
| long repetitive suffixed | 14.0µs | 33.6µs | * 13.6µs | * 76.7µs | 181.5µs |
The "adapted" column is using Buffer.indexOf as much as possible, along with a few other optimisations around how the lookbehind is handled by offloading some of the filtering and comparisons to Buffer methods. The full code is linked below.
The left 3 columns of results are when running in an artificial world where there is only a single 100kB chunk. A "dumb" Buffer.indexOf-loop implementation is shown here for comparison. The right 2 columns are a more realistic scenario where the same data is split into 2kB chunks, so "lookbehind" handling is required.
The above table is from running on a Mac running Node.js 24.
On linux (on a Pi 5 running Node.js 22), the results are similar:
| Test Needle | adapted streamsearch (100kB data) | streamsearch@1.1.0 (100kB data) | Buffer.indexOf (100kB data) | adapted streamsearch (2kB batches) | streamsearch@1.1.0 (2kB batches) |
|---|---|---|---|---|---|
| \r\n | 12.9µs | 623.5µs | * 10.5µs | * 24.4µs | 632.6µs |
| \r\n--sep | 10.9µs | 180.5µs | * 10.7µs | * 24.6µs | 184.7µs |
| realistic | 11.5µs | 27.0µs | * 11.3µs | * 26.8µs | 37.4µs |
| repetitive prefixed | 11.4µs | 29.9µs | * 11.0µs | * 28.9µs | 47.8µs |
| long repetitive prefixed | 19.1µs | 120.1µs | * 18.0µs | * 61.1µs | 287.6µs |
| repetitive suffixed | 12.9µs | 28.2µs | * 12.4µs | * 45.1µs | 47.5µs |
| long repetitive suffixed | 64.8µs | 141.0µs | * 64.2µs | * 255.9µs | 296.8µs |
The tests use randomised data so the numbers fluctuate a little.
Since running these tests, I've found further ways to tune the performance by changing the API slightly (specifically to break up the content and boundary callbacks)
You can find my forked code here (licensed under the same MIT license as the original):
- still basically drop-in compatible with the original: https://github.com/davidje13/web-listener/blob/fb75eb7700630c13d3080786d2e9a5892b9686bf/src/forks/streamsearch/sbmh.mts
- after changing to use 2 separate callback arguments for additional speed: https://github.com/davidje13/web-listener/blob/ab8e57d2346a21f6cfca255d471074cbabe09df8/src/forks/streamsearch/sbmh.mts (gives an extra ~5–10% performance boost compared to the results above)