
streamsearch (multipart) can be made significantly faster with Buffer.indexOf #202

@davidje13

Description


Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the feature has not already been requested

🚀 Feature Proposal

I recently ran a performance test comparing the original streamsearch implementation with an experimental change that makes heavy use of Buffer.indexOf (custom code is needed only for handling chunk boundaries). I found that, at least in recent versions of Node.js, a Buffer.indexOf-based approach is up to 30× faster in some cases, and typically ~50–100% faster.

In real-world terms, this translates to handling multipart form data (excluding network access) ~2–3× faster in almost all cases, and I have yet to find a situation where it is slower.

Motivation

Streamsearch was originally written to be fast, but it was implemented at a time when Node.js' own Buffer.indexOf did naive byte-by-byte checks. Buffer.indexOf is now fast too (internally it implements Boyer-Moore-Horspool, among other algorithms), and being written in a lower-level language it has advantages that cannot be matched in userspace. It even "wins" when there are lots of small chunks, which is potentially surprising: higher-level code can share pre-calculated data across chunks, whereas indexOf must recalculate its references on every call.

You can see more detail in this old thread: mscdex/streamsearch#2

Example

| Test needle | adapted streamsearch (100kB data) | streamsearch@1.1.0 (100kB data) | Buffer.indexOf (100kB data, for comparison) | adapted streamsearch (2kB batches) | streamsearch@1.1.0 (2kB batches) |
| --- | --- | --- | --- | --- | --- |
| `\r\n` | 4.0µs | 169.3µs | *3.9µs* | 6.8µs | 200.9µs |
| `\r\n--sep` | 4.0µs | 95.4µs | *3.8µs* | 6.9µs | 97.8µs |
| realistic | 4.5µs | 15.7µs | *4.5µs* | 8.0µs | 21.0µs |
| repetitive prefixed | 5.3µs | 13.8µs | *5.3µs* | 8.3µs | 26.0µs |
| long repetitive prefixed | 6.5µs | 36.6µs | *6.2µs* | 17.0µs | 157.4µs |
| repetitive suffixed | 4.6µs | 15.2µs | *4.5µs* | 10.0µs | 24.9µs |
| long repetitive suffixed | 14.0µs | 33.6µs | *13.6µs* | 76.7µs | 181.5µs |

The "adapted" column uses Buffer.indexOf as much as possible, along with a few other optimisations to how the lookbehind is handled, offloading some of the filtering and comparisons to Buffer methods. The full code is linked below.

The left three result columns are from an artificial scenario where the data arrives as a single 100kB chunk; a "dumb" Buffer.indexOf-loop implementation is shown there for comparison. The right two columns are a more realistic scenario where the same data is split into 2kB chunks, so "lookbehind" handling is required.

The table above is from a Mac running Node.js 24.

On Linux (a Pi 5 running Node.js 22), the results are similar:

| Test needle | adapted streamsearch (100kB data) | streamsearch@1.1.0 (100kB data) | Buffer.indexOf (100kB data) | adapted streamsearch (2kB batches) | streamsearch@1.1.0 (2kB batches) |
| --- | --- | --- | --- | --- | --- |
| `\r\n` | 12.9µs | 623.5µs | *10.5µs* | 24.4µs | 632.6µs |
| `\r\n--sep` | 10.9µs | 180.5µs | *10.7µs* | 24.6µs | 184.7µs |
| realistic | 11.5µs | 27.0µs | *11.3µs* | 26.8µs | 37.4µs |
| repetitive prefixed | 11.4µs | 29.9µs | *11.0µs* | 28.9µs | 47.8µs |
| long repetitive prefixed | 19.1µs | 120.1µs | *18.0µs* | 61.1µs | 287.6µs |
| repetitive suffixed | 12.9µs | 28.2µs | *12.4µs* | 45.1µs | 47.5µs |
| long repetitive suffixed | 64.8µs | 141.0µs | *64.2µs* | 255.9µs | 296.8µs |

The tests use randomised data so the numbers fluctuate a little.

Since running these tests, I've found further ways to tune the performance by changing the API slightly (specifically, breaking up the content and boundary callbacks).

You can find my forked code here (licensed under the same MIT license as the original):
