
Performance: Reuse buffers in JsPreprocessor to reduce allocations#72

Open
ysdede wants to merge 1 commit into master from perf-mel-buffer-reuse-17138114155429247658

Conversation

@ysdede
Owner

@ysdede ysdede commented Feb 21, 2026

Implemented buffer reuse in JsPreprocessor.computeRawMel to avoid allocating preemph and padded arrays on every call. This reduces GC churn and improves performance by approximately 11% for 5-second audio clips. The change involves adding a reusable _paddedBuffer to JsPreprocessor and computing pre-emphasis directly into it, while ensuring correct zero-padding for variable-length inputs.


PR created automatically by Jules for task 17138114155429247658 started by @ysdede

Summary by Sourcery

Optimize JsPreprocessor mel computation by reusing internal padding buffers to reduce allocations and improve performance.

Enhancements:

  • Reuse a persistent padded FFT buffer in JsPreprocessor.computeRawMel and compute pre-emphasis in-place to avoid per-call allocations and reduce GC pressure.

Documentation:

  • Document JsPreprocessor buffer pooling learnings and follow-up actions in the Bolt notes for future performance work.

- Introduced `_paddedBuffer` (Float64Array) to `JsPreprocessor`.
- Reused `_paddedBuffer` in `computeRawMel` to store pre-emphasis results directly, avoiding separate allocation of `preemph` (Float32Array) and `padded` (Float64Array).
- This change eliminates two large allocations per `computeRawMel` call, reducing GC pressure and improving performance by ~11% (17.5ms -> 15.6ms for 5s audio).
- Verified correctness with `verify_mel.js` (including incremental and resizing scenarios) and measured performance with `bench_mel_alloc.js`.
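The reuse pattern those bullets describe can be sketched as a standalone snippet. This is illustrative only: the class name `PaddedBufferPool`, the method name, and the `PREEMPH`/`N_FFT` constants are stand-ins, not the actual src/mel.js code.

```javascript
// Simplified sketch of the buffer-reuse pattern described above.
// PREEMPH and N_FFT values are illustrative, not taken from src/mel.js.
const PREEMPH = 0.97;
const N_FFT = 512;

class PaddedBufferPool {
  constructor() {
    this._paddedBuffer = null; // lazily allocated, grown as needed
  }

  // Writes pre-emphasized audio into a reused, zero-padded Float64 buffer.
  computePreemphPadded(audio) {
    const N = audio.length;
    const pad = N_FFT >> 1;
    const paddedLen = N + 2 * pad;

    // Allocate or grow with ~20% headroom to avoid repeated small growths.
    if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
      this._paddedBuffer = new Float64Array(Math.ceil(paddedLen * 1.2));
    }
    const padded = this._paddedBuffer;

    // Left padding [0, pad) is never written, so it stays zero.
    // Right padding must be re-zeroed: a previous, larger input may have
    // left stale samples in [pad + N, paddedLen).
    padded.fill(0, pad + N, paddedLen);

    if (N > 0) {
      padded[pad] = audio[0];
      for (let i = 1; i < N; i++) {
        padded[pad + i] = audio[i] - PREEMPH * audio[i - 1];
      }
    }
    return padded.subarray(0, paddedLen);
  }
}
```

The returned view aliases the internal buffer, so each call invalidates the previous result, which is exactly the trade-off the PR makes to avoid per-call allocations.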
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@sourcery-ai

sourcery-ai bot commented Feb 21, 2026

Reviewer's Guide

Adds a reusable padded FFT buffer to JsPreprocessor.computeRawMel and writes pre-emphasis directly into it to avoid per-call allocations, while ensuring correct zero-padding for variable-length inputs and documenting the optimization in .jules/bolt.md.

Class diagram for JsPreprocessor buffer reuse optimization

```mermaid
classDiagram
  class JsPreprocessor {
    +Int32Array fbBounds
    +Float64Array _fftRe
    +Float64Array _fftIm
    +Float32Array _powerBuf
    +Float64Array _paddedBuffer
    +computeRawMel(audio, sampleRate, outBuffer)
  }
```

File-Level Changes

src/mel.js: Reuse a single padded FFT buffer in JsPreprocessor.computeRawMel and compute pre-emphasis directly into it while maintaining correct zero-padding semantics.
  • Introduce a reusable _paddedBuffer field on JsPreprocessor and lazily allocate or grow it when the required padded length increases.
  • Replace allocation of per-call preemph and padded arrays by writing pre-emphasis results directly into the shared Float64Array buffer used for FFT input.
  • Ensure left padding relies on the default zero-initialized region of the buffer and explicitly zero out the right padding slice on each call to prevent stale data from larger previous inputs.
  • Compute padded length and frame count based on the reused buffer while preserving existing output behavior.

.jules/bolt.md: Document the JsPreprocessor buffer reuse optimization in the performance learnings log.
  • Add a new entry describing the performance improvement from avoiding per-call preemph and padded allocations in JsPreprocessor.computeRawMel.
  • Record the measured throughput gain and call out potential follow-up optimization areas such as normalizeFeatures buffer reuse.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The shared _paddedBuffer introduces stateful reuse where computeRawMel was previously purely local; if a JsPreprocessor instance can be invoked concurrently (e.g., from multiple async callers), this will cause data races and corrupt FFT input, so consider clarifying/enforcing single-threaded use or making the buffer reuse per-call instead of on the instance.
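If interleaved async use is a real possibility, one illustrative mitigation (not part of the PR) is a cheap re-entrancy guard around the shared scratch state, so overlapping use fails loudly instead of silently corrupting FFT input:

```javascript
// Illustrative guard (not from the PR): detect overlapping use of a shared
// scratch buffer when computeRawMel-style work could be interleaved across
// awaits. Throwing early is far cheaper to debug than corrupted FFT input.
class ScratchGuard {
  constructor() {
    this._inUse = false;
  }

  // Call before touching the shared buffer.
  acquire() {
    if (this._inUse) {
      throw new Error('shared scratch buffer used re-entrantly');
    }
    this._inUse = true;
  }

  // Call after the synchronous computation finishes.
  release() {
    this._inUse = false;
  }
}
```

Since computeRawMel itself is synchronous, acquire/release pairs bracketing the call cost one boolean check while documenting the single-caller contract.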
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The shared `_paddedBuffer` introduces stateful reuse where `computeRawMel` was previously purely local; if a `JsPreprocessor` instance can be invoked concurrently (e.g., from multiple async callers), this will cause data races and corrupt FFT input, so consider clarifying/enforcing single-threaded use or making the buffer reuse per-call instead of on the instance.

## Individual Comments

### Comment 1
<location> `src/mel.js:355-356` </location>
<code_context>
-    const padded = new Float64Array(paddedLen);
-    for (let i = 0; i < N; i++) {
-      padded[pad + i] = preemph[i];
+    if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
+      const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));
+      this._paddedBuffer = new Float64Array(newSize);
     }
</code_context>

<issue_to_address>
**suggestion (performance):** Consider tracking whether the buffer was newly allocated to skip the subsequent zero-fill on fresh allocations.

`padded.fill(0, pad + N, paddedLen)` still runs on every call, even when a newly allocated buffer is already zeroed by the runtime. Consider tracking when the buffer is newly allocated (e.g., via a flag or `this._paddedBufferUsedLen`) and skipping the fill in that case to save memory bandwidth for large FFT sizes over many calls.

Suggested implementation:

```javascript
    // Reuse or allocate padded buffer (Float64 for FFT)
    const pad = N_FFT >> 1;
    const paddedLen = N + 2 * pad;

    // Track whether we just reallocated the padded buffer so we can avoid redundant zero-fills
    let paddedWasReallocated = false;
    if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
      const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));
      this._paddedBuffer = new Float64Array(newSize);
      paddedWasReallocated = true;

      // Newly allocated Float64Array is already zeroed; mark used length so callers can skip fill
      this._paddedBufferUsedLen = 0;
    }

```

To fully implement the optimization you suggested, you will also need to:

1. **Use `this._paddedBuffer` and `paddedWasReallocated` when filling:**
   - Wherever you currently have something like:
     - `const padded = this._paddedBuffer;`
     - `padded.fill(0, pad + N, paddedLen);`
   - Change that logic to:
     - Only call `padded.fill(...)` when `!paddedWasReallocated`, since a freshly allocated buffer is already zeroed.
     - Optionally use `this._paddedBufferUsedLen` to minimize the fill range, e.g.:
       ```js
       const padded = this._paddedBuffer;
       const zeroFillFrom = Math.max(this._paddedBufferUsedLen, pad + N);
       if (!paddedWasReallocated && zeroFillFrom < paddedLen) {
         padded.fill(0, zeroFillFrom, paddedLen);
       }
       this._paddedBufferUsedLen = paddedLen;
       ```
2. **Initialize `this._paddedBufferUsedLen` in the class/constructor:**
   - In the constructor or initialization for this class/module, set:
     ```js
     this._paddedBuffer = null;
     this._paddedBufferUsedLen = 0;
     ```
   - This ensures the tracking field is defined before first use.

These follow-ups will ensure that zero-filling is skipped on new allocations and minimized on subsequent calls, reducing unnecessary memory bandwidth use.
</issue_to_address>
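Assembled into one runnable sketch (illustrative only: the class name and the `_dirtyEnd` field are stand-ins for the `_paddedBufferUsedLen` bookkeeping suggested above, not the actual src/mel.js code):

```javascript
// Consolidated sketch of the suggested zero-fill tracking: skip the fill on
// fresh allocations (already zeroed) and otherwise zero only the region a
// previous, larger call actually dirtied.
class PaddedScratch {
  constructor() {
    this._paddedBuffer = null;
    this._dirtyEnd = 0; // one past the last index a previous call wrote
  }

  // Returns a view of length N + 2*pad with `data` starting at offset `pad`
  // and zeros on both sides.
  getPadded(data, pad) {
    const N = data.length;
    const paddedLen = N + 2 * pad;

    if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
      // A fresh Float64Array is zero-initialized, so nothing is dirty yet.
      this._paddedBuffer = new Float64Array(Math.ceil(paddedLen * 1.2));
      this._dirtyEnd = 0;
    }
    const padded = this._paddedBuffer;

    // The upcoming copy overwrites [pad, pad + N), so only
    // [pad + N, dirtyEnd) can still hold stale samples.
    if (this._dirtyEnd > pad + N) {
      padded.fill(0, pad + N, this._dirtyEnd);
    }

    padded.set(data, pad);
    this._dirtyEnd = pad + N;
    return padded.subarray(0, paddedLen);
  }
}
```

Tracking the dirty high-water mark keeps both the allocation-time fill and the steady-state fill proportional to what was actually written, rather than to the full buffer length.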

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.



// Pre-emphasis directly into padded buffer
// Layout: [0...pad-1] (zeros) | [pad...pad+N-1] (data) | [pad+N...] (zeros)
// We only write the data part. The left padding is zero by default (never written).

WARNING: Misleading comment — the left-padding safety guarantee is not about zero-initialization

The comment says "The left padding is zero by default (never written)", implying it relies on Float64Array zero-initialization. That's only true on first allocation. On buffer reuse the guarantee holds because the data region always starts at pad and no code path ever writes into [0...pad-1] — not because the memory is freshly zeroed.

The comment should be updated to make the invariant explicit:

Suggested change
// We only write the data part. The left padding is zero by default (never written).
// We only write the data part [pad...pad+N-1]. The left padding [0...pad-1] is
// never written by any call, so it stays zero from initial allocation.

if (N > 0) {
padded[pad] = audio[0];
for (let i = 1; i < N; i++) {
padded[pad + i] = audio[i] - PREEMPH * audio[i - 1];

WARNING: Pre-emphasis precision change — intermediate Float32 rounding step removed

The original code stored pre-emphasis results in a Float32Array before widening to Float64Array for FFT:

// OLD: Float64 arithmetic → rounded to Float32 → widened to Float64
const preemph = new Float32Array(N);
preemph[i] = audio[i] - PREEMPH * audio[i - 1];  // stored as Float32
padded[pad + i] = preemph[i];                      // widened to Float64

The new code writes directly to Float64Array, skipping the Float32 rounding step:

// NEW: Float64 arithmetic → stored as Float64 (no Float32 rounding)
padded[pad + i] = audio[i] - PREEMPH * audio[i - 1];

This is a behavioral change: the pre-emphasis values now have higher precision than the NeMo reference (which uses Float32 internally). The ONNX reference test uses a loose tolerance (max < 0.05), so this may pass CI while silently diverging from the reference. If NeMo/onnx-asr uses Float32 pre-emphasis, this change breaks the "identical output" guarantee stated in the file header.

Verification needed: run the ONNX reference cross-validation test and confirm max error is still within the previous baseline (not just < 0.05).
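If Float32 parity with the reference turns out to matter, the intermediate rounding can be restored without reintroducing the allocation by rounding each sample with Math.fround before storing it. A sketch (the function name and the `PREEMPH` value are assumptions, not the actual src/mel.js code):

```javascript
// Illustrative: keep Float32 rounding semantics while still writing straight
// into the reused Float64 buffer. Math.fround reproduces the old
// Float32Array store-then-widen step sample by sample.
const PREEMPH = 0.97; // assumed coefficient

function preemphInto(audio, padded, pad) {
  const N = audio.length;
  if (N > 0) {
    padded[pad] = Math.fround(audio[0]);
    for (let i = 1; i < N; i++) {
      // Arithmetic happens in Float64, then rounds to the nearest Float32
      // value, exactly as a Float32Array store would.
      padded[pad + i] = Math.fround(audio[i] - PREEMPH * audio[i - 1]);
    }
  }
}
```

This keeps the allocation win while making every stored value representable in Float32, so the output should be bit-compatible with the old two-buffer path.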

for (let i = 0; i < N; i++) {
padded[pad + i] = preemph[i];
if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));

SUGGESTION: Math.max is redundant here

Math.floor(paddedLen * 1.2) is always >= paddedLen for any positive integer paddedLen (since 1.2 * x >= x), so the outer Math.max never changes the result.

Suggested change
const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));
const newSize = Math.ceil(paddedLen * 1.2);

Using Math.ceil instead of Math.floor also covers the edge case where paddedLen * 1.2 is an exact integer (e.g. paddedLen = 5: Math.floor(6.0) = 6 and Math.ceil(6.0) = 6, the same result), and Math.ceil is semantically clearer for "at least 20% larger".
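A quick standalone check of the two growth expressions; note that floating-point rounding means the Math.floor variant can yield no growth at all for some small lengths, which Math.ceil avoids:

```javascript
// The two growth expressions discussed above, side by side.
function growFloorMax(len) {
  return Math.max(len, Math.floor(len * 1.2));
}
function growCeil(len) {
  return Math.ceil(len * 1.2);
}

// 3 * 1.2 evaluates to 3.5999999999999996 in IEEE 754 doubles, so the
// floor variant returns 3 (no growth), while the ceil variant returns 4.
```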

@kiloconnect

kiloconnect bot commented Feb 21, 2026

Code Review Summary

Status: 3 Issues Found | Recommendation: Address WARNING before merge

Overview

| Severity   | Count |
|------------|-------|
| CRITICAL   | 0     |
| WARNING    | 2     |
| SUGGESTION | 1     |

What Changed

This PR reuses a single _paddedBuffer (Float64Array) in JsPreprocessor.computeRawMel to avoid per-call allocation of the pre-emphasis and zero-padded FFT input buffer, yielding ~11% throughput improvement (17.5ms → 15.6ms per 5s audio). Pre-emphasis is now computed directly into the reused buffer instead of through an intermediate Float32Array.

Risk: Medium — The optimization is structurally sound (left padding is never written, right padding is explicitly zeroed), but the removal of the intermediate Float32Array step changes pre-emphasis precision from Float32 to Float64, which is a silent behavioral divergence from the NeMo reference.

Issue Details (click to expand)

WARNING

| File | Line | Issue |
|------|------|-------|
| src/mel.js | 368 | Pre-emphasis precision change: intermediate Float32 rounding step removed. Original code rounded pre-emphasis to Float32 before widening to Float64 for FFT; new code stores directly as Float64. NeMo reference uses Float32 internally. ONNX reference test tolerance (max < 0.05) is too loose to catch this regression. |
| src/mel.js | 363 | Misleading comment: "zero by default" implies reliance on zero-initialization, but the actual invariant is that [0...pad-1] is never written by any call path. |

SUGGESTION

| File | Line | Issue |
|------|------|-------|
| src/mel.js | 356 | Math.max(paddedLen, Math.floor(paddedLen * 1.2)) — the Math.max is redundant since 1.2x >= x always. Consider Math.ceil(paddedLen * 1.2) for clarity. |
Verification Required
  1. Pre-emphasis precision regression check: Run the ONNX reference cross-validation test (tests/mel.test.mjs, "ONNX reference cross-validation") and compare the max/mean error against the pre-PR baseline. The test tolerance (max < 0.05) is too loose to catch a Float32→Float64 precision change.

  2. Buffer reuse correctness: Add a test that processes a large audio chunk (e.g. 5s), then a smaller one (e.g. 1s), then verifies the output of the small chunk matches a fresh JsPreprocessor instance. This exercises the right-padding zeroing path (padded.fill(0, pad + N, paddedLen)) with a buffer larger than needed.

  3. Determinism test already covers same-size reuse: The existing should produce deterministic results test calls process twice with the same audio, which exercises the reuse path for equal-size inputs. ✅

Other Observations (not in diff)
  • The if (N > 0) guard at line 365 is dead code — N === 0 is already handled by the early return at line 348. Harmless but adds noise.
  • No test covers the large-then-small audio sequence (the critical case for right-padding correctness). Consider adding one.
Files Reviewed (2 files)
  • src/mel.js — 2 warnings, 1 suggestion
  • .jules/bolt.md — no issues (documentation only)


Repository owner deleted a comment from coderabbitai bot Feb 21, 2026