Performance: Reuse buffers in JsPreprocessor to reduce allocations #72
Conversation
- Introduced `_paddedBuffer` (Float64Array) to `JsPreprocessor`.
- Reused `_paddedBuffer` in `computeRawMel` to store pre-emphasis results directly, avoiding separate allocation of `preemph` (Float32Array) and `padded` (Float64Array).
- This change eliminates two large allocations per `computeRawMel` call, reducing GC pressure and improving performance by ~11% (17.5ms -> 15.6ms for 5s audio).
- Verified correctness with `verify_mel.js` (including incremental and resizing scenarios) and measured performance with `bench_mel_alloc.js`.
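The grow-on-demand reuse pattern the PR describes can be sketched as follows (names are illustrative, not the merged code; the real logic lives in `JsPreprocessor.computeRawMel`):

```javascript
// Sketch of the buffer-reuse pattern described above.
// Assumption: single-threaded use, as in the PR.
class ScratchPad {
  constructor() {
    this._paddedBuffer = null; // lazily allocated, grown on demand
  }

  // Returns a Float64Array of at least paddedLen elements whose tail
  // [pad + n, paddedLen) is zeroed, ready to receive FFT input.
  getPadded(paddedLen, pad, n) {
    if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
      // Over-allocate by 20% so slightly longer inputs don't reallocate.
      // A fresh Float64Array is zero-initialized by the runtime.
      this._paddedBuffer = new Float64Array(Math.ceil(paddedLen * 1.2));
    } else {
      // A reused buffer may hold stale data past the new payload; clear it.
      this._paddedBuffer.fill(0, pad + n, paddedLen);
    }
    return this._paddedBuffer;
  }
}
```

Calls after the first reuse the same backing store as long as inputs don't grow past the allocated size, which is what eliminates the per-call GC pressure.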
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Reviewer's Guide

Adds a reusable padded FFT buffer to `JsPreprocessor.computeRawMel` and writes pre-emphasis directly into it to avoid per-call allocations, while ensuring correct zero-padding for variable-length inputs and documenting the optimization in .jules/bolt.md.

Class diagram for JsPreprocessor buffer reuse optimization

```mermaid
classDiagram
    class JsPreprocessor {
        +Int32Array fbBounds
        +Float64Array _fftRe
        +Float64Array _fftIm
        +Float32Array _powerBuf
        +Float64Array _paddedBuffer
        +computeRawMel(audio, sampleRate, outBuffer)
    }
```
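A plain-JavaScript reading of the diagram (field names from the diagram; array sizes and defaults here are placeholders, not the project's actual values):

```javascript
// Fields from the class diagram; sizes are illustrative placeholders.
class JsPreprocessorSketch {
  constructor(nFft = 512, nMels = 80) {
    this.fbBounds = new Int32Array(nMels);           // per-filter bin bounds
    this._fftRe = new Float64Array(nFft);            // FFT real-part scratch
    this._fftIm = new Float64Array(nFft);            // FFT imaginary-part scratch
    this._powerBuf = new Float32Array(nFft / 2 + 1); // power-spectrum scratch
    this._paddedBuffer = null;                       // new in this PR: reusable padded input
  }

  // computeRawMel(audio, sampleRate, outBuffer) { ... } // body omitted
}
```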
Hey - I've found 1 issue and left some high-level feedback:
- The shared `_paddedBuffer` introduces stateful reuse where `computeRawMel` was previously purely local; if a `JsPreprocessor` instance can be invoked concurrently (e.g., from multiple async callers), this will cause data races and corrupt FFT input, so consider clarifying/enforcing single-threaded use or making the buffer reuse per-call instead of on the instance.
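One way to make the single-threaded assumption explicit is a reentrancy guard; this is a hypothetical sketch, not code from the PR:

```javascript
// Hypothetical guard: fail fast if computeRawMel is re-entered while a
// previous call (e.g. one interleaved across an await) is still in flight.
class GuardedPreprocessor {
  constructor() {
    this._busy = false;
  }

  computeRawMel(audio) {
    if (this._busy) {
      throw new Error("computeRawMel is not reentrant: _paddedBuffer is shared per instance");
    }
    this._busy = true;
    try {
      return audio.length; // placeholder for the actual mel computation
    } finally {
      this._busy = false;
    }
  }
}
```

Since JavaScript is single-threaded, true data races would need worker threads; the realistic hazard is interleaving across `await` points, which a guard like this surfaces immediately.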
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The shared `_paddedBuffer` introduces stateful reuse where `computeRawMel` was previously purely local; if a `JsPreprocessor` instance can be invoked concurrently (e.g., from multiple async callers), this will cause data races and corrupt FFT input, so consider clarifying/enforcing single-threaded use or making the buffer reuse per-call instead of on the instance.
## Individual Comments
### Comment 1
<location> `src/mel.js:355-356` </location>
<code_context>
- const padded = new Float64Array(paddedLen);
- for (let i = 0; i < N; i++) {
- padded[pad + i] = preemph[i];
+ if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
+ const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));
+ this._paddedBuffer = new Float64Array(newSize);
}
</code_context>
<issue_to_address>
**suggestion (performance):** Consider tracking whether the buffer was newly allocated to skip the subsequent zero-fill on fresh allocations.
`padded.fill(0, pad + N, paddedLen)` still runs on every call, even when a newly allocated buffer is already zeroed by the runtime. Consider tracking when the buffer is newly allocated (e.g., via a flag or `this._paddedBufferUsedLen`) and skipping the fill in that case to save memory bandwidth for large FFT sizes over many calls.
Suggested implementation:
```javascript
// Reuse or allocate padded buffer (Float64 for FFT)
const pad = N_FFT >> 1;
const paddedLen = N + 2 * pad;
// Track whether we just reallocated the padded buffer so we can avoid redundant zero-fills
let paddedWasReallocated = false;
if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));
this._paddedBuffer = new Float64Array(newSize);
paddedWasReallocated = true;
// Newly allocated Float64Array is already zeroed; mark used length so callers can skip fill
this._paddedBufferUsedLen = 0;
}
```
To fully implement the optimization you suggested, you will also need to:
1. **Use `this._paddedBuffer` and `paddedWasReallocated` when filling:**
- Wherever you currently have something like:
- `const padded = this._paddedBuffer;`
- `padded.fill(0, pad + N, paddedLen);`
- Change that logic to:
- Only call `padded.fill(...)` when `!paddedWasReallocated`, since a freshly allocated buffer is already zeroed.
- Optionally use `this._paddedBufferUsedLen` to minimize the fill range, e.g.:
```js
const padded = this._paddedBuffer;
const zeroFillFrom = Math.max(this._paddedBufferUsedLen, pad + N);
if (!paddedWasReallocated && zeroFillFrom < paddedLen) {
padded.fill(0, zeroFillFrom, paddedLen);
}
this._paddedBufferUsedLen = paddedLen;
```
2. **Initialize `this._paddedBufferUsedLen` in the class/constructor:**
- In the constructor or initialization for this class/module, set:
```js
this._paddedBuffer = null;
this._paddedBufferUsedLen = 0;
```
- This ensures the tracking field is defined before first use.
These follow-ups will ensure that zero-filling is skipped on new allocations and minimized on subsequent calls, reducing unnecessary memory bandwidth use.
</issue_to_address>
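Assembled into one self-contained sketch, the suggestion looks like this. Note that when a later call has a shorter payload, the fill must still cover the previous payload's extent, so the bookkeeping below tracks the dirty end of the buffer rather than a raw used length (names are illustrative, and a fixed `pad` is assumed, as in the PR where `pad = N_FFT >> 1`):

```javascript
// Self-contained sketch of the "skip redundant zero-fill" idea.
// Invariant: every index >= _dirtyEnd is zero, so only the span
// [pad + n, _dirtyEnd) can hold stale data from a longer previous call.
class PaddedScratch {
  constructor() {
    this._paddedBuffer = null;
    this._dirtyEnd = 0; // exclusive end of the possibly-nonzero region
  }

  // Returns a buffer where [pad + n, buffer.length) is guaranteed zero;
  // the caller then writes its payload into [pad, pad + n).
  prepare(paddedLen, pad, n) {
    if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
      // A fresh Float64Array is zero-initialized: no fill needed here.
      this._paddedBuffer = new Float64Array(Math.ceil(paddedLen * 1.2));
      this._dirtyEnd = 0;
    }
    const payloadEnd = pad + n;
    if (this._dirtyEnd > payloadEnd) {
      // A longer previous payload left stale values; clear only that span.
      this._paddedBuffer.fill(0, payloadEnd, this._dirtyEnd);
    }
    this._dirtyEnd = payloadEnd; // caller will dirty [pad, payloadEnd)
    return this._paddedBuffer;
  }
}
```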
```javascript
// Pre-emphasis directly into padded buffer
// Layout: [0...pad-1] (zeros) | [pad...pad+N-1] (data) | [pad+N...] (zeros)
// We only write the data part. The left padding is zero by default (never written).
```
WARNING: Misleading comment — the left-padding safety guarantee is not about zero-initialization
The comment says "The left padding is zero by default (never written)", implying it relies on `Float64Array` zero-initialization. That's only true on first allocation. On buffer reuse the guarantee holds because the data region always starts at `pad` and no code path ever writes into `[0...pad-1]`, not because the memory is freshly zeroed.
The comment should be updated to make the invariant explicit:
```diff
- // We only write the data part. The left padding is zero by default (never written).
+ // We only write the data part [pad...pad+N-1]. The left padding [0...pad-1] is
+ // never written by any call, so it stays zero from initial allocation.
```
```javascript
if (N > 0) {
  padded[pad] = audio[0];
  for (let i = 1; i < N; i++) {
    padded[pad + i] = audio[i] - PREEMPH * audio[i - 1];
```
WARNING: Pre-emphasis precision change — intermediate Float32 rounding step removed
The original code stored pre-emphasis results in a Float32Array before widening to Float64Array for FFT:

```javascript
// OLD: Float64 arithmetic → rounded to Float32 → widened to Float64
const preemph = new Float32Array(N);
preemph[i] = audio[i] - PREEMPH * audio[i - 1]; // stored as Float32
padded[pad + i] = preemph[i]; // widened to Float64
```

The new code writes directly to Float64Array, skipping the Float32 rounding step:

```javascript
// NEW: Float64 arithmetic → stored as Float64 (no Float32 rounding)
padded[pad + i] = audio[i] - PREEMPH * audio[i - 1];
```

This is a behavioral change: the pre-emphasis values now have higher precision than the NeMo reference (which uses Float32 internally). The ONNX reference test uses a loose tolerance (max < 0.05), so this may pass CI while silently diverging from the reference. If NeMo/onnx-asr uses Float32 pre-emphasis, this change breaks the "identical output" guarantee stated in the file header.
Verification needed: run the ONNX reference cross-validation test and confirm max error is still within the previous baseline (not just < 0.05).
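If the Float32 behavior of the reference should be preserved, the rounding step can be restored without reintroducing the `preemph` allocation by using `Math.fround`, which rounds a number to the nearest float32 (a sketch, not the PR's code):

```javascript
// Emulate the old Float32 intermediate while still writing into the
// reused Float64 buffer: Math.fround rounds each result to float32.
function preemphasisInto(padded, pad, audio, PREEMPH) {
  const N = audio.length;
  if (N > 0) {
    padded[pad] = Math.fround(audio[0]); // no-op if audio is a Float32Array
    for (let i = 1; i < N; i++) {
      padded[pad + i] = Math.fround(audio[i] - PREEMPH * audio[i - 1]);
    }
  }
}
```

This keeps the allocation win while matching the bit-level behavior of storing through a Float32Array.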
```diff
- for (let i = 0; i < N; i++) {
-   padded[pad + i] = preemph[i];
+ if (!this._paddedBuffer || this._paddedBuffer.length < paddedLen) {
+   const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));
```
SUGGESTION: Math.max is redundant here
`Math.floor(paddedLen * 1.2)` is always `>= paddedLen` for any positive integer `paddedLen` (since `1.2 * x >= x`), so the outer `Math.max` never changes the result.
```diff
- const newSize = Math.max(paddedLen, Math.floor(paddedLen * 1.2));
+ const newSize = Math.ceil(paddedLen * 1.2);
```
Using `Math.ceil` instead of `Math.floor` gives the same result when `paddedLen * 1.2` is an exact integer (e.g. `paddedLen = 5`: `Math.floor(6.0) = 6` and `Math.ceil(6.0) = 6`), but `Math.ceil` is semantically clearer for "at least 20% larger".
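The redundancy claim can be sanity-checked empirically, since `paddedLen * 1.2` involves floating-point rounding (`1.2` is not exactly representable as a double):

```javascript
// Check that Math.floor(paddedLen * 1.2) >= paddedLen over a range of
// sizes, i.e. that the outer Math.max never changes the result.
function maxIsRedundantUpTo(limit) {
  for (let paddedLen = 1; paddedLen <= limit; paddedLen++) {
    if (Math.floor(paddedLen * 1.2) < paddedLen) return false;
  }
  return true;
}
```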
Code Review Summary

Status: 3 Issues Found | Recommendation: Address WARNING before merge

Overview

What Changed

This PR reuses a single `_paddedBuffer` in `JsPreprocessor.computeRawMel` to avoid per-call allocations.

Risk: Medium. The optimization is structurally sound (left padding is never written, right padding is explicitly zeroed), but the removal of the intermediate Float32 rounding step changes numerical behavior.

Issue Details

WARNING
SUGGESTION
Verification Required
Other Observations (not in diff)
Files Reviewed (2 files)
Implemented buffer reuse in `JsPreprocessor.computeRawMel` to avoid allocating `preemph` and `padded` arrays on every call. This reduces GC churn and improves performance by approximately 11% for 5-second audio clips. The change involves adding a reusable `_paddedBuffer` to `JsPreprocessor` and computing pre-emphasis directly into it, while ensuring correct zero-padding for variable-length inputs.

PR created automatically by Jules for task 17138114155429247658 started by @ysdede
Summary by Sourcery
Optimize JsPreprocessor mel computation by reusing internal padding buffers to reduce allocations and improve performance.
Enhancements:
Documentation: