Performance: Optimize AudioSegmentProcessor stats calculation#164
Performance: Optimize AudioSegmentProcessor stats calculation#164
Conversation
Cached `speechStats` aggregation in `AudioSegmentProcessor.updateStats` to avoid redundant O(N) calculations every audio chunk. Performance benchmark shows ~48% improvement in `processAudioData` execution time. Verified with existing unit tests.
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds a cached speech stats summary to AudioSegmentProcessor to avoid recomputing averages on every audio chunk, invalidating the cache only when the underlying speech stats history changes. Class diagram for cachedSpeechSummary in AudioSegmentProcessorclassDiagram
class AudioSegmentProcessor {
- ProcessorState state
- ProcessorOptions options
+ processAudioData(audioChunk)
+ updateStats()
- average(values)
- resetState()
}
class ProcessorState {
+ number[] silenceDurations
+ number[] silenceEnergies
+ SegmentStats[] speechStats
+ SegmentStats[] silenceStats
+ StatsSummary cachedSpeechSummary
+ CurrentStats currentStats
+ number segmentCounter
+ number noiseFloor
}
class StatsSummary {
+ number avgDuration
+ number avgEnergy
+ number avgEnergyIntegral
}
class SegmentStats {
+ number duration
+ number avgEnergy
+ number energyIntegral
}
class CurrentStats {
+ StatsSummary speech
+ StatsSummary silence
}
AudioSegmentProcessor --> ProcessorState : uses
ProcessorState "*" --> "*" SegmentStats : aggregates
ProcessorState --> StatsSummary : caches
ProcessorState --> CurrentStats : holds
CurrentStats --> StatsSummary : speech
CurrentStats --> StatsSummary : silence
Flow diagram for speech stats cache invalidation and usageflowchart TD
A[start processAudioData for new audioChunk] --> B[update speech detection state]
B --> C{didSpeechSegmentEnd?}
C -- no --> D[call updateStats]
C -- yes --> E[compute segment duration and avgEnergy]
E --> F[push new SegmentStats into speechStats]
F --> G[set cachedSpeechSummary to null]
G --> H{speechStats.length > maxHistoryLength?}
H -- yes --> I[shift oldest SegmentStats from speechStats]
I --> J[set cachedSpeechSummary to null]
H -- no --> J
J --> D[call updateStats]
subgraph updateStats
D --> K[compute silence stats]
K --> L{cachedSpeechSummary is not null?}
L -- yes --> M[set stats.speech from cachedSpeechSummary]
L -- no --> N[map and average speechStats to compute StatsSummary]
N --> O[set cachedSpeechSummary to computed StatsSummary]
M --> P[set currentStats to stats]
O --> P[set currentStats to stats]
P --> Q[end updateStats]
end
Q --> R[end processAudioData for this audioChunk]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughInternal caching optimization added to the AudioSegmentProcessor to avoid redundant speech summary computations. A cachedSpeechSummary field stores previously computed statistics, with automatic invalidation on speech segment completion or history length changes. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider centralizing
speechStatsmutations (e.g., via helper methods) so thatcachedSpeechSummaryinvalidation is guaranteed on every write path and can’t be missed in future changes. - In
updateStats, prefer an explicit null check (this.state.cachedSpeechSummary !== null) instead of a truthiness check to make the cache-use condition clearer and robust against future type changes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider centralizing `speechStats` mutations (e.g., via helper methods) so that `cachedSpeechSummary` invalidation is guaranteed on every write path and can’t be missed in future changes.
- In `updateStats`, prefer an explicit null check (`this.state.cachedSpeechSummary !== null`) instead of a truthiness check to make the cache-use condition clearer and robust against future type changes.
## Individual Comments
### Comment 1
<location> `src/lib/audio/AudioSegmentProcessor.ts:474-482` </location>
<code_context>
}
- if (this.state.speechStats.length > 0) {
+ if (this.state.cachedSpeechSummary) {
+ stats.speech = this.state.cachedSpeechSummary;
+ } else if (this.state.speechStats.length > 0) {
stats.speech = {
avgDuration: this.average(this.state.speechStats.map(s => s.duration)),
avgEnergy: this.average(this.state.speechStats.map(s => s.avgEnergy)),
avgEnergyIntegral: this.average(this.state.speechStats.map(s => s.energyIntegral))
};
+ this.state.cachedSpeechSummary = stats.speech;
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Cache usage relies on all mutation sites invalidating `cachedSpeechSummary`, which could lead to stale stats if any path is missed.
This depends on every `speechStats` mutation also clearing `cachedSpeechSummary`, which we currently do when appending, truncating, and in `resetState`. Any future code that replaces or clears `speechStats` must also remember to invalidate the cache, or consumers will see stale data. To make this safer, consider centralizing all `speechStats` mutations in helper(s) that always invalidate the cache, or switch to a version-based approach where the cache is tied to a `speechStats` version counter rather than manual invalidation at each call site.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if (this.state.cachedSpeechSummary) { | ||
| stats.speech = this.state.cachedSpeechSummary; | ||
| } else if (this.state.speechStats.length > 0) { | ||
| stats.speech = { | ||
| avgDuration: this.average(this.state.speechStats.map(s => s.duration)), | ||
| avgEnergy: this.average(this.state.speechStats.map(s => s.avgEnergy)), | ||
| avgEnergyIntegral: this.average(this.state.speechStats.map(s => s.energyIntegral)) | ||
| }; | ||
| this.state.cachedSpeechSummary = stats.speech; |
There was a problem hiding this comment.
issue (bug_risk): Cache usage relies on all mutation sites invalidating cachedSpeechSummary, which could lead to stale stats if any path is missed.
This depends on every speechStats mutation also clearing cachedSpeechSummary, which we currently do when appending, truncating, and in resetState. Any future code that replaces or clears speechStats must also remember to invalidate the cache, or consumers will see stale data. To make this safer, consider centralizing all speechStats mutations in helper(s) that always invalidate the cache, or switch to a version-based approach where the cache is tied to a speechStats version counter rather than manual invalidation at each call site.
Code Review SummaryStatus: No New Issues Found | Recommendation: Merge The PR implements a performance optimization by caching speech statistics summary to avoid recalculating averages on every audio chunk. The implementation is correct:
An existing review (Sourcery) has already flagged a maintenance concern about centralizing speechStats mutations to prevent future cache invalidation bugs. This is a valid suggestion but not a blocker. Files Reviewed (1 file)
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/lib/audio/AudioSegmentProcessor.ts (2)
466-472:silenceStatspath still allocates 3 temporary arrays per chunk.The optimisation was applied only to
speechStats. ThesilenceStatsblock (lines 466–472) retains the same.map()× 3 pattern on everyupdateStats()call. ApplyingcachedSilenceSummarysymmetrically would complete the optimisation and keep both paths consistent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 466 - 472, The silenceStats branch still creates three temporary arrays via .map on every update; mirror the speechStats optimization by adding a cachedSilenceSummary accumulator (similar to cachedSpeechSummary) updated inside updateStats(), and replace the current stats.silence calculation to compute avgDuration, avgEnergy and avgEnergyIntegral from that cachedSilenceSummary instead of mapping silenceStats each time; update references to this.state.silenceStats, stats.silence, cachedSilenceSummary and the average(...) helper so the silence path no longer allocates the three intermediate arrays.
269-273: Line 273 is a redundant null assignment.
cachedSpeechSummaryis alreadynullfrom the unconditional assignment on line 269 — theshift()branch is nested entirely within that same block. The second assignment is a no-op.🔧 Suggested cleanup
this.state.speechStats.push({ startTime: this.state.speechStartTime, endTime: currentTime, duration: speechDuration, avgEnergy, energyIntegral: avgEnergy * speechDuration }); this.state.cachedSpeechSummary = null; if (this.state.speechStats.length > this.options.maxHistoryLength) { this.state.speechStats.shift(); - this.state.cachedSpeechSummary = null; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 269 - 273, Remove the redundant null assignment to this.state.cachedSpeechSummary inside the branch that shifts this.state.speechStats: you already set this.state.cachedSpeechSummary = null unconditionally before the if, so delete the second assignment within the if block (in AudioSegmentProcessor where this.state.speechStats is compared to this.options.maxHistoryLength) to eliminate the no-op.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/audio/AudioSegmentProcessor.ts`:
- Around line 474-483: The cache hit path assigns the cached object by reference
(this.state.cachedSpeechSummary) to stats.speech, allowing external callers of
getStats() to mutate the cached object; instead, make a shallow copy when
reading the cache so callers get an independent object. In updateStats() (or the
block where stats.speech is set) replace the direct assignment of
this.state.cachedSpeechSummary with a shallow copy (e.g. create a new object
literal copying avgDuration, avgEnergy, avgEnergyIntegral), or alternatively
ensure getStats() returns a shallow copy of this.state.currentStats so consumers
cannot mutate the internal cachedSpeechSummary; update references to
this.state.currentStats and this.state.cachedSpeechSummary accordingly.
---
Nitpick comments:
In `@src/lib/audio/AudioSegmentProcessor.ts`:
- Around line 466-472: The silenceStats branch still creates three temporary
arrays via .map on every update; mirror the speechStats optimization by adding a
cachedSilenceSummary accumulator (similar to cachedSpeechSummary) updated inside
updateStats(), and replace the current stats.silence calculation to compute
avgDuration, avgEnergy and avgEnergyIntegral from that cachedSilenceSummary
instead of mapping silenceStats each time; update references to
this.state.silenceStats, stats.silence, cachedSilenceSummary and the
average(...) helper so the silence path no longer allocates the three
intermediate arrays.
- Around line 269-273: Remove the redundant null assignment to
this.state.cachedSpeechSummary inside the branch that shifts
this.state.speechStats: you already set this.state.cachedSpeechSummary = null
unconditionally before the if, so delete the second assignment within the if
block (in AudioSegmentProcessor where this.state.speechStats is compared to
this.options.maxHistoryLength) to eliminate the no-op.
| if (this.state.cachedSpeechSummary) { | ||
| stats.speech = this.state.cachedSpeechSummary; | ||
| } else if (this.state.speechStats.length > 0) { | ||
| stats.speech = { | ||
| avgDuration: this.average(this.state.speechStats.map(s => s.duration)), | ||
| avgEnergy: this.average(this.state.speechStats.map(s => s.avgEnergy)), | ||
| avgEnergyIntegral: this.average(this.state.speechStats.map(s => s.energyIntegral)) | ||
| }; | ||
| this.state.cachedSpeechSummary = stats.speech; | ||
| } |
There was a problem hiding this comment.
Cache hit path stores a shared object reference — consumer mutation now silently corrupts the cache.
Line 475 assigns the cached reference directly to stats.speech. After line 485 (this.state.currentStats = stats), the invariant this.state.currentStats.speech === this.state.cachedSpeechSummary holds. getStats() returns this.state.currentStats by reference (line 497), so any consumer that writes to the returned speech object (e.g. getStats().speech.avgDuration = x) mutates the cache in place.
Before this PR, stats.speech was always a fresh object literal on every updateStats() call, so external mutations were overwritten on the very next chunk (~80 ms). After this PR, the same mutation persists until the next speech-end event — potentially several seconds — and is then served as the cache hit value.
The minimal fix is to copy on the read side, which is a cheap struct copy on the hot path:
🛡️ Proposed fix
- if (this.state.cachedSpeechSummary) {
- stats.speech = this.state.cachedSpeechSummary;
- } else if (this.state.speechStats.length > 0) {
+ if (this.state.cachedSpeechSummary !== null) {
+ stats.speech = { ...this.state.cachedSpeechSummary };
+ } else if (this.state.speechStats.length > 0) {
stats.speech = {
avgDuration: this.average(this.state.speechStats.map(s => s.duration)),
avgEnergy: this.average(this.state.speechStats.map(s => s.avgEnergy)),
avgEnergyIntegral: this.average(this.state.speechStats.map(s => s.energyIntegral))
};
this.state.cachedSpeechSummary = stats.speech;
}Alternatively, getStats() can return a shallow copy of currentStats to establish a consistent immutability contract at the API boundary.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (this.state.cachedSpeechSummary) { | |
| stats.speech = this.state.cachedSpeechSummary; | |
| } else if (this.state.speechStats.length > 0) { | |
| stats.speech = { | |
| avgDuration: this.average(this.state.speechStats.map(s => s.duration)), | |
| avgEnergy: this.average(this.state.speechStats.map(s => s.avgEnergy)), | |
| avgEnergyIntegral: this.average(this.state.speechStats.map(s => s.energyIntegral)) | |
| }; | |
| this.state.cachedSpeechSummary = stats.speech; | |
| } | |
| if (this.state.cachedSpeechSummary !== null) { | |
| stats.speech = { ...this.state.cachedSpeechSummary }; | |
| } else if (this.state.speechStats.length > 0) { | |
| stats.speech = { | |
| avgDuration: this.average(this.state.speechStats.map(s => s.duration)), | |
| avgEnergy: this.average(this.state.speechStats.map(s => s.avgEnergy)), | |
| avgEnergyIntegral: this.average(this.state.speechStats.map(s => s.energyIntegral)) | |
| }; | |
| this.state.cachedSpeechSummary = stats.speech; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 474 - 483, The cache hit
path assigns the cached object by reference (this.state.cachedSpeechSummary) to
stats.speech, allowing external callers of getStats() to mutate the cached
object; instead, make a shallow copy when reading the cache so callers get an
independent object. In updateStats() (or the block where stats.speech is set)
replace the direct assignment of this.state.cachedSpeechSummary with a shallow
copy (e.g. create a new object literal copying avgDuration, avgEnergy,
avgEnergyIntegral), or alternatively ensure getStats() returns a shallow copy of
this.state.currentStats so consumers cannot mutate the internal
cachedSpeechSummary; update references to this.state.currentStats and
this.state.cachedSpeechSummary accordingly.
Performance Optimization in
AudioSegmentProcessorupdateStatsmethod was recalculating averages (duration, energy, integral) for the entire speech history buffer every time a new audio chunk was processed (~12.5Hz). This involved creating multiple temporary arrays via.map()and reducing them, despite the underlying data changing only when a speech segment ended.cachedSpeechSummary) inProcessorState.null) only whenspeechStatsis modified (push/shift).updateStatsnow uses the cached value if available, or calculates and caches it if not.processAudioDataloop by ~48% in a synthetic benchmark (100k chunks: 585ms -> 303ms).bun test src/lib/audio/AudioSegmentProcessor.test.ts(all passed).repro_perf.tsto measure the speedup.PR created automatically by Jules for task 16904760239692404253 started by @ysdede
Summary by Sourcery
Cache aggregated speech statistics in AudioSegmentProcessor to avoid recomputing them on every audio chunk and improve processing performance.
Enhancements:
Summary by CodeRabbit