Refactor AudioSegmentProcessor: reduce complexity in findSpeechStart#87
Refactor AudioSegmentProcessor: reduce complexity in findSpeechStart#87
Conversation
… methods This change extracts the complex heuristic logic in `findSpeechStart` into three private helper methods: `findLastSpeechChunkIndex`, `findRisingEnergyStart`, and `findSnrCrossingStart`. This improves readability and maintainability of the speech onset detection logic without altering its behavior. - Extracted `findLastSpeechChunkIndex` to locate the most recent speech chunk. - Extracted `findRisingEnergyStart` to handle rising energy trend analysis. - Extracted `findSnrCrossingStart` to handle SNR threshold crossing checks. - Updated `findSpeechStart` to compose these helpers. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
… methods This change extracts the complex heuristic logic in `findSpeechStart` into three private helper methods: `findLastSpeechChunkIndex`, `findRisingEnergyStart`, and `findSnrCrossingStart`. This improves readability and maintainability of the speech onset detection logic without altering its behavior. - Extracted `findLastSpeechChunkIndex` to locate the most recent speech chunk. - Extracted `findRisingEnergyStart` to handle rising energy trend analysis. - Extracted `findSnrCrossingStart` to handle SNR threshold crossing checks. - Updated `findSpeechStart` to compose these helpers. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/lib/audio/AudioSegmentProcessor.ts`:
- Around line 447-458: In findSnrCrossingStart, handle the edge-case where the
chunk at startIndex is already below minSnrThreshold: instead of returning
Math.min(chunks.length - 1, i + 1) blindly, add an explicit branch so that if i
=== startIndex you return startIndex (or otherwise clamp so you never return an
index past the last valid speech chunk); keep the existing behavior for earlier
i (return i+1) and add a short comment explaining this special-case to prevent
regressions—this involves editing the findSnrCrossingStart function and touching
the use of chunks, i, startIndex and minSnrThreshold.
- Around line 392-400: findLastSpeechChunkIndex currently returns 0 to indicate
"not found", which conflicts with a real match at index 0; change
findLastSpeechChunkIndex to return -1 when no speech chunk is found and update
findSpeechStart to handle -1 explicitly (e.g., treat as "no speech" and avoid
using recentChunks[0]). Also search for any other callers (e.g.,
processAudioData / findSpeechStart invocation sites) and ensure they properly
handle the -1 sentinel (or switch to returning null if you prefer), adding a
brief comment in both functions documenting the sentinel semantics.
- Around line 373-375: The variable firstSpeechIndex is misnamed because
findLastSpeechChunkIndex() returns the most recent (last) speech chunk; rename
the variable to lastSpeechIndex and update any usages accordingly (e.g., where
passed into findRisingEnergyStart) and also rename the helper parameter
startIndex to lastIndex (or similar) in methods like findRisingEnergyStart and
any other helper functions that accept that value so names reflect "last/recent"
semantics; update all call sites and JSDoc/comments referencing
firstSpeechIndex/startIndex to use the new names (findLastSpeechChunkIndex,
lastSpeechIndex, lastIndex).
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/lib/audio/AudioSegmentProcessor.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/lib/audio/AudioSegmentProcessor.ts (1)
src/lib/audio/audioParams.ts (1)
minSnrThreshold(133-133)
🔇 Additional comments (2)
src/lib/audio/AudioSegmentProcessor.ts (2)
372-387: Clean decomposition offindSpeechStart— nice refactor.The three-step cascade (rising energy → SNR crossing → default lookback) is clear and easy to follow. Good separation of concerns.
One note: since
findSpeechStartcan no longer return-1(minimum is0viaMath.max), the guard at line 217 (realStartIndex !== -1) is now dead code. Consider updating the caller to remove that check for consistency.
405-442:findRisingEnergyStartlogic looks correct.The backward scan, energy-rise detection, SNR floor guard, and ~500 ms lookback cap are all well-structured. Logging only fires when a trend is actually found, which is good.
One minor observation: the guard
i < chunks.length - 1at line 414 is always true in the current call context (sincei <= startIndex - 1andstartIndex <= chunks.length - 1). It's harmless as a safety check but could be removed or commented to reduce reader confusion.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| const firstSpeechIndex = this.findLastSpeechChunkIndex(); | ||
|
|
||
| const risingEnergyIndex = this.findRisingEnergyStart(firstSpeechIndex); |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Misleading variable name: firstSpeechIndex holds the last (most recent) speech chunk index.
findLastSpeechChunkIndex() scans backwards and returns the most recent speech chunk, yet its result is stored in firstSpeechIndex. This is confusing and inverts the semantic meaning. Rename to lastSpeechIndex (and correspondingly the startIndex parameter in the helpers) to match what findLastSpeechChunkIndex actually returns.
✏️ Suggested rename
- const firstSpeechIndex = this.findLastSpeechChunkIndex();
+ const lastSpeechIndex = this.findLastSpeechChunkIndex();
- const risingEnergyIndex = this.findRisingEnergyStart(firstSpeechIndex);
+ const risingEnergyIndex = this.findRisingEnergyStart(lastSpeechIndex);
if (risingEnergyIndex !== null) {
return risingEnergyIndex;
}
- const snrCrossingIndex = this.findSnrCrossingStart(firstSpeechIndex);
+ const snrCrossingIndex = this.findSnrCrossingStart(lastSpeechIndex);
if (snrCrossingIndex !== null) {
return snrCrossingIndex;
}
// Default lookback
- return Math.max(0, firstSpeechIndex - 4);
+ return Math.max(0, lastSpeechIndex - 4);🤖 Prompt for AI Agents
In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 373 - 375, The variable
firstSpeechIndex is misnamed because findLastSpeechChunkIndex() returns the most
recent (last) speech chunk; rename the variable to lastSpeechIndex and update
any usages accordingly (e.g., where passed into findRisingEnergyStart) and also
rename the helper parameter startIndex to lastIndex (or similar) in methods like
findRisingEnergyStart and any other helper functions that accept that value so
names reflect "last/recent" semantics; update all call sites and JSDoc/comments
referencing firstSpeechIndex/startIndex to use the new names
(findLastSpeechChunkIndex, lastSpeechIndex, lastIndex).
| private findLastSpeechChunkIndex(): number { | ||
| const chunks = this.state.recentChunks; | ||
| for (let i = chunks.length - 1; i >= 0; i--) { | ||
| if (chunks[i].isSpeech) { | ||
| firstSpeechIndex = i; | ||
| break; | ||
| return i; | ||
| } | ||
| } | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
findLastSpeechChunkIndex returns 0 as a "not found" sentinel — indistinguishable from a real match at index 0.
Returning 0 when no speech chunk exists conflates "the oldest chunk in the buffer is speech" with "no speech was found." Currently this is safe because processAudioData pushes a speech chunk into recentChunks (line 182-187) before calling findSpeechStart, so the search always succeeds. However, this is fragile — any future caller or reordering would silently degrade to using a stale chunk at index 0.
Consider returning -1 (or null) for "not found" and handling it explicitly in findSpeechStart, or at minimum adding a comment explaining why 0 is acceptable here.
♻️ Safer sentinel approach
private findLastSpeechChunkIndex(): number {
const chunks = this.state.recentChunks;
for (let i = chunks.length - 1; i >= 0; i--) {
if (chunks[i].isSpeech) {
return i;
}
}
- return 0;
+ return -1;
}Then in findSpeechStart:
private findSpeechStart(): number {
const lastSpeechIndex = this.findLastSpeechChunkIndex();
+ if (lastSpeechIndex < 0) {
+ return Math.max(0, this.state.recentChunks.length - 1);
+ }
const risingEnergyIndex = this.findRisingEnergyStart(lastSpeechIndex);🤖 Prompt for AI Agents
In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 392 - 400,
findLastSpeechChunkIndex currently returns 0 to indicate "not found", which
conflicts with a real match at index 0; change findLastSpeechChunkIndex to
return -1 when no speech chunk is found and update findSpeechStart to handle -1
explicitly (e.g., treat as "no speech" and avoid using recentChunks[0]). Also
search for any other callers (e.g., processAudioData / findSpeechStart
invocation sites) and ensure they properly handle the -1 sentinel (or switch to
returning null if you prefer), adding a brief comment in both functions
documenting the sentinel semantics.
| private findSnrCrossingStart(startIndex: number): number | null { | ||
| const chunks = this.state.recentChunks; | ||
| const minSnrThreshold = this.options.minSnrThreshold; | ||
|
|
||
| for (let i = startIndex; i >= 0; i--) { | ||
| if (chunks[i].snr < minSnrThreshold) { | ||
| return Math.min(chunks.length - 1, i + 1); | ||
| } | ||
| } | ||
|
|
||
| // Default lookback | ||
| return Math.max(0, firstSpeechIndex - 4); | ||
| return null; | ||
| } |
There was a problem hiding this comment.
findSnrCrossingStart — verify intent when the chunk at startIndex is below threshold.
If chunks[startIndex].snr < minSnrThreshold, the method returns Math.min(chunks.length - 1, startIndex + 1). In the current call flow startIndex equals chunks.length - 1, so Math.min clamps the result back to startIndex — effectively a no-op. This is fine today.
However, if startIndex were ever less than chunks.length - 1, the function would return an index after the last speech chunk, which is semantically backwards for a "find start" helper. Consider adding an explicit early return or comment clarifying this edge-case behaviour to prevent subtle regressions if the callers evolve.
🤖 Prompt for AI Agents
In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 447 - 458, In
findSnrCrossingStart, handle the edge-case where the chunk at startIndex is
already below minSnrThreshold: instead of returning Math.min(chunks.length - 1,
i + 1) blindly, add an explicit branch so that if i === startIndex you return
startIndex (or otherwise clamp so you never return an index past the last valid
speech chunk); keep the existing behavior for earlier i (return i+1) and add a
short comment explaining this special-case to prevent regressions—this involves
editing the findSnrCrossingStart function and touching the use of chunks, i,
startIndex and minSnrThreshold.
Review Summary (Post r1→master merge)Status: ✅ MERGEABLE Changes: Decomposes the complex \indSpeechStart()\ method in \AudioSegmentProcessor\ into three focused helpers:
Each returns Assessment: Clean extract-method refactor. The original was a single 60-line method with mixed concerns. Now each strategy (energy trend, SNR crossing, default lookback) is isolated and testable. Note: Complements PR #85 (which refactors \processAudioData). They touch different methods in the same file, so merge #85 first, then #87. Recommendation: ✅ Merge (after #85) — Completes the AudioSegmentProcessor decomposition |
Acknowledged. |
… methods This change extracts the complex heuristic logic in `findSpeechStart` into three private helper methods: `findLastSpeechChunkIndex`, `findRisingEnergyStart`, and `findSnrCrossingStart`. This improves readability and maintainability of the speech onset detection logic without altering its behavior. - Extracted `findLastSpeechChunkIndex` to locate the most recent speech chunk. - Extracted `findRisingEnergyStart` to handle rising energy trend analysis. - Extracted `findSnrCrossingStart` to handle SNR threshold crossing checks. - Updated `findSpeechStart` to compose these helpers. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Refactored
findSpeechStartinsrc/lib/audio/AudioSegmentProcessor.tsto reduce complexity.Verified with reproduction test
src/lib/audio/AudioSegmentProcessor_repro.test.ts(deleted after verification) and full test suite.PR created automatically by Jules for task 6181472373587751073 started by @ysdede
Summary by CodeRabbit
Bug Fixes
Refactor