feat: add wasm thread control and runtime backend/quantization/stride settings #175

Merged

AAlp22 merged 2 commits into master from feat/wasm-thread-setting on Feb 21, 2026

Conversation


ysdede (Owner) commented on Feb 21, 2026

Summary

  • add configurable WASM thread count setting in UI and persist it
  • pass cpu thread count through ModelManager to ParakeetModel loading
  • expose runtime backend selection and encoder/decoder quantization choices
  • expose decoder frame stride setting and wire it through inference path
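
Taken together, the new knobs travel to the worker in a single structured payload. A rough sketch (option names follow the InitModelOptions interface discussed in the reviews below; the concrete values and model id are illustrative only):

```typescript
// Illustrative sketch only; values and model id are placeholders.
await workerClient.initModel({
  modelId: 'parakeet-tdt-0.6b',  // placeholder id
  backend: 'wasm',               // ModelBackendMode: 'webgpu-hybrid' | 'wasm'
  encoderQuant: 'int8',          // QuantizationMode: 'int8' | 'fp32'
  decoderQuant: 'int8',
  cpuThreads: 4,                 // clamped against navigator.hardwareConcurrency
});
```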

Notes

  • keeps fragile v4 interval/window-cap logic untouched
  • intended to reduce laptop CPU/fan pressure while preserving real-time headroom

Summary by Sourcery

Introduce user-configurable model runtime controls for backend, quantization, decoder stride, and WASM thread count, and plumb these settings through the UI, persistence, worker, and model loading pipeline.

New Features:

  • Add UI controls and app store state for selecting model backend mode, encoder/decoder quantization, decoder frame stride, and WASM thread count.
  • Persist model backend, quantization options, frame stride, and WASM thread settings across sessions via settings storage.
  • Allow model initialization (remote and local) to accept CPU thread count, backend mode, and quantization options, forwarding them to the model manager and underlying Parakeet runtime.

Enhancements:

  • Improve backend resolution logic to honor an explicit WebGPU/WASM mode while gracefully falling back to WASM when WebGPU is unavailable (see the sketch after this list).
  • Adjust model asset selection to respect encoder/decoder quantization choices while keeping WebGPU encoder on fp32 when needed, and report the effective runtime backend in model progress.
  • Update default WASM thread selection based on detected hardware concurrency and clamp user thread settings to device limits.
  • Align quantization presets automatically with the selected backend mode to mirror parakeet.js demo behavior.
  • Expose decoder frame stride through the worker interface and transcription pipeline to control decoder step granularity.
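
A minimal sketch of that fallback decision, assuming a WebGPU adapter probe (the type names follow this PR; the real logic lives in ModelManager._resolveBackend and may differ):

```typescript
type ModelBackendMode = 'webgpu-hybrid' | 'wasm';
type BackendType = 'webgpu' | 'wasm';

// Sketch: honor an explicit mode, but degrade to WASM when WebGPU is unavailable.
async function resolveBackend(requested: ModelBackendMode): Promise<{
  effectiveBackend: ModelBackendMode;
  runtimeBackend: BackendType;
}> {
  if (requested === 'webgpu-hybrid') {
    const gpu = (navigator as Navigator & { gpu?: { requestAdapter(): Promise<unknown> } }).gpu;
    const adapter = gpu ? await gpu.requestAdapter().catch(() => null) : null;
    if (adapter) return { effectiveBackend: 'webgpu-hybrid', runtimeBackend: 'webgpu' };
  }
  return { effectiveBackend: 'wasm', runtimeBackend: 'wasm' };
}
```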

Build:

  • Change the Vite dev server defaults to run on localhost:5173 instead of 0.0.0.0:3100.

Tests:

  • Extend settings storage tests to cover sanitization and persistence of frame stride, WASM thread count, and model backend/quantization fields.
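
A test in that vein might look roughly like the following (the sanitizeSettings helper name, import path, and exact clamping ranges are assumptions, not the repo's actual API):

```typescript
import { describe, expect, it } from 'vitest';
// Hypothetical import; the real sanitization helpers live in src/utils/settingsStorage.ts.
import { sanitizeSettings } from './settingsStorage';

describe('settings sanitization (sketch)', () => {
  it('clamps frameStride and wasmThreads to sane ranges', () => {
    const result = sanitizeSettings({ general: { frameStride: 99, wasmThreads: -3 } });
    expect(result.general.frameStride).toBeLessThanOrEqual(4); // stride is documented as 1-4
    expect(result.general.wasmThreads).toBeGreaterThanOrEqual(1);
  });
});
```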

Summary by CodeRabbit

  • New Features

    • Added backend selection control (WebGPU vs WASM) in settings
    • Added frame stride adjustment (1–4) for transcription performance tuning
    • Added quantization selectors for encoder and decoder models
    • Added WASM thread configuration with automatic hardware detection
    • Enhanced model settings persistence to save backend and quantization preferences
  • Chores

    • Updated development server configuration


sourcery-ai bot commented Feb 21, 2026

Reviewer's Guide

Adds configurable runtime controls for Parakeet’s backend, quantization, decoder stride, and WASM thread count, plumbs them from UI through app store, persistence, worker client, and ModelManager into the transcription worker, and refactors backend/asset resolution accordingly.

Sequence diagram for model initialization with backend, quantization, and WASM threads

sequenceDiagram
    actor User
    participant SettingsPanel
    participant App
    participant AppStore
    participant TranscriptionWorkerClient as WorkerClient
    participant WorkerThread as Worker
    participant ModelManager
    participant ParakeetModel

    User->>SettingsPanel: Click Load / Reload
    SettingsPanel->>App: onLoadModel()
    App->>AppStore: read modelBackendMode, wasmThreads, encoderQuant, decoderQuant

    App->>WorkerClient: initModel(options)
    activate WorkerClient
    WorkerClient->>WorkerClient: normalizeCpuThreads(cpuThreads)
    WorkerClient->>Worker: postMessage INIT_MODEL(options)
    deactivate WorkerClient

    activate Worker
    Worker->>ModelManager: loadModel(config)
    activate ModelManager
    ModelManager->>ModelManager: _normalizeCpuThreads(cpuThreads)
    ModelManager->>ModelManager: _normalizeRequestedBackend(backend)
    ModelManager->>ModelManager: _normalizeQuantization(encoderQuant)
    ModelManager->>ModelManager: _normalizeQuantization(decoderQuant)

    ModelManager->>ModelManager: _resolveBackend(requestedBackend)
    ModelManager->>App: onProgress(backend, effectiveBackend)

    ModelManager->>ParakeetModel: fromUrls({backend, cpuThreads, urls})
    activate ParakeetModel
    ParakeetModel-->>ModelManager: model instance
    deactivate ParakeetModel

    ModelManager-->>Worker: model ready
    deactivate ModelManager

    Worker-->>App: INIT_MODEL_DONE
    deactivate Worker
    App->>AppStore: setModelState(ready)
    App->>AppStore: setBackend(runtimeBackend)

Updated class diagram for ModelManager, TranscriptionWorkerClient, and configuration types

classDiagram
    class ModelManager {
      -backend : BackendType
      +loadModel(config) void
      +loadLocalModel(files, options) void
      -_buildDirectModelAssets(modelId, backend, encoderQuant, decoderQuant, getModelConfig) ResolvedModelAssets
      -_normalizeCpuThreads(value) number
      -_normalizeRequestedBackend(value) ModelBackendMode
      -_normalizeQuantization(value, fallback) QuantizationMode
      -_resolveBackend(requestedBackend) Promise~ResolveBackendResult~
    }

    class ResolveBackendResult {
      +effectiveBackend : ModelBackendMode
      +runtimeBackend : BackendType
    }

    class TranscriptionWorkerClient {
      -worker : Worker
      +initModel(options) Promise~void~
      +initLocalModel(files, options) Promise~void~
      +initService(config) Promise~void~
      +initV3Service(config) Promise~void~
      +sendRequest(type, payload) Promise~any~
      -normalizeCpuThreads(cpuThreads) number
    }

    class InitModelOptions {
      +modelId : string
      +cpuThreads : number
      +backend : ModelBackendMode
      +encoderQuant : QuantizationMode
      +decoderQuant : QuantizationMode
    }

    class InitLocalModelOptions {
      +cpuThreads : number
      +backend : ModelBackendMode
    }

    class ModelConfig {
      +modelId : string
      +backend : ModelBackendMode
      +cpuThreads : number
      +encoderQuant : QuantizationMode
      +decoderQuant : QuantizationMode
    }

    class ModelProgress {
      +stage : string
      +progress : number
      +message : string
      +file : string
      +backend : BackendType
    }

    class PersistedSettings {
      +general : PersistedGeneralSettings
      +model : PersistedModelSettings
      +audio : PersistedAudioSettings
      +ui : PersistedUiSettings
    }

    class PersistedGeneralSettings {
      +v4InferenceIntervalMs : number
      +v4SilenceFlushSec : number
      +streamingWindow : number
      +frameStride : number
      +wasmThreads : number
    }

    class PersistedModelSettings {
      +selectedModelId : string
      +backend : ModelBackendMode
      +encoderQuant : QuantizationMode
      +decoderQuant : QuantizationMode
    }

    class AppStore {
      +modelBackendMode() ModelBackendMode
      +encoderQuant() QuantizationMode
      +decoderQuant() QuantizationMode
      +frameStride() number
      +wasmThreads() number
      +setModelBackendMode(mode) void
      +setEncoderQuant(mode) void
      +setDecoderQuant(mode) void
      +setFrameStride(value) void
      +setWasmThreads(value) void
    }

    class BackendType {
      <<enumeration>>
      webgpu
      wasm
    }

    class ModelBackendMode {
      <<enumeration>>
      webgpu-hybrid
      wasm
    }

    class QuantizationMode {
      <<enumeration>>
      int8
      fp32
    }

    TranscriptionWorkerClient --> InitModelOptions : uses
    TranscriptionWorkerClient --> InitLocalModelOptions : uses
    TranscriptionWorkerClient --> ModelConfig : configures
    ModelManager --> ModelConfig : uses
    ModelManager --> ModelProgress : emits
    PersistedSettings --> PersistedGeneralSettings : has
    PersistedSettings --> PersistedModelSettings : has
    AppStore --> PersistedModelSettings : persists
    AppStore --> PersistedGeneralSettings : persists
    ModelManager --> BackendType : runtimeBackend
    ModelManager --> ModelBackendMode : effectiveBackend
    ModelManager --> QuantizationMode : quantization
    AppStore --> ModelBackendMode : selected
    AppStore --> QuantizationMode : selected

File-Level Changes

1. Thread count and backend/quantization options are propagated through ModelManager and the worker to Parakeet model loading.
     • Extend ModelManager.loadModel and loadLocalModel to accept cpuThreads, backend mode, and quantization options, and pass them into ParakeetModel.fromUrls and asset resolution.
     • Introduce helpers to normalize the CPU thread count, map user backend modes to runtime backends, and validate quantization values with WebGPU capability resolution.
     • Update the transcription worker message handling and client API to send structured INIT_MODEL and LOAD_LOCAL_MODEL payloads carrying cpuThreads, backend, quantization, and frameStride into ModelManager and Parakeet inference.
   Files: src/lib/transcription/ModelManager.ts, src/lib/transcription/transcription.worker.ts, src/lib/transcription/TranscriptionWorkerClient.ts, src/lib/transcription/types.ts

2. New UI controls and app store state are added for backend mode, encoder/decoder quantization, decoder frame stride, and WASM threads, with persistence to local storage.
     • Add settings panel controls for backend selection, a stride numeric input, encoder/decoder quant dropdowns, and a WASM threads slider clamped to device hardware concurrency.
     • Extend the app store with modelBackendMode, encoderQuant, decoderQuant, frameStride, and wasmThreads signals plus hardware-based defaults, and wire these into model loading calls.
     • Persist and restore the new model/general settings (backend, quantization, frameStride, wasmThreads) via settingsStorage with sanitization and corresponding tests, and keep quantization presets in sync with the selected backend mode.
   Files: src/components/SettingsPanel.tsx, src/stores/appStore.ts, src/utils/settingsStorage.ts, src/utils/settingsStorage.test.ts, src/App.tsx

3. Model asset selection is made backend/quantization-aware, and the dev server config is adjusted for local development (a sketch of the filename selection follows this list).
     • Refactor direct asset resolution to choose encoder/decoder ONNX filenames based on ModelBackendMode and encoder/decoder quantization, including forcing an fp32 encoder on WebGPU when int8 is requested.
     • Surface the resolved runtime backend in ModelProgress so the UI can display which backend is actually used.
     • Update the Vite dev server config to use localhost:5173 instead of 0.0.0.0:3100.
   Files: src/lib/transcription/ModelManager.ts, src/lib/transcription/types.ts, vite.config.js
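
The filename selection described in the last change reduces to roughly the following (a sketch assembled from snippets quoted in the reviews below; the fp32 decoder filename is an assumption, since only the int8 name appears in the diff):

```typescript
type ModelBackendMode = 'webgpu-hybrid' | 'wasm';
type QuantizationMode = 'int8' | 'fp32';

// Sketch of quantization-aware ONNX filename selection.
function pickAssetNames(
  backend: ModelBackendMode,
  encoderQuant: QuantizationMode,
  decoderQuant: QuantizationMode
): { encoder: string; decoder: string } {
  // WebGPU keeps the encoder on fp32 even when int8 was requested.
  const effEncoderQuant =
    backend.startsWith('webgpu') && encoderQuant === 'int8' ? 'fp32' : encoderQuant;
  return {
    encoder: effEncoderQuant === 'int8' ? 'encoder-model.int8.onnx' : 'encoder-model.onnx',
    // The fp32 decoder filename below is assumed; only the int8 name is quoted in review.
    decoder: decoderQuant === 'int8' ? 'decoder_joint-model.int8.onnx' : 'decoder_joint-model.onnx',
  };
}
```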



coderabbitai bot commented Feb 21, 2026


📝 Walkthrough

This PR introduces a comprehensive configuration system for model backends (WebGPU vs WASM) and quantization modes (int8 vs fp32), extending the persistence/hydration layer, worker communication protocol, and model manager with backend resolution and thread normalization logic. UI controls are added to the settings panel for user configuration.

Changes

  • Type Definitions (src/lib/transcription/types.ts): Added new public types ModelBackendMode ('webgpu-hybrid' | 'wasm') and QuantizationMode ('int8' | 'fp32'). Updated ModelConfig to use ModelBackendMode and added cpuThreads, encoderQuant, decoderQuant fields. Added a backend field to ModelProgress.
  • App Store & State Management (src/stores/appStore.ts): Added hardware thread detection and new reactive signals: modelBackendMode, encoderQuant, decoderQuant, and wasmThreads. Derived the default thread count from hardware concurrency, with a fallback default of 4.
  • Settings Persistence (src/utils/settingsStorage.ts, src/utils/settingsStorage.test.ts): Extended the PersistedSettings schema with frameStride and wasmThreads in the general section and backend, encoderQuant, decoderQuant in the model section. Added validation helpers and updated sanitization logic to clamp the new fields (see the sketch after this list).
  • Main App Component (src/App.tsx): Added thread utility helpers (getMaxHardwareThreads, clampWasmThreadsForDevice). Enhanced model loading to accept a structured config with backend/quantization/thread parameters. Updated persistence read/write to include the new fields. Added reactive synchronization of quantization modes with the backend selection.
  • Worker Communication Layer (src/lib/transcription/TranscriptionWorkerClient.ts, src/lib/transcription/transcription.worker.ts): Added option interfaces InitModelOptions and InitLocalModelOptions. Updated INIT_MODEL and LOAD_LOCAL_MODEL payloads to include backend, quantization, and thread parameters. Added frameStride to the PROCESS_V4_CHUNK_WITH_FEATURES payload. Refactored message handlers to pass structured options.
  • Model Manager (src/lib/transcription/ModelManager.ts): Expanded loadModel and loadLocalModel signatures to accept backend, quantization, and thread config. Introduced backend resolution logic that determines effective and runtime backends with WebGPU fallback. Added private normalization helpers for CPU threads and quantization. Updated asset loading to use the resolved backend and quantization-aware ONNX file selection.
  • Settings UI (src/components/SettingsPanel.tsx): Added UI controls: a backend selector (WebGPU vs WASM), a frame stride input (1–4), encoder (fp32/int8) and decoder (int8/fp32) quantization selectors, and a WASM threads range control. Updated load button states and labels. Integrated getMaxHardwareThreads() for thread validation.
  • Dev Configuration (vite.config.js): Updated the dev server configuration from port 3100/host 0.0.0.0 to port 5173/host localhost.
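
The clamping mentioned under Settings Persistence can be pictured as a small reader like this (a sketch; the actual readIntegerInRange in settingsStorage.ts and the storage key layout may differ):

```typescript
// Sketch of a range-clamping reader for untrusted persisted values.
function readIntegerInRange(value: unknown, min: number, max: number, fallback: number): number {
  const n = typeof value === 'number' ? value : Number(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, Math.floor(n)));
}

// Usage with an assumed storage key and field layout; frame stride is documented as 1-4.
const persisted = JSON.parse(localStorage.getItem('settings') ?? '{}');
const frameStride = readIntegerInRange(persisted?.general?.frameStride, 1, 4, 1);
```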

Sequence Diagram

sequenceDiagram
    actor User
    participant SettingsPanel
    participant AppStore
    participant AppComponent
    participant TranscriptionWorkerClient as WorkerClient
    participant ModelManager
    participant ParakeetModel

    User->>SettingsPanel: Select backend & quantization
    SettingsPanel->>AppStore: setModelBackendMode(), setEncoderQuant(), setDecoderQuant()
    AppStore->>AppComponent: Emit state changes
    AppComponent->>AppComponent: Sync quantization with backend (reactive effect)
    AppComponent->>AppComponent: Build model config
    User->>SettingsPanel: Click Load/Reload
    SettingsPanel->>AppComponent: Trigger loadSelectedModel()
    AppComponent->>WorkerClient: initModel({ modelId, backend, encoderQuant, decoderQuant, cpuThreads })
    WorkerClient->>ModelManager: loadModel(config)
    ModelManager->>ModelManager: _resolveBackend(requestedBackend)
    Note over ModelManager: Determine effectiveBackend & runtimeBackend<br/>(with WebGPU fallback)
    ModelManager->>ModelManager: _normalizeQuantization(encoderQuant, decoderQuant)
    ModelManager->>ModelManager: Build model assets with quantization-aware<br/>ONNX filenames
    ModelManager->>ParakeetModel: Create with resolved backend & config
    ParakeetModel-->>ModelManager: Ready
    ModelManager-->>WorkerClient: onModelProgress({ backend, ... })
    WorkerClient-->>AppComponent: Forward progress to app store
    AppComponent->>AppStore: Persist settings including backend/quantization/threads

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 With threads and backends now in view,
And quantization options too,
Settings persist through storage's grace,
While workers harmonize at pace—
One config to orchestrate them all!

🚥 Pre-merge checks (3 passed)

  • Description Check: ✅ Passed. Skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title accurately summarizes the main changes: adding WASM thread control and exposing runtime backend/quantization/stride settings across the codebase.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@sourcery-ai sourcery-ai bot left a comment

Hey, I've found 4 issues and left some high-level feedback:

  • The getMaxHardwareThreads / hardware thread detection and WASM thread clamping logic is now duplicated across SettingsPanel.tsx, App.tsx, and appStore.ts; consider extracting this into a shared utility to keep behavior consistent and easier to maintain.
  • There is now both BackendType ('webgpu' | 'wasm') and ModelBackendMode ('webgpu-hybrid' | 'wasm') plus _resolveBackend returning both effectiveBackend and runtimeBackend; adding a small helper or clearer naming to distinguish user-facing vs runtime backend values would reduce confusion and the risk of passing the wrong one into consumers like ParakeetModel.fromUrls or _buildDirectModelAssets.
  • CPU thread normalization is implemented separately in ModelManager._normalizeCpuThreads and TranscriptionWorkerClient.normalizeCpuThreads; centralizing this logic would ensure identical clamping behavior between main thread and worker calls.

## Individual Comments

### Comment 1
<location> `src/components/SettingsPanel.tsx:11-15` </location>
<code_context>
   return `${ms}ms`;
 };

+const getMaxHardwareThreads = () => {
+  if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
+    return 4;
+  }
+  return Math.max(1, Math.floor(navigator.hardwareConcurrency));
+};
+
</code_context>

<issue_to_address>
**suggestion:** Consider centralizing the hardware thread detection logic instead of duplicating it in multiple places.

This `getMaxHardwareThreads` logic now appears here, in `App.tsx`, and `appStore` has similar `hardwareThreads`/`defaultWasmThreads` logic. Extracting a shared utility (e.g. `utils/hardware.ts`) would prevent divergence if the heuristics change and keep related tests aligned.

Suggested implementation:

```typescript
  return `${ms}ms`;
};


/** Visible section preset for the embeddable settings content. */
export type SettingsPanelSection = 'full' | 'audio' | 'model';

```

To fully implement the centralization you suggested, you’ll also want to:

1. Create a shared utility, e.g. `src/utils/hardware.ts`:
   - Export a `getMaxHardwareThreads` function containing the logic that was just removed:
     ```ts
     export const getMaxHardwareThreads = (): number => {
       if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
         return 4;
       }
       return Math.max(1, Math.floor(navigator.hardwareConcurrency));
     };
     ```
2. In `src/components/SettingsPanel.tsx`:
   - Add an import near the top of the file:
     ```ts
     import { getMaxHardwareThreads } from '../utils/hardware';
     ```
   - Update any usage of `getMaxHardwareThreads` in this file (if present below the snippet) to use the imported function (no callsite changes should be needed beyond the import).
3. In `App.tsx`:
   - Remove any inline/duplicated `getMaxHardwareThreads` logic.
   - Import and use `getMaxHardwareThreads` from `../utils/hardware` (adjust the relative path to match your project structure).
4. In `appStore` (where `hardwareThreads` / `defaultWasmThreads` logic lives):
   - Replace any custom hardware-thread detection with `getMaxHardwareThreads` from `../utils/hardware` (again, adjusting the path).
5. Add or update tests:
   - Create tests for `getMaxHardwareThreads` in a shared test file (e.g. `src/utils/__tests__/hardware.test.ts`), covering:
     - `navigator` undefined (SSR) case.
     - Non-finite `hardwareConcurrency`.
     - Typical values (e.g. 2, 4, 8).
   - Remove or update any previous tests that duplicated this logic in `App.tsx` or store tests.

These steps will ensure all parts of the app share a single, consistent heuristic for hardware thread detection.
</issue_to_address>

### Comment 2
<location> `src/lib/transcription/TranscriptionWorkerClient.ts:178-180` </location>
<code_context>
         return transferOwnership === false ? [] : [buffer as ArrayBuffer];
     }

+    private normalizeCpuThreads(cpuThreads?: number): number | undefined {
+        if (!Number.isFinite(cpuThreads)) return undefined;
+        return Math.max(1, Math.floor(cpuThreads as number));
+    }
+
</code_context>

<issue_to_address>
**suggestion:** The CPU thread normalization logic is duplicated with `ModelManager._normalizeCpuThreads`; consider sharing a single implementation.

`TranscriptionWorkerClient.normalizeCpuThreads` and `ModelManager._normalizeCpuThreads` use the same `Number.isFinite` + `Math.max(1, floor)` logic. If only one is updated in the future (e.g., adding an upper bound or device-specific cap), their behavior will diverge. Please extract this into a shared helper or common utility to keep the logic in sync.

Suggested implementation:

```typescript
         return transferOwnership === false ? [] : [buffer as ArrayBuffer];
     }


+import { normalizeCpuThreads } from './cpuThreads';
+import type {
+    ModelState,
+    ModelProgress,
+    TranscriptionResult,
+    TranscriptionServiceConfig,
+    ModelBackendMode,
+    QuantizationMode,
+} from './types';
+import { TokenStreamResult, TokenStreamConfig } from './TokenStreamTranscriber';

```

To fully implement the deduplication and keep behavior in sync:

1. Create a shared helper module, e.g. `src/lib/transcription/cpuThreads.ts`:

   - Export a function:
     ```ts
     export function normalizeCpuThreads(cpuThreads?: number): number | undefined {
         if (!Number.isFinite(cpuThreads)) return undefined;
         return Math.max(1, Math.floor(cpuThreads as number));
     }
     ```

2. Update `TranscriptionWorkerClient` (in this file) to:
   - Remove all internal calls to `this.normalizeCpuThreads(...)` (if any) and replace them with `normalizeCpuThreads(...)` from the new helper.

3. Update `ModelManager` (likely `src/lib/transcription/ModelManager.ts`):
   - Replace the `_normalizeCpuThreads` implementation with calls to the shared `normalizeCpuThreads` helper.
   - Remove the now-redundant `_normalizeCpuThreads` method from `ModelManager`.

These changes ensure that any future changes to CPU thread normalization (e.g. adding an upper bound or device-specific cap) are made in one place only.
</issue_to_address>

### Comment 3
<location> `src/App.tsx:471` </location>
<code_context>
       appStore.setModelProgress(p.progress);
       appStore.setModelMessage(p.message || '');
       if (p.file) appStore.setModelFile(p.file);
+      if (p.backend) appStore.setBackend(p.backend);
     };

</code_context>

<issue_to_address>
**issue (bug_risk):** `appStore.setBackend` is called but there is no corresponding signal/setter in the store API.

`createAppStore` defines `modelBackendMode`, `encoderQuant`, `decoderQuant`, and `wasmThreads`, but no `backend`/`setBackend` pair is exported. This call will not typecheck (or will hit an undefined property at runtime). If you need to track the resolved runtime backend separately from `modelBackendMode`, add a dedicated signal (e.g. `[runtimeBackend, setRuntimeBackend]`) to `createAppStore` and export the setter.
</issue_to_address>
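
A store addition along those lines could be as small as the following sketch (how the signal pair is exposed from createAppStore is left to the repo's conventions):

```typescript
import { createSignal } from 'solid-js';

type BackendType = 'webgpu' | 'wasm';

// Sketch: track the resolved runtime backend separately from the
// user-requested modelBackendMode, so a WebGPU-to-WASM fallback stays visible.
const [runtimeBackend, setRuntimeBackend] = createSignal<BackendType | null>(null);
```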

### Comment 4
<location> `src/App.tsx:227` </location>
<code_context>
   };
 };

+const getMaxHardwareThreads = (): number => {
+  if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
+    return 4;
</code_context>

<issue_to_address>
**issue (complexity):** Consider extracting shared thread helpers, moving backend/quantization policy into the store, and centralizing settings hydration to keep App.tsx slimmer and less coupled to configuration details.

The new functionality looks solid, but there are a couple of places where you can reduce complexity and duplication without changing behavior.

### 1. Extract thread helpers to a shared utility

`getMaxHardwareThreads` / `clampWasmThreadsForDevice` in `App.tsx` duplicate logic that already exists in `SettingsPanel.tsx`. Moving them into a shared module removes duplication and shrinks `App.tsx`.

**Example:**

```ts
// src/utils/hardwareThreads.ts
export const getMaxHardwareThreads = (): number => {
  if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
    return 4;
  }
  return Math.max(1, Math.floor(navigator.hardwareConcurrency));
};

export const clampWasmThreadsForDevice = (value: number): number =>
  Math.max(1, Math.min(getMaxHardwareThreads(), Math.floor(value)));
```

Then in `App.tsx` and `SettingsPanel.tsx`:

```ts
import { clampWasmThreadsForDevice } from '@/utils/hardwareThreads';

// ...
if (persistedGeneral?.wasmThreads !== undefined) {
  appStore.setWasmThreads(clampWasmThreadsForDevice(persistedGeneral.wasmThreads));
}
```

This keeps the behavior identical, but centralizes the environment/capability logic.

### 2. Move backend/quantization policy out of `App`

The `createEffect` that enforces `encoderQuant`/`decoderQuant` presets based on `modelBackendMode` is policy logic that fits better in the store (or a configuration module) than in `App.tsx`. If you encapsulate it, `App` doesn't need to know about the quantization rules.

**Example store API:**

```ts
// in appStore.ts (or wherever the store lives)
const applyBackendPresets = (backendMode: BackendMode) => {
  if (backendMode.startsWith('webgpu')) {
    setEncoderQuant('fp32');
    setDecoderQuant('int8');
  } else {
    setEncoderQuant('int8');
    setDecoderQuant('int8');
  }
};

export const setModelBackendMode = (backendMode: BackendMode) => {
  state.modelBackendMode = backendMode;
  applyBackendPresets(backendMode);
};
```

Then in `App.tsx` you can drop the `createEffect` and just call:

```ts
if (persistedModel?.backend !== undefined) {
  appStore.setModelBackendMode(persistedModel.backend);
}
```

Any other caller that changes the backend automatically gets the same quantization policy, without extra effects in `App`.

### 3. Optional: consolidate settings hydration

You're now restoring a growing list of fields in `App.tsx` (`energyThreshold`, `sileroThreshold`, `frameStride`, `wasmThreads`, backend/quant, etc.). As this grows, a dedicated hydration function in the store can keep `App.tsx` slimmer.

**Example:**

```ts
// in appStore.ts
export const hydrateFromPersistedSettings = (settings: PersistedSettings) => {
  const { general, audio, model, ui } = settings;

  if (model?.selectedModelId && MODELS.some(m => m.id === model.selectedModelId)) {
    setSelectedModelId(model.selectedModelId);
  }
  if (model?.backend !== undefined) setModelBackendMode(model.backend);
  if (model?.encoderQuant !== undefined) setEncoderQuant(model.encoderQuant);
  if (model?.decoderQuant !== undefined) setDecoderQuant(model.decoderQuant);

  if (general?.energyThreshold !== undefined) setEnergyThreshold(general.energyThreshold);
  if (general?.sileroThreshold !== undefined) setSileroThreshold(general.sileroThreshold);
  if (general?.v4InferenceIntervalMs !== undefined) setV4InferenceIntervalMs(general.v4InferenceIntervalMs);
  if (general?.v4SilenceFlushSec !== undefined) setV4SilenceFlushSec(general.v4SilenceFlushSec);
  if (general?.streamingWindow !== undefined) setStreamingWindow(general.streamingWindow);
  if (general?.frameStride !== undefined) setFrameStride(general.frameStride);
  if (general?.wasmThreads !== undefined) setWasmThreads(clampWasmThreadsForDevice(general.wasmThreads));

  if (ui?.debugPanel?.visible !== undefined) setShowDebugPanel(ui.debugPanel.visible);
};
```

Then in `App.tsx`:

```ts
const persistedSettings = loadSettingsFromStorage();
appStore.hydrateFromPersistedSettings(persistedSettings);
```

This keeps all current behavior but reduces the amount of cross-cutting configuration code in `App.tsx`.
</issue_to_address>



});

// Keep quantization presets aligned with the backend mode (parakeet.js demo behavior).
createEffect(() => {

WARNING: createEffect silently discards user-persisted quantization choices

This effect fires immediately on mount (and on every modelBackendMode change), which means any encoderQuant/decoderQuant values restored from localStorage at lines 251–255 are immediately overwritten. For example, a user who saved backend: 'wasm' + encoderQuant: 'fp32' will always have encoderQuant reset to 'int8' on startup.

The effect also prevents the user from independently choosing fp32 encoder on WASM — the UI dropdowns appear editable but the effect immediately reverts them.

Safest fix: Remove this effect and instead apply the preset only when the backend mode changes (not on initial mount), or only apply defaults when the user has not explicitly set a quant value. Alternatively, document that quant is always derived from backend and make the dropdowns read-only / hidden when the preset is active.

appStore.setModelProgress(p.progress);
appStore.setModelMessage(p.message || '');
if (p.file) appStore.setModelFile(p.file);
if (p.backend) appStore.setBackend(p.backend);

WARNING: appStore.setBackend updates the runtime backend signal (the actual backend used), but appStore.modelBackendMode (the requested backend) is never updated when a WebGPU fallback to WASM occurs. After a fallback, modelBackendMode() still returns 'webgpu-hybrid' while backend() returns 'wasm'. On the next save-to-storage cycle (line 402–415), backend: 'webgpu-hybrid' is persisted, so the user's setting is not corrected. Consider also updating modelBackendMode when a fallback is detected, or at minimum surfacing the mismatch in the UI.

const revision = 'main';
const encoderName = backend === 'webgpu' ? 'encoder-model.onnx' : 'encoder-model.int8.onnx';
const decoderName = 'decoder_joint-model.int8.onnx';
const resolvedEncoderQuant = backend.startsWith('webgpu') && encoderQuant === 'int8' ? 'fp32' : encoderQuant;

WARNING: Behavioral inconsistency between primary and fallback asset paths for WebGPU + int8 encoder.

The fallback path (_buildDirectModelAssets) silently overrides encoderQuant from int8 → fp32 when backend.startsWith('webgpu') (line 424). However, the primary path via getParakeetModel at line 133–148 passes encoderQuant directly without this override. If getParakeetModel accepts int8 for WebGPU, the two paths produce different model files. If it does not, the primary path will fail or silently fall back inside parakeet.js.

The override logic should be applied consistently before both calls, or the comment should explain why the primary path is exempt.

vite.config.js Outdated
-    port: 3100,
-    host: '0.0.0.0',
+    port: 5173,
+    host: 'localhost',

WARNING: Changing host from '0.0.0.0' to 'localhost' breaks Docker/container and remote dev environments where the dev server must bind to all interfaces to be reachable from the host machine or other containers. This is a deployment regression for any team member not running Vite natively on their local machine.

If the intent is to restrict local-only access for security, this should be documented and opt-in (e.g., via an env var), not a hard default change.

const loadSelectedModel = async () => {
if (!workerClient) return;
if (appStore.modelState() === 'ready') return;
if (appStore.modelState() === 'loading') return;

SUGGESTION: The guard if (appStore.modelState() === 'ready') return; was removed to allow model reload. However, there is no guard against reloading while transcription is actively running (recordingState() === 'recording'). Reloading the model mid-transcription will tear down the worker and drop in-flight audio. Consider adding:

Suggested change
if (appStore.modelState() === 'loading') return;

and separately checking appStore.recordingState() !== 'idle' before allowing reload, or at minimum showing a warning.


kiloconnect bot commented Feb 21, 2026

Code Review Summary

Status: 4 Issues Found | Recommendation: Address before merge

Overview

This PR adds configurable runtime controls for backend mode (WebGPU/WASM), encoder/decoder quantization, decoder frame stride, and WASM thread count. The plumbing from UI → store → persistence → worker → ModelManager is well-structured and the sanitization/validation layer in settingsStorage.ts is solid. The _resolveBackend refactor cleanly separates user intent from runtime capability.

Risk: Medium — The quantization createEffect introduces a silent state override that conflicts with the persistence layer, and the vite.config.js host change is a deployment regression.

Severity counts: 0 CRITICAL, 3 WARNING, 1 SUGGESTION

Issue Details

WARNING

  • src/App.tsx:327: createEffect silently discards user-persisted quantization choices on every mount and backend change
  • src/App.tsx:471: modelBackendMode is not updated on WebGPU→WASM fallback; the stale value is re-persisted on the next save
  • src/lib/transcription/ModelManager.ts:424: behavioral inconsistency: the fallback path overrides int8 → fp32 for the WebGPU encoder, the primary path does not
  • vite.config.js:87: host: 'localhost' breaks Docker/container/remote dev environments

SUGGESTION

  • src/App.tsx:1088: no guard against reloading the model while transcription is actively running
Detailed Notes

1. createEffect overrides persisted quantization (WARNING — src/App.tsx:327)

The effect fires on mount and on every modelBackendMode change. Persisted encoderQuant/decoderQuant values (restored at lines 251–255) are immediately overwritten. A user who saved backend: 'wasm' + encoderQuant: 'fp32' will always have encoderQuant reset to 'int8' on startup. The UI dropdowns appear independently editable but the effect silently reverts them.

Fix options:

  • Remove the effect and apply defaults only when the user explicitly changes the backend (not on initial mount).
  • Or make the quant dropdowns read-only/hidden when the backend-derived preset is active, and document that quant is always derived from backend.
  • Or use on() with defer: true to skip the initial run (sketched below).
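
The third option maps onto Solid's on helper. A sketch, assuming the appStore accessors from this PR and the preset values quoted in the review thread:

```typescript
import { createEffect, on } from 'solid-js';

// Sketch: apply backend-derived quant presets only when the backend actually
// changes, not on the initial run, so persisted user choices survive hydration.
createEffect(on(
  () => appStore.modelBackendMode(),
  (mode) => {
    const preset = mode === 'webgpu-hybrid'
      ? ({ encoder: 'fp32', decoder: 'int8' } as const)
      : ({ encoder: 'int8', decoder: 'int8' } as const);
    appStore.setEncoderQuant(preset.encoder);
    appStore.setDecoderQuant(preset.decoder);
  },
  { defer: true } // skip the mount-time run
));
```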

2. modelBackendMode not updated on fallback (WARNING — src/App.tsx:471)

When WebGPU is unavailable, _resolveBackend returns effectiveBackend: 'wasm' and the runtime backend signal is updated via setBackend. But modelBackendMode (the requested backend) remains 'webgpu-hybrid'. On the next settings save, backend: 'webgpu-hybrid' is persisted, so the fallback is not remembered. The user will see a mismatch between the displayed backend mode and the actual runtime backend.

Fix: When p.backend is received and differs from appStore.modelBackendMode(), also call appStore.setModelBackendMode(p.backend === 'webgpu' ? 'webgpu-hybrid' : 'wasm') — or surface the effective backend separately in the UI.

3. Encoder quant override inconsistency in fallback path (WARNING — src/lib/transcription/ModelManager.ts:424)

The _buildDirectModelAssets fallback path forces fp32 encoder for WebGPU when encoderQuant === 'int8'. The primary path via getParakeetModel passes encoderQuant directly. If getParakeetModel does not apply the same override internally, the two paths will download different model files for the same configuration. This could cause silent quality/performance differences depending on whether the cache is warm or stale.

Fix: Apply the int8→fp32 override before calling getParakeetModel as well, or confirm that getParakeetModel handles this internally and add a comment.

4. vite.config.js host change (WARNING — vite.config.js:87)

Changing host: '0.0.0.0' to host: 'localhost' prevents the dev server from being reachable from Docker containers, VMs, or remote machines. This is a breaking change for any developer not running Vite natively on their local machine. SharedArrayBuffer (required for WASM threads) also requires COOP/COEP headers which are already set — but the server must be reachable first.

Fix: Revert to '0.0.0.0' or make it configurable via an env var (e.g., process.env.VITE_HOST ?? '0.0.0.0').
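
As a sketch, the opt-in shape could look like this (VITE_DEV_HOST is an illustrative name, not a variable the repo defines):

```typescript
// vite.config sketch: keep 0.0.0.0 as the default so Docker/remote setups work,
// and let individual developers opt into local-only binding via an env var.
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    port: 5173,
    host: process.env.VITE_DEV_HOST ?? '0.0.0.0',
  },
});
```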

5. No guard against reload during active transcription (SUGGESTION — src/App.tsx:1088)

The if (appStore.modelState() === 'ready') return guard was intentionally removed to enable reload. However, there is no check for recordingState() === 'recording'. Reloading mid-transcription will tear down the worker and drop in-flight audio without warning.
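
A guard along the suggested lines (a sketch; the message text and exact placement are assumptions):

```typescript
// Sketch: refuse to reload the model while audio is still being captured,
// since tearing down the worker would drop in-flight chunks.
const loadSelectedModel = async () => {
  if (!workerClient) return;
  if (appStore.modelState() === 'loading') return;
  if (appStore.recordingState() !== 'idle') {
    appStore.setModelMessage('Stop recording before reloading the model.');
    return;
  }
  // ...proceed with workerClient.initModel(...)
};
```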

Positive Observations
  • Sanitization layer in settingsStorage.ts is thorough: readModelBackend, readQuantization, and readIntegerInRange all validate and clamp untrusted localStorage values before use. The test coverage for out-of-range and invalid values is good.
  • _resolveBackend refactor cleanly separates user intent (ModelBackendMode) from runtime capability (BackendType), making the fallback logic explicit and testable.
  • normalizeCpuThreads duplication between TranscriptionWorkerClient and ModelManager is a minor redundancy but provides defense-in-depth.
  • frameStride ?? 1 default in transcription.worker.ts:229 is a safe fallback.
  • getMaxHardwareThreads is duplicated in App.tsx and SettingsPanel.tsx — consider extracting to a shared utility.
Verification Checklist
  • Run settingsStorage.test.ts — passes with new fields
  • Manual: set backend=wasm, encoderQuant=fp32, reload page → verify quant is not reset to int8
  • Manual: set backend=webgpu-hybrid on a machine without WebGPU → verify fallback message and that modelBackendMode reflects the actual backend
  • Manual: start recording, click Reload → verify behavior is safe
  • Docker/container: verify dev server is reachable after vite.config.js change
  • Verify getParakeetModel handles encoderQuant: 'int8' + WebGPU correctly (or confirm it applies the same fp32 override internally)
Files Reviewed (10 files)
  • src/App.tsx — 3 issues
  • src/components/SettingsPanel.tsx — no issues
  • src/lib/transcription/ModelManager.ts — 1 issue
  • src/lib/transcription/TranscriptionWorkerClient.ts — no issues
  • src/lib/transcription/transcription.worker.ts — no issues
  • src/lib/transcription/types.ts — no issues
  • src/stores/appStore.ts — no issues
  • src/utils/settingsStorage.ts — no issues
  • src/utils/settingsStorage.test.ts — no issues
  • vite.config.js — 1 issue


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
vite.config.js (1)

86-87: Dev server no longer accessible from LAN.

Switching host from 0.0.0.0 to localhost prevents access from other devices on the network (e.g., mobile testing with https certs). If that's intentional, looks fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vite.config.js` around lines 86 - 87, The dev server host was changed to
'localhost', which prevents LAN access; revert or make configurable by setting
the Vite dev server host back to '0.0.0.0' (or expose it via env) so other
devices can reach it; update the devServer config object where port: 5173 and
host: 'localhost' are defined (the host property in the Vite config) to use
'0.0.0.0' or an environment variable like process.env.DEV_HOST || '0.0.0.0'.
src/App.tsx (1)

227-236: getMaxHardwareThreads is duplicated in SettingsPanel.tsx (line 11).

Consider extracting this into a shared utility (e.g., src/utils/hardware.ts) to avoid the two copies drifting apart.

#!/bin/bash
# Verify the duplication (ripgrep has no built-in "tsx" type, so match by glob)
rg -n 'getMaxHardwareThreads' -g '*.ts' -g '*.tsx'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/App.tsx` around lines 227 - 236, The getMaxHardwareThreads function (and
clampWasmThreadsForDevice) is duplicated; extract them into a single shared
utility module (e.g., create a new hardware utility that exports
getMaxHardwareThreads and clampWasmThreadsForDevice) and replace the local
copies in App.tsx and SettingsPanel.tsx with imports from that module; ensure
the exported functions keep the same signatures and return types so existing
callers (getMaxHardwareThreads, clampWasmThreadsForDevice) continue to work
without other changes.
src/stores/appStore.ts (1)

89-91: Default encoderQuant is 'int8' but the reactive effect in App.tsx will override it to 'fp32' for the default 'webgpu-hybrid' backend.

This is fine since model loading only happens after App mounts (where the effect runs), but worth noting that the store defaults alone don't reflect the aligned state — the alignment effect in App.tsx (lines 327–336) is required for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/stores/appStore.ts` around lines 89 - 91, Default encoderQuant ('int8')
conflicts with the alignment effect in App.tsx which forces 'fp32' when
modelBackendMode defaults to 'webgpu-hybrid'; change the store defaults to match
the post-effect state by initializing encoderQuant to 'fp32' (and optionally
decoderQuant if you want both consistent) in the createSignal call, leaving the
effect in App.tsx (which uses setEncoderQuant/setDecoderQuant and
modelBackendMode) unchanged so runtime behavior and store defaults align.
src/lib/transcription/ModelManager.ts (2)

422-427: Silent encoder quantization override on WebGPU — add a log warning.

Line 424 silently upgrades encoder from int8 to fp32 when the backend is WebGPU. This is likely correct (WebGPU may not support int8 encoder), but the caller receives no indication that their requested quantization was overridden. A console.warn would save debugging time when users wonder why the model size differs from expectations.

Proposed fix
 const resolvedEncoderQuant = backend.startsWith('webgpu') && encoderQuant === 'int8' ? 'fp32' : encoderQuant;
+if (resolvedEncoderQuant !== encoderQuant) {
+  console.warn(`[ModelManager] Encoder quantization overridden: ${encoderQuant} → ${resolvedEncoderQuant} (WebGPU does not support int8 encoder)`);
+}
 const encoderName = resolvedEncoderQuant === 'int8' ? 'encoder-model.int8.onnx' : 'encoder-model.onnx';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/transcription/ModelManager.ts` around lines 422 - 427, The code in
ModelManager.ts silently overrides encoderQuant from 'int8' to 'fp32' for WebGPU
backends (resolvedEncoderQuant computed from backend and encoderQuant) without
notifying callers; update the logic around resolvedEncoderQuant (and the
variables backend and encoderQuant) to log a warning when encoderQuant ===
'int8' and backend.startsWith('webgpu') so users are informed their requested
quantization was upgraded (use the existing logging facility if available or
console.warn) and include both the requested and effective quantization and the
modelId/repoId in the message for context.

357-360: Consider adding an upper-bound clamp to _normalizeCpuThreads.

The lower bound is enforced (Math.max(1, …)), but there's no upper-bound guard. A runaway value (e.g. 999) would be passed straight through to the ONNX runtime. Even if SettingsStorage clamps upstream, this helper is the last line of defense.

Proposed fix
 private _normalizeCpuThreads(value?: number): number | undefined {
   if (!Number.isFinite(value)) return undefined;
-  return Math.max(1, Math.floor(value as number));
+  const MAX_THREADS = navigator.hardwareConcurrency || 16;
+  return Math.min(MAX_THREADS, Math.max(1, Math.floor(value as number)));
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/transcription/ModelManager.ts` around lines 357 - 360, The
_normalizeCpuThreads helper currently only enforces a lower bound; update
_normalizeCpuThreads to also clamp values to a safe upper bound before returning
(e.g., min(floor(value), MAX_THREADS)), keeping the existing Number.isFinite
check and Math.floor + Math.max(1, …) behavior; choose a reasonable cap (for
example based on available CPUs via os.cpus().length or a defined constant like
MAX_CPU_THREADS) and reference that cap in the calculation so excessively large
inputs (e.g., 999) are constrained before being passed to the ONNX runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/App.tsx`:
- Around line 326-336: The reactive createEffect in App.tsx currently forces
encoder/decoder quant values based only on appStore.modelBackendMode(), which
causes user selections (appStore.encoderQuant()/appStore.decoderQuant()) to be
overwritten; change the logic so the effect only enforces presets when a new
backend mode is selected AND the user isn’t actively controlling the quant
settings (e.g., add a flag in appStore like isQuantAutoControlled or check a new
store field quantControlMode === 'auto'|'manual'), and update the SettingsPanel
quant selects to disable/hide when the effect is controlling them; specifically
adjust the createEffect that calls appStore.setEncoderQuant and setDecoderQuant
to respect the new control flag and update SettingsPanel.tsx (the
encoder/decoder select controls) to reflect that flag so users aren’t silently
overridden.

In `@src/components/SettingsPanel.tsx`:
- Around line 103-155: The encoder/decoder selects can be changed by the user
but are later overridden by the reactive effect in App.tsx when certain backend
modes are selected; update the UI to reflect that by disabling the Encoder and
Decoder <select> controls (the ones bound to appStore.encoderQuant() and
appStore.decoderQuant()) whenever appStore.modelBackendMode() indicates the
backend auto-manages quantization, and add a short tooltip/title like "Managed
by backend selection" to those disabled controls; keep existing disabled
behavior tied to appStore.modelState() === 'loading' and ensure you don't remove
the onInput handlers (setEncoderQuant/setDecoderQuant) so they remain usable
when not auto-managed.

In `@src/lib/transcription/ModelManager.ts`:
- Around line 113-124: The runtime is passing a descriptive effectiveBackend
(which may be 'webgpu-hybrid') into parakeet.js calls causing runtime failures;
update the calls that construct/load Parakeet models to use runtimeBackend
instead of effectiveBackend—specifically inside createModelFromAssets (replace
backend: effectiveBackend with backend: runtimeBackend when calling
ParakeetModel.fromUrls) and any use in getParakeetModel that forwards a backend
to parakeet APIs; keep effectiveBackend for logging/UI only (e.g., the
console.log lines) so display text remains descriptive.
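A minimal sketch of that split, keeping the descriptive string for logs only. It assumes the variable names from the comment (effectiveBackend, runtimeBackend) and that ParakeetModel.fromUrls accepts a backend option; assetUrls is a hypothetical bundle of the model file URLs:

const runtimeBackend = effectiveBackend.startsWith('webgpu') ? 'webgpu' : 'wasm';
console.log(`[ModelManager] mode: ${effectiveBackend} (runtime backend: ${runtimeBackend})`);
const model = await ParakeetModel.fromUrls({
  ...assetUrls,            // hypothetical encoder/decoder/vocab URLs
  backend: runtimeBackend, // parakeet.js expects 'webgpu' | 'wasm', not 'webgpu-hybrid'
});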

---

Duplicate comments:
In `@src/components/SettingsPanel.tsx`:
- Around line 11-16: The function getMaxHardwareThreads is duplicated in
SettingsPanel.tsx and App.tsx; extract it into a shared utility module (e.g.,
create a new util export getMaxHardwareThreads in a common utils file) and
import that single function into both SettingsPanel and App to remove
duplication; ensure the exported function signature and behavior (checks for
typeof navigator, Number.isFinite, default 4, and Math.max(1, Math.floor(...)))
remain unchanged and update both files to use the shared import.
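A sketch of the extracted helper, reproducing the exact behavior the comment lists (navigator guard, Number.isFinite check, default of 4, floored lower bound of 1). The file path follows the author's later note; clampWasmThreadsForDevice is reconstructed from its name and the clamping described elsewhere in this review:

// src/utils/hardwareThreads.ts
export function getMaxHardwareThreads(): number {
  if (typeof navigator === 'undefined') return 4;   // non-browser contexts
  const hc = navigator.hardwareConcurrency;
  if (!Number.isFinite(hc)) return 4;               // browsers that hide concurrency
  return Math.max(1, Math.floor(hc));
}

export function clampWasmThreadsForDevice(requested: number): number {
  return Math.min(getMaxHardwareThreads(), Math.max(1, Math.floor(requested)));
}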

---

Nitpick comments:
In `@src/App.tsx`:
- Around line 227-236: The getMaxHardwareThreads function (and
clampWasmThreadsForDevice) is duplicated; extract them into a single shared
utility module (e.g., create a new hardware utility that exports
getMaxHardwareThreads and clampWasmThreadsForDevice) and replace the local
copies in App.tsx and SettingsPanel.tsx with imports from that module; ensure
the exported functions keep the same signatures and return types so existing
callers (getMaxHardwareThreads, clampWasmThreadsForDevice) continue to work
without other changes.

In `@src/lib/transcription/ModelManager.ts`:
- Around line 422-427: The code in ModelManager.ts silently overrides
encoderQuant from 'int8' to 'fp32' for WebGPU backends (resolvedEncoderQuant
computed from backend and encoderQuant) without notifying callers; update the
logic around resolvedEncoderQuant (and the variables backend and encoderQuant)
to log a warning when encoderQuant === 'int8' and backend.startsWith('webgpu')
so users are informed their requested quantization was upgraded (use the
existing logging facility if available or console.warn) and include both the
requested and effective quantization and the modelId/repoId in the message for
context.
- Around line 357-360: The _normalizeCpuThreads helper currently only enforces a
lower bound; update _normalizeCpuThreads to also clamp values to a safe upper
bound before returning (e.g., min(floor(value), MAX_THREADS)), keeping the
existing Number.isFinite check and Math.floor + Math.max(1, …) behavior; choose
a reasonable cap (for example based on available CPUs via navigator.hardwareConcurrency, since os.cpus() is Node-only, or a
defined constant like MAX_CPU_THREADS) and reference that cap in the calculation
so excessively large inputs (e.g., 999) are constrained before being passed to
the ONNX runtime.

In `@src/stores/appStore.ts`:
- Around line 89-91: Default encoderQuant ('int8') conflicts with the alignment
effect in App.tsx which forces 'fp32' when modelBackendMode defaults to
'webgpu-hybrid'; change the store defaults to match the post-effect state by
initializing encoderQuant to 'fp32' (and optionally decoderQuant if you want
both consistent) in the createSignal call, leaving the effect in App.tsx (which
uses setEncoderQuant/setDecoderQuant and modelBackendMode) unchanged so runtime
behavior and store defaults align.

In `@vite.config.js`:
- Around line 86-87: The dev server host was changed to 'localhost', which
prevents LAN access; revert or make configurable by setting the Vite dev server
host back to '0.0.0.0' (or expose it via env) so other devices can reach it;
update the devServer config object where port: 5173 and host: 'localhost' are
defined (the host property in the Vite config) to use '0.0.0.0' or an
environment variable like process.env.DEV_HOST || '0.0.0.0'.
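A sketch of the env-configurable variant the comment asks for (the author later adopted a DEV_HOST variable); everything outside server is elided:

// vite.config.js
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    port: 5173,
    host: process.env.DEV_HOST || '0.0.0.0', // keep LAN/container access by default
  },
});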

Comment on lines 326 to 336
// Keep quantization presets aligned with the backend mode (parakeet.js demo behavior).
createEffect(() => {
  const backendMode = appStore.modelBackendMode();
  if (backendMode.startsWith('webgpu')) {
    if (appStore.encoderQuant() !== 'fp32') appStore.setEncoderQuant('fp32');
    if (appStore.decoderQuant() !== 'int8') appStore.setDecoderQuant('int8');
  } else {
    if (appStore.encoderQuant() !== 'int8') appStore.setEncoderQuant('int8');
    if (appStore.decoderQuant() !== 'int8') appStore.setDecoderQuant('int8');
  }
});

⚠️ Potential issue | 🟡 Minor

Reactive effect overrides user-selected quantization values.

In SolidJS, this effect tracks modelBackendMode(), encoderQuant(), and decoderQuant() as dependencies. If a user manually changes encoder quantization via the UI (e.g., selects int8 while in webgpu-hybrid mode), the effect re-fires and immediately reverts it to fp32. The quantization selects in SettingsPanel.tsx (lines 131–154) are only disabled during loading — they should also be disabled (or hidden) when the effect is controlling them, to avoid confusing UX where user selections are silently overridden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/App.tsx` around lines 326 - 336, The reactive createEffect in App.tsx
currently forces encoder/decoder quant values based only on
appStore.modelBackendMode(), which causes user selections
(appStore.encoderQuant()/appStore.decoderQuant()) to be overwritten; change the
logic so the effect only enforces presets when a new backend mode is selected
AND the user isn’t actively controlling the quant settings (e.g., add a flag in
appStore like isQuantAutoControlled or check a new store field quantControlMode
=== 'auto'|'manual'), and update the SettingsPanel quant selects to disable/hide
when the effect is controlling them; specifically adjust the createEffect that
calls appStore.setEncoderQuant and setDecoderQuant to respect the new control
flag and update SettingsPanel.tsx (the encoder/decoder select controls) to
reflect that flag so users aren’t silently overridden.
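One way to restrict the effect to backend changes is SolidJS's on helper with a deferred first run. This is a sketch of the pattern, not the author's actual fix:

import { createEffect, on } from 'solid-js';

createEffect(on(
  () => appStore.modelBackendMode(),
  (backendMode) => {
    // Only the tracked accessor above re-triggers this body; the quant signals
    // are written untracked, so user edits to them cannot re-fire the effect.
    if (backendMode.startsWith('webgpu')) {
      appStore.setEncoderQuant('fp32');
      appStore.setDecoderQuant('int8');
    } else {
      appStore.setEncoderQuant('int8');
      appStore.setDecoderQuant('int8');
    }
  },
  { defer: true } // skip the initial run so restored/manual settings survive mount
));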

Comment on lines +103 to +155
<div class="grid grid-cols-2 gap-x-4 gap-y-3 pt-1">
<div class="space-y-1">
<span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Backend</span>
<select
class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
value={appStore.modelBackendMode()}
onInput={(e) => appStore.setModelBackendMode((e.target as HTMLSelectElement).value as 'webgpu-hybrid' | 'wasm')}
disabled={appStore.modelState() === 'loading'}
>
<option value="webgpu-hybrid">WebGPU</option>
<option value="wasm">WASM</option>
</select>
</div>
<div class="space-y-1">
<span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Stride</span>
<input
type="number"
min="1"
max="4"
step="1"
value={appStore.frameStride()}
onInput={(e) => {
const next = Number((e.target as HTMLInputElement).value);
if (Number.isFinite(next)) appStore.setFrameStride(Math.max(1, Math.min(4, Math.round(next))));
}}
class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
/>
</div>
<div class="space-y-1">
<span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Encoder</span>
<select
class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
value={appStore.encoderQuant()}
onInput={(e) => appStore.setEncoderQuant((e.target as HTMLSelectElement).value as 'int8' | 'fp32')}
disabled={appStore.modelState() === 'loading'}
>
<option value="fp32">fp32</option>
<option value="int8">int8</option>
</select>
</div>
<div class="space-y-1">
<span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Decoder</span>
<select
class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
value={appStore.decoderQuant()}
onInput={(e) => appStore.setDecoderQuant((e.target as HTMLSelectElement).value as 'int8' | 'fp32')}
disabled={appStore.modelState() === 'loading'}
>
<option value="int8">int8</option>
<option value="fp32">fp32</option>
</select>
</div>
</div>

⚠️ Potential issue | 🟡 Minor

New configuration grid is well-structured.

Backend, Stride, Encoder, and Decoder controls are correctly wired to the app store. One note: since the reactive effect in App.tsx overrides encoder/decoder quant based on backend mode, consider adding visual indication (e.g., disabled or a tooltip) when quantization values are auto-managed by the backend selection — otherwise users may be confused when their selection reverts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/SettingsPanel.tsx` around lines 103 - 155, The encoder/decoder
selects can be changed by the user but are later overridden by the reactive
effect in App.tsx when certain backend modes are selected; update the UI to
reflect that by disabling the Encoder and Decoder <select> controls (the ones
bound to appStore.encoderQuant() and appStore.decoderQuant()) whenever
appStore.modelBackendMode() indicates the backend auto-manages quantization, and
add a short tooltip/title like "Managed by backend selection" to those disabled
controls; keep existing disabled behavior tied to appStore.modelState() ===
'loading' and ensure you don't remove the onInput handlers
(setEncoderQuant/setDecoderQuant) so they remain usable when not auto-managed.
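A sketch of that UI change for the encoder select (the decoder select is symmetric); quantAutoManaged is a hypothetical helper assuming quantization is preset-driven whenever a WebGPU mode is active:

const quantAutoManaged = () => appStore.modelBackendMode().startsWith('webgpu');

<select
  value={appStore.encoderQuant()}
  onInput={(e) => appStore.setEncoderQuant((e.target as HTMLSelectElement).value as 'int8' | 'fp32')}
  disabled={appStore.modelState() === 'loading' || quantAutoManaged()}
  title={quantAutoManaged() ? 'Managed by backend selection' : undefined}
>
  <option value="fp32">fp32</option>
  <option value="int8">int8</option>
</select>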

@ysdede
Owner Author

ysdede commented Feb 21, 2026

Addressed review feedback in d593d72:

  • Fixed backend plumbing bug in ModelManager: now passing runtime backend (webgpu/wasm) to parakeet.js APIs instead of descriptive webgpu-hybrid.
  • Applied encoder quant override consistently for WebGPU before both hub/direct asset paths, with explicit warning log.
  • Added upper-bound clamp in ModelManager CPU thread normalization.
  • Extracted shared hardware thread helpers to src/utils/hardwareThreads.ts and reused in App/SettingsPanel/store defaults.
  • Adjusted quant preset effect to run only when backend actually changes (prevents immediate overwrite of restored/manual values).
  • Made dev server host configurable (DEV_HOST) with 0.0.0.0 default to keep LAN/container access.
  • Blocked model reload while recording is active.

Not applied:

  • The comment claiming appStore.setBackend is missing is a false positive. backend/setBackend exist and are exported in the store.

@ysdede
Owner Author

ysdede commented Feb 21, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Feb 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Repository owner deleted a comment from ysdede Feb 21, 2026
@AAlp22 AAlp22 self-assigned this Feb 21, 2026
@AAlp22 AAlp22 merged commit 009d994 into master Feb 21, 2026
3 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 21, 2026
feat: add wasm thread control and runtime backend/quantization/stride settings 009d994