# feat: add wasm thread control and runtime backend/quantization/stride settings #175
## Conversation
## Reviewer's Guide

Adds configurable runtime controls for Parakeet’s backend, quantization, decoder stride, and WASM thread count, plumbs them from UI through app store, persistence, worker client, and ModelManager into the transcription worker, and refactors backend/asset resolution accordingly.

### Sequence diagram for model initialization with backend, quantization, and WASM threads

```mermaid
sequenceDiagram
actor User
participant SettingsPanel
participant App
participant AppStore
participant TranscriptionWorkerClient as WorkerClient
participant WorkerThread as Worker
participant ModelManager
participant ParakeetModel
User->>SettingsPanel: Click Load / Reload
SettingsPanel->>App: onLoadModel()
App->>AppStore: read modelBackendMode, wasmThreads, encoderQuant, decoderQuant
App->>WorkerClient: initModel(options)
activate WorkerClient
WorkerClient->>WorkerClient: normalizeCpuThreads(cpuThreads)
WorkerClient->>Worker: postMessage INIT_MODEL(options)
deactivate WorkerClient
activate Worker
Worker->>ModelManager: loadModel(config)
activate ModelManager
ModelManager->>ModelManager: _normalizeCpuThreads(cpuThreads)
ModelManager->>ModelManager: _normalizeRequestedBackend(backend)
ModelManager->>ModelManager: _normalizeQuantization(encoderQuant)
ModelManager->>ModelManager: _normalizeQuantization(decoderQuant)
ModelManager->>ModelManager: _resolveBackend(requestedBackend)
ModelManager->>App: onProgress(backend, effectiveBackend)
ModelManager->>ParakeetModel: fromUrls({backend, cpuThreads, urls})
activate ParakeetModel
ParakeetModel-->>ModelManager: model instance
deactivate ParakeetModel
ModelManager-->>Worker: model ready
deactivate ModelManager
Worker-->>App: INIT_MODEL_DONE
deactivate Worker
App->>AppStore: setModelState(ready)
App->>AppStore: setBackend(runtimeBackend)
```

### Updated class diagram for ModelManager, TranscriptionWorkerClient, and configuration types

```mermaid
classDiagram
class ModelManager {
-backend : BackendType
+loadModel(config) void
+loadLocalModel(files, options) void
-_buildDirectModelAssets(modelId, backend, encoderQuant, decoderQuant, getModelConfig) ResolvedModelAssets
-_normalizeCpuThreads(value) number
-_normalizeRequestedBackend(value) ModelBackendMode
-_normalizeQuantization(value, fallback) QuantizationMode
-_resolveBackend(requestedBackend) Promise~ResolveBackendResult~
}
class ResolveBackendResult {
+effectiveBackend : ModelBackendMode
+runtimeBackend : BackendType
}
class TranscriptionWorkerClient {
-worker : Worker
+initModel(options) Promise~void~
+initLocalModel(files, options) Promise~void~
+initService(config) Promise~void~
+initV3Service(config) Promise~void~
+sendRequest(type, payload) Promise~any~
-normalizeCpuThreads(cpuThreads) number
}
class InitModelOptions {
+modelId : string
+cpuThreads : number
+backend : ModelBackendMode
+encoderQuant : QuantizationMode
+decoderQuant : QuantizationMode
}
class InitLocalModelOptions {
+cpuThreads : number
+backend : ModelBackendMode
}
class ModelConfig {
+modelId : string
+backend : ModelBackendMode
+cpuThreads : number
+encoderQuant : QuantizationMode
+decoderQuant : QuantizationMode
}
class ModelProgress {
+stage : string
+progress : number
+message : string
+file : string
+backend : BackendType
}
class PersistedSettings {
+general : PersistedGeneralSettings
+model : PersistedModelSettings
+audio : PersistedAudioSettings
+ui : PersistedUiSettings
}
class PersistedGeneralSettings {
+v4InferenceIntervalMs : number
+v4SilenceFlushSec : number
+streamingWindow : number
+frameStride : number
+wasmThreads : number
}
class PersistedModelSettings {
+selectedModelId : string
+backend : ModelBackendMode
+encoderQuant : QuantizationMode
+decoderQuant : QuantizationMode
}
class AppStore {
+modelBackendMode() ModelBackendMode
+encoderQuant() QuantizationMode
+decoderQuant() QuantizationMode
+frameStride() number
+wasmThreads() number
+setModelBackendMode(mode) void
+setEncoderQuant(mode) void
+setDecoderQuant(mode) void
+setFrameStride(value) void
+setWasmThreads(value) void
}
class Types {
<<enumeration>> BackendType
webgpu
wasm
}
class ModelBackendModeEnum {
<<enumeration>> ModelBackendMode
webgpu-hybrid
wasm
}
class QuantizationModeEnum {
<<enumeration>> QuantizationMode
int8
fp32
}
TranscriptionWorkerClient --> InitModelOptions : uses
TranscriptionWorkerClient --> InitLocalModelOptions : uses
TranscriptionWorkerClient --> ModelConfig : configures
ModelManager --> ModelConfig : uses
ModelManager --> ModelProgress : emits
PersistedSettings --> PersistedGeneralSettings : has
PersistedSettings --> PersistedModelSettings : has
AppStore --> PersistedModelSettings : persists
AppStore --> PersistedGeneralSettings : persists
ModelManager --> Types : runtimeBackend
ModelManager --> ModelBackendModeEnum : effectiveBackend
ModelManager --> QuantizationModeEnum : quantization
AppStore --> ModelBackendModeEnum : selected
AppStore --> QuantizationModeEnum : selected
```

### File-Level Changes
### 📝 Walkthrough

This PR introduces a comprehensive configuration system for model backends (WebGPU vs WASM) and quantization modes (int8 vs fp32), extending the persistence/hydration layer, worker communication protocol, and model manager with backend resolution and thread normalization logic. UI controls are added to the settings panel for user configuration.
### Sequence Diagram

```mermaid
sequenceDiagram
actor User
participant SettingsPanel
participant AppStore
participant AppComponent
participant TranscriptionWorkerClient as WorkerClient
participant ModelManager
participant ParakeetModel
User->>SettingsPanel: Select backend & quantization
SettingsPanel->>AppStore: setModelBackendMode(), setEncoderQuant(), setDecoderQuant()
AppStore->>AppComponent: Emit state changes
AppComponent->>AppComponent: Sync quantization with backend (reactive effect)
AppComponent->>AppComponent: Build model config
User->>SettingsPanel: Click Load/Reload
SettingsPanel->>AppComponent: Trigger loadSelectedModel()
AppComponent->>WorkerClient: initModel({ modelId, backend, encoderQuant, decoderQuant, cpuThreads })
WorkerClient->>ModelManager: loadModel(config)
ModelManager->>ModelManager: _resolveBackend(requestedBackend)
Note over ModelManager: Determine effectiveBackend & runtimeBackend<br/>(with WebGPU fallback)
ModelManager->>ModelManager: _normalizeQuantization(encoderQuant, decoderQuant)
ModelManager->>ModelManager: Build model assets with quantization-aware<br/>ONNX filenames
ModelManager->>ParakeetModel: Create with resolved backend & config
ParakeetModel-->>ModelManager: Ready
ModelManager-->>WorkerClient: onModelProgress({ backend, ... })
WorkerClient-->>AppComponent: Forward progress to app store
AppComponent->>AppStore: Persist settings including backend/quantization/threads
```

**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~75 minutes
**🚥 Pre-merge checks:** ✅ 3 passed
Hey - I've found 4 issues, and left some high level feedback:
- The `getMaxHardwareThreads` / hardware thread detection and WASM thread clamping logic is now duplicated across `SettingsPanel.tsx`, `App.tsx`, and `appStore.ts`; consider extracting this into a shared utility to keep behavior consistent and easier to maintain.
- There is now both `BackendType` (`'webgpu' | 'wasm'`) and `ModelBackendMode` (`'webgpu-hybrid' | 'wasm'`), plus `_resolveBackend` returning both `effectiveBackend` and `runtimeBackend`; adding a small helper or clearer naming to distinguish user-facing vs runtime backend values would reduce confusion and the risk of passing the wrong one into consumers like `ParakeetModel.fromUrls` or `_buildDirectModelAssets`.
- CPU thread normalization is implemented separately in `ModelManager._normalizeCpuThreads` and `TranscriptionWorkerClient.normalizeCpuThreads`; centralizing this logic would ensure identical clamping behavior between main thread and worker calls.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `getMaxHardwareThreads` / hardware thread detection and WASM thread clamping logic is now duplicated across `SettingsPanel.tsx`, `App.tsx`, and `appStore.ts`; consider extracting this into a shared utility to keep behavior consistent and easier to maintain.
- There is now both `BackendType` (`'webgpu' | 'wasm'`) and `ModelBackendMode` (`'webgpu-hybrid' | 'wasm'`) plus `_resolveBackend` returning both `effectiveBackend` and `runtimeBackend`; adding a small helper or clearer naming to distinguish user-facing vs runtime backend values would reduce confusion and the risk of passing the wrong one into consumers like `ParakeetModel.fromUrls` or `_buildDirectModelAssets`.
- CPU thread normalization is implemented separately in `ModelManager._normalizeCpuThreads` and `TranscriptionWorkerClient.normalizeCpuThreads`; centralizing this logic would ensure identical clamping behavior between main thread and worker calls.
## Individual Comments
### Comment 1
<location> `src/components/SettingsPanel.tsx:11-15` </location>
<code_context>
return `${ms}ms`;
};
+const getMaxHardwareThreads = () => {
+ if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
+ return 4;
+ }
+ return Math.max(1, Math.floor(navigator.hardwareConcurrency));
+};
+
</code_context>
<issue_to_address>
**suggestion:** Consider centralizing the hardware thread detection logic instead of duplicating it in multiple places.
This `getMaxHardwareThreads` logic now appears here, in `App.tsx`, and `appStore` has similar `hardwareThreads`/`defaultWasmThreads` logic. Extracting a shared utility (e.g. `utils/hardware.ts`) would prevent divergence if the heuristics change and keep related tests aligned.
Suggested implementation:
```typescript
return `${ms}ms`;
};
/** Visible section preset for the embeddable settings content. */
export type SettingsPanelSection = 'full' | 'audio' | 'model';
```
To fully implement the centralization you suggested, you’ll also want to:
1. Create a shared utility, e.g. `src/utils/hardware.ts`:
- Export a `getMaxHardwareThreads` function containing the logic that was just removed:
```ts
export const getMaxHardwareThreads = (): number => {
if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
return 4;
}
return Math.max(1, Math.floor(navigator.hardwareConcurrency));
};
```
2. In `src/components/SettingsPanel.tsx`:
- Add an import near the top of the file:
```ts
import { getMaxHardwareThreads } from '../utils/hardware';
```
- Update any usage of `getMaxHardwareThreads` in this file (if present below the snippet) to use the imported function (no callsite changes should be needed beyond the import).
3. In `App.tsx`:
- Remove any inline/duplicated `getMaxHardwareThreads` logic.
- Import and use `getMaxHardwareThreads` from `../utils/hardware` (adjust the relative path to match your project structure).
4. In `appStore` (where `hardwareThreads` / `defaultWasmThreads` logic lives):
- Replace any custom hardware-thread detection with `getMaxHardwareThreads` from `../utils/hardware` (again, adjusting the path).
5. Add or update tests:
- Create tests for `getMaxHardwareThreads` in a shared test file (e.g. `src/utils/__tests__/hardware.test.ts`), covering:
- `navigator` undefined (SSR) case.
- Non-finite `hardwareConcurrency`.
- Typical values (e.g. 2, 4, 8).
- Remove or update any previous tests that duplicated this logic in `App.tsx` or store tests.
These steps will ensure all parts of the app share a single, consistent heuristic for hardware thread detection.
</issue_to_address>
### Comment 2
<location> `src/lib/transcription/TranscriptionWorkerClient.ts:178-180` </location>
<code_context>
return transferOwnership === false ? [] : [buffer as ArrayBuffer];
}
+ private normalizeCpuThreads(cpuThreads?: number): number | undefined {
+ if (!Number.isFinite(cpuThreads)) return undefined;
+ return Math.max(1, Math.floor(cpuThreads as number));
+ }
+
</code_context>
<issue_to_address>
**suggestion:** The CPU thread normalization logic is duplicated with `ModelManager._normalizeCpuThreads`; consider sharing a single implementation.
`TranscriptionWorkerClient.normalizeCpuThreads` and `ModelManager._normalizeCpuThreads` use the same `Number.isFinite` + `Math.max(1, floor)` logic. If only one is updated in the future (e.g., adding an upper bound or device-specific cap), their behavior will diverge. Please extract this into a shared helper or common utility to keep the logic in sync.
Suggested implementation:
```typescript
return transferOwnership === false ? [] : [buffer as ArrayBuffer];
}
+import { normalizeCpuThreads } from './cpuThreads';
+import type {
+ ModelState,
+ ModelProgress,
+ TranscriptionResult,
+ TranscriptionServiceConfig,
+ ModelBackendMode,
+ QuantizationMode,
+} from './types';
+import { TokenStreamResult, TokenStreamConfig } from './TokenStreamTranscriber';
```
To fully implement the deduplication and keep behavior in sync:
1. Create a shared helper module, e.g. `src/lib/transcription/cpuThreads.ts`:
- Export a function:
```ts
export function normalizeCpuThreads(cpuThreads?: number): number | undefined {
if (!Number.isFinite(cpuThreads)) return undefined;
return Math.max(1, Math.floor(cpuThreads as number));
}
```
2. Update `TranscriptionWorkerClient` (in this file) to:
- Remove all internal calls to `this.normalizeCpuThreads(...)` (if any) and replace them with `normalizeCpuThreads(...)` from the new helper.
3. Update `ModelManager` (likely `src/lib/transcription/ModelManager.ts`):
- Replace the `_normalizeCpuThreads` implementation with calls to the shared `normalizeCpuThreads` helper.
- Remove the now-redundant `_normalizeCpuThreads` method from `ModelManager`.
These changes ensure that any future changes to CPU thread normalization (e.g. adding an upper bound or device-specific cap) are made in one place only.
</issue_to_address>
### Comment 3
<location> `src/App.tsx:471` </location>
<code_context>
appStore.setModelProgress(p.progress);
appStore.setModelMessage(p.message || '');
if (p.file) appStore.setModelFile(p.file);
+ if (p.backend) appStore.setBackend(p.backend);
};
</code_context>
<issue_to_address>
**issue (bug_risk):** `appStore.setBackend` is called but there is no corresponding signal/setter in the store API.
`createAppStore` defines `modelBackendMode`, `encoderQuant`, `decoderQuant`, and `wasmThreads`, but no `backend`/`setBackend` pair is exported. This call will not type‑check (or will hit an undefined property at runtime). If you need to track the resolved runtime backend separately from `modelBackendMode`, add a dedicated signal (e.g. `[runtimeBackend, setRuntimeBackend]`) to `createAppStore` and export the setter.
</issue_to_address>
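For reference, a minimal sketch of the missing signal, assuming the store is built from SolidJS `createSignal`s (the `runtimeBackend` name is illustrative, not taken from the PR):

```ts
import { createSignal } from 'solid-js';

type BackendType = 'webgpu' | 'wasm';

// Inside createAppStore(): track the backend the runtime actually resolved to,
// separate from the user-requested modelBackendMode.
const [runtimeBackend, setRuntimeBackend] = createSignal<BackendType | null>(null);

// Export both from the store so App.tsx's progress handler can call
// appStore.setRuntimeBackend(p.backend) instead of the missing setBackend.
```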
### Comment 4
<location> `src/App.tsx:227` </location>
<code_context>
};
};
+const getMaxHardwareThreads = (): number => {
+ if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
+ return 4;
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting shared thread helpers, moving backend/quantization policy into the store, and centralizing settings hydration to keep App.tsx slimmer and less coupled to configuration details.
The new functionality looks solid, but there are a couple of places where you can reduce complexity and duplication without changing behavior.
### 1. Extract thread helpers to a shared utility
`getMaxHardwareThreads` / `clampWasmThreadsForDevice` in `App.tsx` duplicate logic that already exists in `SettingsPanel.tsx`. Moving them into a shared module removes duplication and shrinks `App.tsx`.
**Example:**
```ts
// src/utils/hardwareThreads.ts
export const getMaxHardwareThreads = (): number => {
if (typeof navigator === 'undefined' || !Number.isFinite(navigator.hardwareConcurrency)) {
return 4;
}
return Math.max(1, Math.floor(navigator.hardwareConcurrency));
};
export const clampWasmThreadsForDevice = (value: number): number =>
Math.max(1, Math.min(getMaxHardwareThreads(), Math.floor(value)));
```
Then in `App.tsx` and `SettingsPanel.tsx`:
```ts
import { clampWasmThreadsForDevice } from '@/utils/hardwareThreads';
// ...
if (persistedGeneral?.wasmThreads !== undefined) {
appStore.setWasmThreads(clampWasmThreadsForDevice(persistedGeneral.wasmThreads));
}
```
This keeps the behavior identical, but centralizes the environment/capability logic.
### 2. Move backend/quantization policy out of `App`
The `createEffect` that enforces `encoderQuant`/`decoderQuant` presets based on `modelBackendMode` is policy logic that fits better in the store (or a configuration module) than in `App.tsx`. If you encapsulate it, `App` doesn’t need to know about the quantization rules.
**Example store API:**
```ts
// in appStore.ts (or wherever the store lives)
const applyBackendPresets = (backendMode: BackendMode) => {
if (backendMode.startsWith('webgpu')) {
setEncoderQuant('fp32');
setDecoderQuant('int8');
} else {
setEncoderQuant('int8');
setDecoderQuant('int8');
}
};
export const setModelBackendMode = (backendMode: BackendMode) => {
state.modelBackendMode = backendMode;
applyBackendPresets(backendMode);
};
```
Then in `App.tsx` you can drop the `createEffect` and just call:
```ts
if (persistedModel?.backend !== undefined) {
appStore.setModelBackendMode(persistedModel.backend);
}
```
Any other caller that changes the backend automatically gets the same quantization policy, without extra effects in `App`.
### 3. Optional: consolidate settings hydration
You’re now restoring a growing list of fields in `App.tsx` (`energyThreshold`, `sileroThreshold`, `frameStride`, `wasmThreads`, backend/quant, etc.). As this grows, a dedicated hydration function in the store can keep `App.tsx` slimmer.
**Example:**
```ts
// in appStore.ts
export const hydrateFromPersistedSettings = (settings: PersistedSettings) => {
const { general, audio, model, ui } = settings;
if (model?.selectedModelId && MODELS.some(m => m.id === model.selectedModelId)) {
setSelectedModelId(model.selectedModelId);
}
if (model?.backend !== undefined) setModelBackendMode(model.backend);
if (model?.encoderQuant !== undefined) setEncoderQuant(model.encoderQuant);
if (model?.decoderQuant !== undefined) setDecoderQuant(model.decoderQuant);
if (general?.energyThreshold !== undefined) setEnergyThreshold(general.energyThreshold);
if (general?.sileroThreshold !== undefined) setSileroThreshold(general.sileroThreshold);
if (general?.v4InferenceIntervalMs !== undefined) setV4InferenceIntervalMs(general.v4InferenceIntervalMs);
if (general?.v4SilenceFlushSec !== undefined) setV4SilenceFlushSec(general.v4SilenceFlushSec);
if (general?.streamingWindow !== undefined) setStreamingWindow(general.streamingWindow);
if (general?.frameStride !== undefined) setFrameStride(general.frameStride);
if (general?.wasmThreads !== undefined) setWasmThreads(clampWasmThreadsForDevice(general.wasmThreads));
if (ui?.debugPanel?.visible !== undefined) setShowDebugPanel(ui.debugPanel.visible);
};
```
Then in `App.tsx`:
```ts
const persistedSettings = loadSettingsFromStorage();
appStore.hydrateFromPersistedSettings(persistedSettings);
```
This keeps all current behavior but reduces the amount of cross-cutting configuration code in `App.tsx`.
</issue_to_address>
On `src/App.tsx`:

```tsx
// Keep quantization presets aligned with the backend mode (parakeet.js demo behavior).
createEffect(() => {
```
WARNING: createEffect silently discards user-persisted quantization choices
This effect fires immediately on mount (and on every modelBackendMode change), which means any encoderQuant/decoderQuant values restored from localStorage at lines 251–255 are immediately overwritten. For example, a user who saved backend: 'wasm' + encoderQuant: 'fp32' will always have encoderQuant reset to 'int8' on startup.
The effect also prevents the user from independently choosing fp32 encoder on WASM — the UI dropdowns appear editable but the effect immediately reverts them.
Safest fix: Remove this effect and instead apply the preset only when the backend mode changes (not on initial mount), or only apply defaults when the user has not explicitly set a quant value. Alternatively, document that quant is always derived from backend and make the dropdowns read-only / hidden when the preset is active.
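A minimal sketch of the "apply only when the backend mode changes" option, assuming a SolidJS setup; `on(..., { defer: true })` skips the initial run on mount, so quant values hydrated from storage survive startup:

```tsx
import { createEffect, on } from 'solid-js';

// Apply the preset only when the user actually switches backend mode,
// not when the effect first runs after settings hydration.
createEffect(on(
  () => appStore.modelBackendMode(),
  (backendMode) => {
    if (backendMode.startsWith('webgpu')) {
      appStore.setEncoderQuant('fp32');
      appStore.setDecoderQuant('int8');
    } else {
      appStore.setEncoderQuant('int8');
      appStore.setDecoderQuant('int8');
    }
  },
  { defer: true },
));
```

Because `on` tracks only the `modelBackendMode()` accessor, user edits to the quant signals no longer re-trigger the effect and get reverted.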
On `src/App.tsx` (the model-progress handler):

```tsx
appStore.setModelProgress(p.progress);
appStore.setModelMessage(p.message || '');
if (p.file) appStore.setModelFile(p.file);
if (p.backend) appStore.setBackend(p.backend);
```
WARNING: appStore.setBackend updates the runtime backend signal (the actual backend used), but appStore.modelBackendMode (the requested backend) is never updated when a WebGPU fallback to WASM occurs. After a fallback, modelBackendMode() still returns 'webgpu-hybrid' while backend() returns 'wasm'. On the next save-to-storage cycle (line 402–415), backend: 'webgpu-hybrid' is persisted, so the user's setting is not corrected. Consider also updating modelBackendMode when a fallback is detected, or at minimum surfacing the mismatch in the UI.
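One option, sketched against the store accessors named in this thread; whether a fallback should silently rewrite the saved preference is a product decision:

```tsx
if (p.backend) {
  appStore.setBackend(p.backend);
  // Reconcile the requested mode so the persisted setting matches what actually ran.
  if (p.backend === 'wasm' && appStore.modelBackendMode() === 'webgpu-hybrid') {
    appStore.setModelBackendMode('wasm');
  }
}
```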
On `src/lib/transcription/ModelManager.ts` (`_buildDirectModelAssets`):

```diff
 const revision = 'main';
-const encoderName = backend === 'webgpu' ? 'encoder-model.onnx' : 'encoder-model.int8.onnx';
-const decoderName = 'decoder_joint-model.int8.onnx';
+const resolvedEncoderQuant = backend.startsWith('webgpu') && encoderQuant === 'int8' ? 'fp32' : encoderQuant;
```
WARNING: Behavioral inconsistency between primary and fallback asset paths for WebGPU + int8 encoder.
The fallback path (_buildDirectModelAssets) silently overrides encoderQuant from int8 → fp32 when backend.startsWith('webgpu') (line 424). However, the primary path via getParakeetModel at line 133–148 passes encoderQuant directly without this override. If getParakeetModel accepts int8 for WebGPU, the two paths produce different model files. If it does not, the primary path will fail or silently fall back inside parakeet.js.
The override logic should be applied consistently before both calls, or the comment should explain why the primary path is exempt.
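A sketch of one way to keep the two paths consistent — hoist the override into a helper that runs before either call (names are illustrative):

```ts
type QuantizationMode = 'int8' | 'fp32';

// Single source of truth for the WebGPU int8-encoder restriction.
const resolveEncoderQuant = (backend: string, quant: QuantizationMode): QuantizationMode =>
  backend.startsWith('webgpu') && quant === 'int8' ? 'fp32' : quant;

// Call resolveEncoderQuant(backend, encoderQuant) before both
// getParakeetModel(...) and _buildDirectModelAssets(...).
```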
On `vite.config.js` (outdated diff):

```diff
-    port: 3100,
-    host: '0.0.0.0',
+    port: 5173,
+    host: 'localhost',
```
WARNING: Changing host from '0.0.0.0' to 'localhost' breaks Docker/container and remote dev environments where the dev server must bind to all interfaces to be reachable from the host machine or other containers. This is a deployment regression for any team member not running Vite natively on their local machine.
If the intent is to restrict local-only access for security, this should be documented and opt-in (e.g., via an env var), not a hard default change.
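A sketch of the opt-in variant (`DEV_HOST`/`DEV_PORT` are assumed names, not existing config):

```ts
import { defineConfig } from 'vite';

// Bind to all interfaces by default; restrict only when explicitly requested.
export default defineConfig({
  server: {
    port: Number(process.env.DEV_PORT ?? 5173),
    host: process.env.DEV_HOST ?? '0.0.0.0',
  },
});
```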
On `src/App.tsx` (`loadSelectedModel`):

```diff
 const loadSelectedModel = async () => {
   if (!workerClient) return;
-  if (appStore.modelState() === 'ready') return;
   if (appStore.modelState() === 'loading') return;
```
SUGGESTION: The guard `if (appStore.modelState() === 'ready') return;` was removed to allow model reload. However, there is no guard against reloading while transcription is actively running (`recordingState() === 'recording'`). Reloading the model mid-transcription will tear down the worker and drop in-flight audio. Consider separately checking `appStore.recordingState() !== 'idle'` before allowing reload, or at minimum showing a warning.
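A sketch of the combined guard, using the store accessors referenced above:

```ts
const loadSelectedModel = async () => {
  if (!workerClient) return;
  if (appStore.modelState() === 'loading') return;
  // Don't tear down the worker while audio is in flight.
  if (appStore.recordingState() !== 'idle') {
    appStore.setModelMessage('Stop recording before reloading the model.');
    return;
  }
  // ...existing load logic
};
```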
### Code Review Summary

**Status:** 4 Issues Found | **Recommendation:** Address before merge

**Overview:** This PR adds configurable runtime controls for backend mode (WebGPU/WASM), encoder/decoder quantization, decoder frame stride, and WASM thread count. The plumbing from UI → store → persistence → worker → ModelManager is well-structured, as is the sanitization/validation layer.

**Risk:** Medium (see the quantization warnings above).
Actionable comments posted: 3
🧹 Nitpick comments (5)
vite.config.js (1)
**86-87:** Dev server no longer accessible from LAN. Switching `host` from `0.0.0.0` to `localhost` prevents access from other devices on the network (e.g., mobile testing with https certs). If that's intentional, looks fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vite.config.js` around lines 86 - 87, The dev server host was changed to 'localhost', which prevents LAN access; revert or make configurable by setting the Vite dev server host back to '0.0.0.0' (or expose it via env) so other devices can reach it; update the devServer config object where port: 5173 and host: 'localhost' are defined (the host property in the Vite config) to use '0.0.0.0' or an environment variable like process.env.DEV_HOST || '0.0.0.0'.src/App.tsx (1)
227-236:getMaxHardwareThreadsis duplicated inSettingsPanel.tsx(line 11).Consider extracting this into a shared utility (e.g.,
src/utils/hardware.ts) to avoid the two copies drifting apart.#!/bin/bash # Verify the duplication rg -n 'getMaxHardwareThreads' --type=ts --type=tsx -g '*.tsx' -g '*.ts'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/App.tsx` around lines 227 - 236, The getMaxHardwareThreads function (and clampWasmThreadsForDevice) is duplicated; extract them into a single shared utility module (e.g., create a new hardware utility that exports getMaxHardwareThreads and clampWasmThreadsForDevice) and replace the local copies in App.tsx and SettingsPanel.tsx with imports from that module; ensure the exported functions keep the same signatures and return types so existing callers (getMaxHardwareThreads, clampWasmThreadsForDevice) continue to work without other changes.src/stores/appStore.ts (1)
89-91: DefaultencoderQuantis'int8'but the reactive effect in App.tsx will override it to'fp32'for the default'webgpu-hybrid'backend.This is fine since model loading only happens after App mounts (where the effect runs), but worth noting that the store defaults alone don't reflect the aligned state — the alignment effect in
App.tsx(lines 327–336) is required for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/stores/appStore.ts` around lines 89 - 91, Default encoderQuant ('int8') conflicts with the alignment effect in App.tsx which forces 'fp32' when modelBackendMode defaults to 'webgpu-hybrid'; change the store defaults to match the post-effect state by initializing encoderQuant to 'fp32' (and optionally decoderQuant if you want both consistent) in the createSignal call, leaving the effect in App.tsx (which uses setEncoderQuant/setDecoderQuant and modelBackendMode) unchanged so runtime behavior and store defaults align.src/lib/transcription/ModelManager.ts (2)
422-427: Silent encoder quantization override on WebGPU — add a log warning.Line 424 silently upgrades encoder from
int8tofp32when the backend is WebGPU. This is likely correct (WebGPU may not support int8 encoder), but the caller receives no indication that their requested quantization was overridden. Aconsole.warnwould save debugging time when users wonder why the model size differs from expectations.Proposed fix
```diff
 const resolvedEncoderQuant = backend.startsWith('webgpu') && encoderQuant === 'int8' ? 'fp32' : encoderQuant;
+if (resolvedEncoderQuant !== encoderQuant) {
+  console.warn(`[ModelManager] Encoder quantization overridden: ${encoderQuant} → ${resolvedEncoderQuant} (WebGPU does not support int8 encoder)`);
+}
 const encoderName = resolvedEncoderQuant === 'int8' ? 'encoder-model.int8.onnx' : 'encoder-model.onnx';
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/transcription/ModelManager.ts` around lines 422 - 427, The code in ModelManager.ts silently overrides encoderQuant from 'int8' to 'fp32' for WebGPU backends (resolvedEncoderQuant computed from backend and encoderQuant) without notifying callers; update the logic around resolvedEncoderQuant (and the variables backend and encoderQuant) to log a warning when encoderQuant === 'int8' and backend.startsWith('webgpu') so users are informed their requested quantization was upgraded (use the existing logging facility if available or console.warn) and include both the requested and effective quantization and the modelId/repoId in the message for context.
357-360: Consider adding an upper-bound clamp to_normalizeCpuThreads.The lower bound is enforced (
Math.max(1, …)), but there's no upper-bound guard. A runaway value (e.g.999) would be passed straight through to the ONNX runtime. Even ifSettingsStorageclamps upstream, this helper is the last line of defense.Proposed fix
```diff
 private _normalizeCpuThreads(value?: number): number | undefined {
   if (!Number.isFinite(value)) return undefined;
-  return Math.max(1, Math.floor(value as number));
+  const MAX_THREADS = navigator.hardwareConcurrency || 16;
+  return Math.min(MAX_THREADS, Math.max(1, Math.floor(value as number)));
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/transcription/ModelManager.ts` around lines 357 - 360, The _normalizeCpuThreads helper currently only enforces a lower bound; update _normalizeCpuThreads to also clamp values to a safe upper bound before returning (e.g., min(floor(value), MAX_THREADS)), keeping the existing Number.isFinite check and Math.floor + Math.max(1, …) behavior; choose a reasonable cap (for example based on available CPUs via os.cpus().length or a defined constant like MAX_CPU_THREADS) and reference that cap in the calculation so excessively large inputs (e.g., 999) are constrained before being passed to the ONNX runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/App.tsx`:
- Around line 326-336: The reactive createEffect in App.tsx currently forces
encoder/decoder quant values based only on appStore.modelBackendMode(), which
causes user selections (appStore.encoderQuant()/appStore.decoderQuant()) to be
overwritten; change the logic so the effect only enforces presets when a new
backend mode is selected AND the user isn’t actively controlling the quant
settings (e.g., add a flag in appStore like isQuantAutoControlled or check a new
store field quantControlMode === 'auto'|'manual'), and update the SettingsPanel
quant selects to disable/hide when the effect is controlling them; specifically
adjust the createEffect that calls appStore.setEncoderQuant and setDecoderQuant
to respect the new control flag and update SettingsPanel.tsx (the
encoder/decoder select controls) to reflect that flag so users aren’t silently
overridden.
In `@src/components/SettingsPanel.tsx`:
- Around line 103-155: The encoder/decoder selects can be changed by the user
but are later overridden by the reactive effect in App.tsx when certain backend
modes are selected; update the UI to reflect that by disabling the Encoder and
Decoder <select> controls (the ones bound to appStore.encoderQuant() and
appStore.decoderQuant()) whenever appStore.modelBackendMode() indicates the
backend auto-manages quantization, and add a short tooltip/title like "Managed
by backend selection" to those disabled controls; keep existing disabled
behavior tied to appStore.modelState() === 'loading' and ensure you don't remove
the onInput handlers (setEncoderQuant/setDecoderQuant) so they remain usable
when not auto-managed.
In `@src/lib/transcription/ModelManager.ts`:
- Around line 113-124: The runtime is passing a descriptive effectiveBackend
(which may be 'webgpu-hybrid') into parakeet.js calls causing runtime failures;
update the calls that construct/load Parakeet models to use runtimeBackend
instead of effectiveBackend—specifically inside createModelFromAssets (replace
backend: effectiveBackend with backend: runtimeBackend when calling
ParakeetModel.fromUrls) and any use in getParakeetModel that forwards a backend
to parakeet APIs; keep effectiveBackend for logging/UI only (e.g., the
console.log lines) so display text remains descriptive.
---
Duplicate comments:
In `@src/components/SettingsPanel.tsx`:
- Around line 11-16: The function getMaxHardwareThreads is duplicated in
SettingsPanel.tsx and App.tsx; extract it into a shared utility module (e.g.,
create a new util export getMaxHardwareThreads in a common utils file) and
import that single function into both SettingsPanel and App to remove
duplication; ensure the exported function signature and behavior (checks for
typeof navigator, Number.isFinite, default 4, and Math.max(1, Math.floor(...)))
remain unchanged and update both files to use the shared import.
---
Nitpick comments:
In `@src/App.tsx`:
- Around line 227-236: The getMaxHardwareThreads function (and
clampWasmThreadsForDevice) is duplicated; extract them into a single shared
utility module (e.g., create a new hardware utility that exports
getMaxHardwareThreads and clampWasmThreadsForDevice) and replace the local
copies in App.tsx and SettingsPanel.tsx with imports from that module; ensure
the exported functions keep the same signatures and return types so existing
callers (getMaxHardwareThreads, clampWasmThreadsForDevice) continue to work
without other changes.
In `@src/lib/transcription/ModelManager.ts`:
- Around line 422-427: The code in ModelManager.ts silently overrides
encoderQuant from 'int8' to 'fp32' for WebGPU backends (resolvedEncoderQuant
computed from backend and encoderQuant) without notifying callers; update the
logic around resolvedEncoderQuant (and the variables backend and encoderQuant)
to log a warning when encoderQuant === 'int8' and backend.startsWith('webgpu')
so users are informed their requested quantization was upgraded (use the
existing logging facility if available or console.warn) and include both the
requested and effective quantization and the modelId/repoId in the message for
context.
- Around line 357-360: The _normalizeCpuThreads helper currently only enforces a
lower bound; update _normalizeCpuThreads to also clamp values to a safe upper
bound before returning (e.g., min(floor(value), MAX_THREADS)), keeping the
existing Number.isFinite check and Math.floor + Math.max(1, …) behavior; choose
a reasonable cap (for example based on available CPUs via os.cpus().length or a
defined constant like MAX_CPU_THREADS) and reference that cap in the calculation
so excessively large inputs (e.g., 999) are constrained before being passed to
the ONNX runtime.
In `@src/stores/appStore.ts`:
- Around line 89-91: Default encoderQuant ('int8') conflicts with the alignment
effect in App.tsx which forces 'fp32' when modelBackendMode defaults to
'webgpu-hybrid'; change the store defaults to match the post-effect state by
initializing encoderQuant to 'fp32' (and optionally decoderQuant if you want
both consistent) in the createSignal call, leaving the effect in App.tsx (which
uses setEncoderQuant/setDecoderQuant and modelBackendMode) unchanged so runtime
behavior and store defaults align.
In `@vite.config.js`:
- Around line 86-87: The dev server host was changed to 'localhost', which
prevents LAN access; revert or make configurable by setting the Vite dev server
host back to '0.0.0.0' (or expose it via env) so other devices can reach it;
update the devServer config object where port: 5173 and host: 'localhost' are
defined (the host property in the Vite config) to use '0.0.0.0' or an
environment variable like process.env.DEV_HOST || '0.0.0.0'.
On `src/App.tsx`:

```tsx
// Keep quantization presets aligned with the backend mode (parakeet.js demo behavior).
createEffect(() => {
  const backendMode = appStore.modelBackendMode();
  if (backendMode.startsWith('webgpu')) {
    if (appStore.encoderQuant() !== 'fp32') appStore.setEncoderQuant('fp32');
    if (appStore.decoderQuant() !== 'int8') appStore.setDecoderQuant('int8');
  } else {
    if (appStore.encoderQuant() !== 'int8') appStore.setEncoderQuant('int8');
    if (appStore.decoderQuant() !== 'int8') appStore.setDecoderQuant('int8');
  }
});
```
Reactive effect overrides user-selected quantization values.
In SolidJS, this effect tracks modelBackendMode(), encoderQuant(), and decoderQuant() as dependencies. If a user manually changes encoder quantization via the UI (e.g., selects int8 while in webgpu-hybrid mode), the effect re-fires and immediately reverts it to fp32. The quantization selects in SettingsPanel.tsx (lines 131–154) are only disabled during loading — they should also be disabled (or hidden) when the effect is controlling them, to avoid confusing UX where user selections are silently overridden.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/App.tsx` around lines 326 - 336, The reactive createEffect in App.tsx
currently forces encoder/decoder quant values based only on
appStore.modelBackendMode(), which causes user selections
(appStore.encoderQuant()/appStore.decoderQuant()) to be overwritten; change the
logic so the effect only enforces presets when a new backend mode is selected
AND the user isn’t actively controlling the quant settings (e.g., add a flag in
appStore like isQuantAutoControlled or check a new store field quantControlMode
=== 'auto'|'manual'), and update the SettingsPanel quant selects to disable/hide
when the effect is controlling them; specifically adjust the createEffect that
calls appStore.setEncoderQuant and setDecoderQuant to respect the new control
flag and update SettingsPanel.tsx (the encoder/decoder select controls) to
reflect that flag so users aren’t silently overridden.
On `src/components/SettingsPanel.tsx`:

```tsx
<div class="grid grid-cols-2 gap-x-4 gap-y-3 pt-1">
  <div class="space-y-1">
    <span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Backend</span>
    <select
      class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
      value={appStore.modelBackendMode()}
      onInput={(e) => appStore.setModelBackendMode((e.target as HTMLSelectElement).value as 'webgpu-hybrid' | 'wasm')}
      disabled={appStore.modelState() === 'loading'}
    >
      <option value="webgpu-hybrid">WebGPU</option>
      <option value="wasm">WASM</option>
    </select>
  </div>
  <div class="space-y-1">
    <span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Stride</span>
    <input
      type="number"
      min="1"
      max="4"
      step="1"
      value={appStore.frameStride()}
      onInput={(e) => {
        const next = Number((e.target as HTMLInputElement).value);
        if (Number.isFinite(next)) appStore.setFrameStride(Math.max(1, Math.min(4, Math.round(next))));
      }}
      class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
    />
  </div>
  <div class="space-y-1">
    <span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Encoder</span>
    <select
      class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
      value={appStore.encoderQuant()}
      onInput={(e) => appStore.setEncoderQuant((e.target as HTMLSelectElement).value as 'int8' | 'fp32')}
      disabled={appStore.modelState() === 'loading'}
    >
      <option value="fp32">fp32</option>
      <option value="int8">int8</option>
    </select>
  </div>
  <div class="space-y-1">
    <span class="text-[10px] font-bold uppercase tracking-widest text-[var(--color-earthy-soft-brown)]">Decoder</span>
    <select
      class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
      value={appStore.decoderQuant()}
      onInput={(e) => appStore.setDecoderQuant((e.target as HTMLSelectElement).value as 'int8' | 'fp32')}
      disabled={appStore.modelState() === 'loading'}
    >
      <option value="int8">int8</option>
      <option value="fp32">fp32</option>
    </select>
  </div>
</div>
```
New configuration grid is well-structured.
Backend, Stride, Encoder, and Decoder controls are correctly wired to the app store. One note: since the reactive effect in App.tsx overrides encoder/decoder quant based on backend mode, consider adding visual indication (e.g., disabled or a tooltip) when quantization values are auto-managed by the backend selection — otherwise users may be confused when their selection reverts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/components/SettingsPanel.tsx` around lines 103 - 155, The encoder/decoder
selects can be changed by the user but are later overridden by the reactive
effect in App.tsx when certain backend modes are selected; update the UI to
reflect that by disabling the Encoder and Decoder <select> controls (the ones
bound to appStore.encoderQuant() and appStore.decoderQuant()) whenever
appStore.modelBackendMode() indicates the backend auto-manages quantization, and
add a short tooltip/title like "Managed by backend selection" to those disabled
controls; keep existing disabled behavior tied to appStore.modelState() ===
'loading' and ensure you don't remove the onInput handlers
(setEncoderQuant/setDecoderQuant) so they remain usable when not auto-managed.
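A sketch of the suggested change to the encoder select (the `quantAutoManaged` helper is illustrative; today the App effect always enforces presets, so it could simply return `true`):

```tsx
const quantAutoManaged = () => true; // derive from however the preset policy ends up exposed

<select
  class="w-full text-sm bg-transparent border-b border-[var(--color-earthy-sage)]/40 px-0 py-1.5 text-[var(--color-earthy-dark-brown)] focus:outline-none focus:border-[var(--color-earthy-muted-green)]"
  value={appStore.encoderQuant()}
  onInput={(e) => appStore.setEncoderQuant((e.target as HTMLSelectElement).value as 'int8' | 'fp32')}
  disabled={appStore.modelState() === 'loading' || quantAutoManaged()}
  title={quantAutoManaged() ? 'Managed by backend selection' : undefined}
>
  <option value="fp32">fp32</option>
  <option value="int8">int8</option>
</select>
```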
Addressed review feedback in
Not applied:
@coderabbitai review

✅ Actions performed: Review triggered.
feat: add wasm thread control and runtime backend/quantization/stride settings 009d994
Summary
Notes
Summary by Sourcery
Introduce user-configurable model runtime controls for backend, quantization, decoder stride, and WASM thread count, and plumb these settings through the UI, persistence, worker, and model loading pipeline.
New Features:
Enhancements:
Build:
Tests:
Summary by CodeRabbit
New Features
Chores