EkaScribe Architecture Changes #60

Open
divyesh11 wants to merge 3 commits into main from divyesh/ekascribe_architecture_changes

feat: Integrate Silero VAD for voice activity detection

This commit integrates the Silero VAD (Voice Activity Detection) library to provide more robust speech detection capabilities.

- **Dependency Integration:**
  - Added the `com.konovalov.vad:silero-vad` dependency (`libs.silero`) to the project.

- **VAD Implementation:**
  - `VADAnalyserImpl` is updated to use the Silero `Vad` engine.
  - The implementation now uses the `vad.isSpeech()` method to determine if incoming audio frames contain speech.
  - It is configurable for VAD mode, sample rate, and frame size, with sensible defaults provided.
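The configuration described above might look roughly like the following. This is a minimal sketch, not the PR's code: `VadMode`, `SampleRate`, `FrameSize`, `SpeechEngine`, `VadAnalyserSketch`, and the default values are all hypothetical stand-ins, and the toy energy-based engine only replaces the Silero model so the example runs without Android. The only detail taken from the description is that each frame is passed to an `isSpeech()` call.

```kotlin
import kotlin.math.abs

// Hypothetical configuration types; the SDK's real definitions are not shown in the PR text.
enum class VadMode { NORMAL, AGGRESSIVE }
enum class SampleRate(val hz: Int) { RATE_8K(8_000), RATE_16K(16_000) }
enum class FrameSize(val samples: Int) { SIZE_256(256), SIZE_512(512) }

// Stand-in for the speech engine. In the PR this role is played by the Silero Vad object.
fun interface SpeechEngine {
    fun isSpeech(frame: ShortArray): Boolean
}

class VadAnalyserSketch(
    private val engine: SpeechEngine,
    val mode: VadMode = VadMode.NORMAL,                // assumed default
    val sampleRate: SampleRate = SampleRate.RATE_16K,  // assumed default
    val frameSize: FrameSize = FrameSize.SIZE_512,     // assumed default
) {
    /** Returns true when the engine classifies the frame as speech. */
    fun containsSpeech(frame: ShortArray): Boolean {
        require(frame.size == frameSize.samples) {
            "Expected ${frameSize.samples} samples, got ${frame.size}"
        }
        return engine.isSpeech(frame)
    }
}

// Toy engine for demonstration only: mean absolute amplitude above a fixed threshold.
// This is NOT how Silero works; it only lets the sketch run without the model.
val energyEngine = SpeechEngine { frame ->
    frame.sumOf { abs(it.toInt()) } / frame.size > 500
}
```

Callers that are happy with the defaults construct the analyser with just an engine; others override individual parameters by name, for example `VadAnalyserSketch(energyEngine, frameSize = FrameSize.SIZE_256)`.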

This commit introduces a new Android library module, `ekascribe_sdk`, establishing the foundational structure for audio processing and analysis.

### Key Changes:

-   **Module Scaffolding:**
    -   Added the `ekascribe_sdk` Android library module.
    -   Created the basic directory structure, including `build.gradle.kts`, `AndroidManifest.xml`, and Proguard rules.
    -   Included the new module in `settings.gradle.kts`.

-   **Build & Dependency Configuration:**
    -   Configured `build.gradle.kts` for the new module, setting up `compileSdk`, `minSdk`, and Java 17 compatibility.
    -   Added initial dependencies for `androidx.core.ktx`, `androidx.appcompat`, and `gson`.

-   **Initial Audio & Manager Classes:**
    -   Introduced core data models for audio processing: `AudioData`, `AudioSampleRate`, and `AudioFrameSize`.
    -   Created placeholder classes for key components:
        -   `VADAnalyserImpl` and its interface `VoiceActivityAnalyser` for voice activity detection.
        -   `AudioDataManager` and `SessionManager` as singletons for future state management.
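
For orientation, a library module configured as described could have a `build.gradle.kts` along these lines. This is a sketch under stated assumptions: the namespace, the SDK levels, and the dependency versions are illustrative and do not come from the PR, which only states that `compileSdk`, `minSdk`, and Java 17 compatibility are set. The PR's use of `libs.silero` suggests the actual build references a version catalog rather than literal coordinates.

```kotlin
// ekascribe_sdk/build.gradle.kts (sketch; values marked "assumed" are not from the PR)
// The module is also registered in settings.gradle.kts with: include(":ekascribe_sdk")
plugins {
    id("com.android.library")
    id("org.jetbrains.kotlin.android")
}

android {
    namespace = "com.example.ekascribe_sdk" // hypothetical namespace
    compileSdk = 35                         // assumed

    defaultConfig {
        minSdk = 24                         // assumed
    }

    // Java 17 compatibility, as stated in the commit message.
    compileOptions {
        sourceCompatibility = JavaVersion.VERSION_17
        targetCompatibility = JavaVersion.VERSION_17
    }
    kotlinOptions {
        jvmTarget = "17"
    }
}

dependencies {
    // Versions are illustrative only.
    implementation("androidx.core:core-ktx:1.13.1")
    implementation("androidx.appcompat:appcompat:1.7.0")
    implementation("com.google.code.gson:gson:2.11.0")
}
```
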

divyesh11 self-assigned this on Nov 5, 2025

This commit refactors the audio analysis components by introducing a common interface and preparing for the integration of new analysis models.

### Key Changes:

-   **`VoiceActivityAnalyser` Interface:**
    -   A new interface, `VoiceActivityAnalyser`, is introduced to abstract the audio analysis logic. It defines a single method, `analyseAudioData`, for processing `AudioData`.

-   **Refactored VAD Implementation:**
    -   `VADAnalyserImpl` is updated to implement the new `VoiceActivityAnalyser` interface.
    -   The `VadSilero` instance is now lazily initialized, so it is created only on first use.

-   **New `AudioQualityAnalyserImpl`:**
    -   A new placeholder class, `AudioQualityAnalyserImpl`, has been created.
    -   It also implements the `VoiceActivityAnalyser` interface, paving the way for future audio quality analysis features.

-   **Dependency Updates:**
    -   The ONNX Runtime dependency (`onnxruntime-android`) has been added to support upcoming machine learning model integrations.
    -   The Silero VAD dependency (`libs.silero`) has been changed from `api` to `implementation` to better encapsulate it within the SDK.
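
The interface and lazy-initialization pattern described above can be sketched as follows. The shape of `AudioData`, the `Boolean` return type of `analyseAudioData`, and the names `FakeVadEngine`, `LazyVadAnalyser`, and `AudioQualityAnalyserSketch` are assumptions made for illustration; the PR text only names the interface, its single method, and the fact that the engine is created on first use. A fake engine replaces `VadSilero` so the example runs without the model.

```kotlin
// Assumed shape: the real AudioData fields are not shown in the PR text.
class AudioData(val samples: ShortArray)

interface VoiceActivityAnalyser {
    // The return type is an assumption; the PR only names the method.
    fun analyseAudioData(audioData: AudioData): Boolean
}

// Stand-in for the Silero engine. It counts constructions so laziness is observable.
class FakeVadEngine {
    init { instancesCreated++ }

    // Toy rule: any non-zero sample counts as speech.
    fun isSpeech(frame: ShortArray): Boolean = frame.any { it != 0.toShort() }

    companion object { var instancesCreated = 0 }
}

class LazyVadAnalyser : VoiceActivityAnalyser {
    // `by lazy` defers construction until the first analyseAudioData call.
    private val vad: FakeVadEngine by lazy { FakeVadEngine() }

    override fun analyseAudioData(audioData: AudioData): Boolean =
        vad.isSpeech(audioData.samples)
}

// Placeholder implementation sharing the same interface, mirroring the PR's description.
class AudioQualityAnalyserSketch : VoiceActivityAnalyser {
    override fun analyseAudioData(audioData: AudioData): Boolean = false
}
```

Kotlin's `lazy` delegate is synchronized by default, so the engine is built at most once even if the first call arrives from several threads. Whether the PR relies on that default is not stated.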