# Speechall Python SDK

Python SDK for the Speechall API - A powerful speech-to-text transcription service supporting multiple AI models and providers.


## Features

- **Multiple AI Models**: Access speech-to-text models from multiple providers (OpenAI Whisper and more)
- **Flexible Input**: Transcribe local audio files or remote URLs
- **Rich Output Formats**: Get results as plain text, JSON, SRT, or VTT
- **Speaker Diarization**: Identify and separate different speakers in audio
- **Custom Vocabulary**: Improve accuracy with domain-specific terms
- **Replacement Rules**: Apply custom text transformations to transcriptions
- **Language Support**: Auto-detect the language or specify one from a wide range of supported languages
- **Async Support**: Built on `httpx`, with async/await support via `AsyncSpeechallApi` (see the sketch under API Reference)

## Installation

```bash
pip install speechall
```

## Quick Start

### Basic Transcription

```python
import os

from speechall import SpeechallApi

# Initialize the client
client = SpeechallApi(token=os.getenv("SPEECHALL_API_TOKEN"))

# Read a local audio file
with open("audio.mp3", "rb") as audio_file:
    audio_data = audio_file.read()

# Transcribe it
response = client.speech_to_text.transcribe(
    model="openai.whisper-1",
    request=audio_data,
    language="en",
    output_format="json",
    punctuation=True
)

print(response.text)
```

### Transcribe Remote Audio

```python
import os

from speechall import SpeechallApi

client = SpeechallApi(token=os.getenv("SPEECHALL_API_TOKEN"))

response = client.speech_to_text.transcribe_remote(
    file_url="https://example.com/audio.mp3",
    model="openai.whisper-1",
    language="auto",  # Auto-detect the language
    output_format="json"
)

print(response.text)
```

## Advanced Features

### Speaker Diarization

Identify different speakers in your audio:

```python
response = client.speech_to_text.transcribe(
    model="openai.whisper-1",
    request=audio_data,
    language="en",
    output_format="json",
    diarization=True,
    speakers_expected=2
)

# Each segment is attributed to a speaker
for segment in response.segments:
    print(f"[Speaker {segment.speaker}] {segment.text}")
```

### Custom Vocabulary

Improve accuracy for specific terms:

```python
response = client.speech_to_text.transcribe(
    model="openai.whisper-1",
    request=audio_data,
    language="en",
    output_format="json",
    custom_vocabulary=["Kubernetes", "API", "Docker", "microservices"]
)
```

### Replacement Rules

Apply custom text transformations:

```python
from speechall import ReplacementRule, ExactRule

# Expand "API" wherever it appears in the transcript
replacement_rules = [
    ReplacementRule(
        rule=ExactRule(find="API", replace="Application Programming Interface")
    )
]

response = client.speech_to_text.transcribe_remote(
    file_url="https://example.com/audio.mp3",
    model="openai.whisper-1",
    language="en",
    output_format="json",
    replacement_ruleset=replacement_rules
)
```

### List Available Models

```python
models = client.speech_to_text.list_speech_to_text_models()

for model in models:
    print(f"{model.model_identifier}: {model.display_name}")
    print(f"  Provider: {model.provider}")
```

## Configuration

### Authentication

Get your API token from speechall.com and set it as an environment variable:

```bash
export SPEECHALL_API_TOKEN="your-token-here"
```

Or pass it directly when initializing the client:

```python
from speechall import SpeechallApi

client = SpeechallApi(token="your-token-here")
```

### Output Formats

- `text`: Plain text transcription
- `json`: JSON with detailed information (segments, timestamps, metadata)
- `json_text`: JSON with simplified text output
- `srt`: SubRip subtitle format
- `vtt`: WebVTT subtitle format
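
The subtitle formats are requested the same way as `json`. A minimal sketch, assuming the response for non-JSON formats exposes the subtitle document as text (the exact return shape may differ):

```python
# Request SubRip subtitles instead of JSON (a sketch; the exact
# response shape for subtitle formats may differ from this).
response = client.speech_to_text.transcribe_remote(
    file_url="https://example.com/audio.mp3",
    model="openai.whisper-1",
    language="en",
    output_format="srt"
)

# Assumption: the subtitle document is available as text on the response.
with open("audio.srt", "w") as f:
    f.write(response.text)
```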

### Language Codes

Use ISO 639-1 language codes (e.g., `en`, `es`, `fr`, `de`) or `auto` for automatic detection.

## API Reference

### Client Classes

- `SpeechallApi`: Synchronous client for the Speechall API
- `AsyncSpeechallApi`: Asynchronous client for the Speechall API (see the sketch below)
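
The async client is not demonstrated in the examples above. A minimal sketch, assuming `AsyncSpeechallApi` takes the same `token` argument, mirrors the synchronous method names, and returns awaitables:

```python
import asyncio
import os

from speechall import AsyncSpeechallApi


async def main() -> None:
    # Assumption: the async client exposes the same speech_to_text
    # methods as SpeechallApi, awaited instead of called directly.
    client = AsyncSpeechallApi(token=os.getenv("SPEECHALL_API_TOKEN"))

    response = await client.speech_to_text.transcribe_remote(
        file_url="https://example.com/audio.mp3",
        model="openai.whisper-1",
        language="auto",
        output_format="json"
    )
    print(response.text)


asyncio.run(main())
```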

### Main Methods

#### `speech_to_text.transcribe()`

Transcribe a local audio file.

Parameters:

- `model` (str): Model identifier (e.g., `"openai.whisper-1"`)
- `request` (bytes): Audio file content
- `language` (str): Language code or `"auto"`
- `output_format` (str): Output format (`text`, `json`, `json_text`, `srt`, `vtt`)
- `punctuation` (bool): Enable automatic punctuation
- `diarization` (bool): Enable speaker identification
- `speakers_expected` (int, optional): Expected number of speakers
- `custom_vocabulary` (list, optional): List of custom terms
- `initial_prompt` (str, optional): Context prompt for the model (see the sketch below)
- `temperature` (float, optional): Model temperature (0.0-1.0)
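
`initial_prompt` and `temperature` are not shown in the examples above. A minimal sketch combining them with the parameters already demonstrated:

```python
# Bias the model toward domain terminology and lower the decoding
# temperature for more deterministic output (parameter names as
# listed above; `audio_data` as read in the Quick Start example).
response = client.speech_to_text.transcribe(
    model="openai.whisper-1",
    request=audio_data,
    language="en",
    output_format="json",
    punctuation=True,
    initial_prompt="A conversation about Kubernetes and Docker.",
    temperature=0.2
)

print(response.text)
```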

#### `speech_to_text.transcribe_remote()`

Transcribe audio from a URL.

Parameters: Same as `transcribe()`, but with `file_url` (a URL string) in place of `request`.

#### `speech_to_text.list_speech_to_text_models()`

List all available models.

## Examples

Check out the `examples` directory for more detailed usage examples.

## Requirements

- Python 3.8+
- `httpx` >= 0.27.0
- `pydantic` >= 2.0.0
- `typing-extensions` >= 4.0.0

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy .
```

## Support

## License

MIT License - see the LICENSE file for details.
