6 changes: 0 additions & 6 deletions docs/speech-to-text/batch/input.mdx
Original file line number Diff line number Diff line change
@@ -10,12 +10,6 @@ import batchSchema from "!openapi-schema-loader!@site/spec/batch.yaml";

# Input

:::info
This page documents audio inputs for transcription by **REST API** (a.k.a. Batch SaaS).
* For Realtime transcription, see the [Realtime Transcription input](/speech-to-text/realtime/input).
* For Flow Voice AI, see the [Flow Voice AI supported formats and limits](/voice-agents/flow/supported-formats-and-limits).
:::

## Supported file types

The following file formats are supported for transcription by the REST API:
2 changes: 1 addition & 1 deletion docs/speech-to-text/batch/limits.mdx
@@ -5,7 +5,7 @@ description: 'Learn about rate limiting and usage limits for the Speechmatics Ba

import HTTPMethodBadge from '@theme/HTTPMethodBadge'

# Limits – Batch transcription
# Limits – Batch

## Rate limiting and fair usage

18 changes: 8 additions & 10 deletions docs/speech-to-text/realtime/input.mdx
@@ -11,19 +11,12 @@ import realtimeSchema from "!asyncapi-schema-loader!@site/spec/realtime.yaml"

# Input

:::info
This page is about the **Realtime transcription API** (websocket).
* For information on Batch SaaS, see the [Batch SaaS input](/speech-to-text/batch/input).
* For information on Flow Voice AI, see the [Flow Voice AI input](/voice-agents/flow/supported-formats-and-limits).
:::

## Supported input audio formats

Sessions can be configured to use two types of audio input, `file` and `raw`. Unless you have a specific reason to use the `file` option, we recommend using the `raw` option.

Sessions can be configured to use two types of audio input: `file` and `raw`.
We recommend using the `raw` option, unless you have a specific reason to use the `file` option.

:::tip
For capturing raw audio in the browser, try our `browser-audio-input` package, [available here on NPM](https://www.npmjs.com/package/@speechmatics/browser-audio-input).
For capturing raw audio in the browser, try our [`browser-audio-input` package](https://www.npmjs.com/package/@speechmatics/browser-audio-input).
:::

### `audio_format`
@@ -36,3 +29,8 @@ The format must be supplied in the `audio_format` field of the `StartRecognition

After receiving a `RecognitionStarted` message, you can start sending audio over the Websocket connection. Audio is sent as binary data, encoded in the format specified in the `StartRecognition` message. See [Protocol overview](/api-ref/realtime-transcription-websocket#protocol-overview) for complete details of the API protocol.
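As a minimal sketch of the message flow above: the helper below builds a `StartRecognition` payload for a raw PCM session and splits audio into binary frames. The field names follow the schema described above, but the specific encoding, sample rate, and chunk size shown here are illustrative assumptions, not required values.

```python
import json

def start_recognition_message(sample_rate=16000):
    """Build the JSON StartRecognition message for a raw PCM session.

    The encoding and sample rate here are example values; choose the
    ones that match your actual audio source.
    """
    return json.dumps({
        "message": "StartRecognition",
        "audio_format": {
            "type": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": sample_rate,
        },
        "transcription_config": {"language": "en"},
    })

def audio_chunks(pcm_bytes, chunk_size=4096):
    """Yield binary frames to send over the websocket.

    Audio is sent as binary data after RecognitionStarted is received;
    the 4096-byte chunk size is an arbitrary example.
    """
    for i in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[i : i + chunk_size]
```

The JSON message is sent as a text frame, while each chunk is sent as a separate binary frame on the same connection.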

## Next steps

View our guides:
- [Using a microphone](/speech-to-text/realtime/guides/python-using-microphone) to learn how to capture audio from a microphone.
- [Using FFmpeg](/speech-to-text/realtime/guides/python-using-ffmpeg) to find out how to pipe microphone audio to the API.
2 changes: 1 addition & 1 deletion docs/speech-to-text/realtime/limits.mdx
@@ -5,7 +5,7 @@ description: 'Learn about the limits for the Speechmatics Realtime API'

import HTTPMethodBadge from '@theme/HTTPMethodBadge'

# Limits – Realtime transcription
# Limits – Realtime

Speechmatics limits the number of hours of audio users can process each month to help manage load on our servers. The current limits (in hours) by account type are listed in the table below:

4 changes: 2 additions & 2 deletions docs/speech-to-text/realtime/turn-detection.mdx
@@ -25,11 +25,11 @@ Use the end of utterance feature to help with turn detection in real-time conver

## Use cases

**Voice AI & conversational systems**: Enable voice assistants and chatbots to detect when the user has finished speaking, allowing the system to respond promptly without awkward delays.
**Voice AI and conversational systems**: Enable voice assistants and chatbots to detect when the user has finished speaking, allowing the system to respond promptly without awkward delays.

**Realtime translation**: Critical for live interpretation services where translations need to be delivered as soon as the speaker completes their thought, maintaining the flow of conversation.

**Dictation & transcription**: Helps dictation software determine when users have completed their input, improving speed of final transcription and user experience.
**Dictation and transcription**: Helps dictation software determine when users have completed their input, improving speed of final transcription and user experience.
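The use cases above share one pattern: the client waits for an end-of-utterance signal and then hands control to the next stage. As a hedged sketch (the `"EndOfUtterance"` message name and the callback wiring are assumptions for illustration, not a definitive client), a voice agent might dispatch on that event like this:

```python
import json

def handle_message(raw, on_turn_end):
    """Dispatch one incoming websocket text message.

    Calls on_turn_end() when an end-of-utterance style message arrives,
    so the agent can respond as soon as the user stops speaking.
    """
    msg = json.loads(raw)
    if msg.get("message") == "EndOfUtterance":
        # The speaker has finished their turn: let the agent respond now.
        on_turn_end()
    return msg.get("message")

# Usage: a chatbot queues a reply as soon as the user's turn ends.
replies = []
kind = handle_message('{"message": "EndOfUtterance"}',
                      lambda: replies.append("agent responds"))
```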


## End of utterance