Stream speech
Streams audio from the provider for low latency (provider pass-through). Pass either `voiceId` (to use a saved Voice) or `model` + `voice` (inline), not both. Whole-clip params (`volumeDbfs`, `output` format conversion) are not accepted here. Use POST /v1/audio/speech for those.
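The either/or rule above can be checked client-side before a request is sent. The following is a minimal TypeScript sketch, not the service's published type definitions: the type names `SavedVoiceRequest`, `InlineVoiceRequest`, and `StreamSpeechRequest` and the helper `checkStreamBody` are hypothetical, and the `text` field is taken from the curl example on this page.

```typescript
// Hypothetical request shapes inferred from the description above; these are
// assumptions, not the service's published type definitions.
type SavedVoiceRequest = { voiceId: string; text: string };
type InlineVoiceRequest = { model: string; voice: string; text: string };
type StreamSpeechRequest = SavedVoiceRequest | InlineVoiceRequest;

// Whole-clip params that the description says this endpoint does not accept.
const WHOLE_CLIP_PARAMS = ["volumeDbfs", "output"];

// Returns a list of human-readable problems; an empty list means the body
// satisfies the rules stated in the description.
function checkStreamBody(body: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const hasSaved = body["voiceId"] !== undefined;
  const hasModel = body["model"] !== undefined;
  const hasVoice = body["voice"] !== undefined;
  const hasInline = hasModel || hasVoice;

  if (hasSaved && hasInline) {
    problems.push("pass either voiceId or model + voice, not both");
  } else if (!hasSaved && !hasInline) {
    problems.push("pass voiceId, or model + voice");
  } else if (hasInline && !(hasModel && hasVoice)) {
    problems.push("an inline voice needs both model and voice");
  }

  for (const param of WHOLE_CLIP_PARAMS) {
    if (body[param] !== undefined) {
      problems.push(param + " is a whole-clip param; use POST /v1/audio/speech");
    }
  }
  return problems;
}

// Example body matching the saved-voice form.
const savedVoiceBody: StreamSpeechRequest = {
  voiceId: "550e8400-e29b-41d4-a716-446655440000",
  text: "Hello from a saved voice.",
};
const savedVoiceProblems = checkStreamBody(savedVoiceBody);
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once, which is closer in spirit to the `validation` array in the error responses.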
Authorization
bearerAuth API key
In: header
Request Body
application/json
Response Body
application/problem+json
application/problem+json
application/json
curl -X POST "https://example.com/v1/audio/speech/stream" \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voiceId": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Hello from a saved voice."
  }'

"string"

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}

{
  "error": {
    "code": "content_moderation_blocked",
    "message": "string",
    "reason": {
      "type": "error_fail_closed"
    }
  }
}

Generate speech POST
Synthesizes the whole clip and returns raw audio bytes in a single response. Pass either `voiceId` (to use a saved Voice) or `model` + `voice` (inline), not both. Because the full clip is produced server-side, whole-clip params (`volumeDbfs`, `output` format conversion) are applied here. For low-latency provider streaming, use POST /v1/audio/speech/stream.
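The split between the two endpoints can be read as a routing rule: a request that needs a whole-clip param goes to the full-clip endpoint, and anything else may stream. A minimal sketch of that rule follows. The helper name `pickSpeechPath` is hypothetical, and preferring the streaming path whenever no whole-clip param is present is an assumption about what the caller wants, not something the API enforces.

```typescript
// Paths as given in the endpoint descriptions.
const FULL_CLIP_PATH = "/v1/audio/speech";
const STREAM_PATH = "/v1/audio/speech/stream";

// Hypothetical helper: picks a path based on whether the body carries
// whole-clip params (volumeDbfs, output), which only the full-clip
// endpoint applies.
function pickSpeechPath(body: Record<string, unknown>): string {
  const needsWholeClip =
    body["volumeDbfs"] !== undefined || body["output"] !== undefined;
  return needsWholeClip ? FULL_CLIP_PATH : STREAM_PATH;
}

// A body with a volume adjustment must use the full-clip endpoint.
const pathForVolumeRequest = pickSpeechPath({
  voiceId: "550e8400-e29b-41d4-a716-446655440000",
  text: "Hello from a saved voice.",
  volumeDbfs: -3,
});
```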
Generate speech with timestamps POST
Synthesizes speech and returns a JSON envelope with base64 audio and word-level timestamps. Pass either `voiceId` (to use a saved Voice) or `model` + `voice` (inline), not both.
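The envelope's field names are not given here, so the sketch below assumes a hypothetical shape (`audio` as a base64 string, `timestamps` as a list of `{ word, start, end }` entries in seconds) purely to illustrate decoding the audio and finding the word spoken at a given time. Check the actual response schema before relying on any of these names or units.

```typescript
// Hypothetical envelope shape; the real field names and units may differ.
type WordTimestamp = { word: string; start: number; end: number };
type SpeechEnvelope = { audio: string; timestamps: WordTimestamp[] };

// Decodes a base64 string into raw bytes using the standard atob global
// (available in browsers and recent Node.js versions).
function decodeBase64Audio(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Returns the word whose [start, end) interval contains time t,
// or undefined when no word covers that time.
function wordAt(envelope: SpeechEnvelope, t: number): string | undefined {
  for (const entry of envelope.timestamps) {
    if (t >= entry.start && t < entry.end) {
      return entry.word;
    }
  }
  return undefined;
}

// Made-up sample data for illustration; "AQID" is three arbitrary bytes,
// not real audio.
const sampleEnvelope: SpeechEnvelope = {
  audio: "AQID",
  timestamps: [
    { word: "Hello", start: 0, end: 0.4 },
    { word: "world", start: 0.4, end: 0.9 },
  ],
};
const sampleBytes = decodeBase64Audio(sampleEnvelope.audio);
const sampleWord = wordAt(sampleEnvelope, 0.5);
```

Half-open intervals are used so that a time falling exactly on a word boundary maps to one word rather than two.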

