Generate speech with timestamps
Synthesizes speech and returns a JSON envelope containing base64-encoded audio and word-level timestamps. Pass either `voiceId` (to use a saved Voice) or `model` + `voice` (inline), not both.
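A minimal TypeScript sketch of consuming that envelope: `decodeAudio` turns the base64 `audio` string into bytes, and `wordAt` finds the word being spoken at a given playback time by binary search. Both helper names are hypothetical. The sketch assumes `start` and `end` use the same unit (this page does not say whether that is seconds or milliseconds) and that entries are sorted by `start` without overlapping. It relies on the standard `atob` function, available in browsers, Deno, and Node.js 16 or later.

```typescript
// One entry of the `timestamps` array in the success response.
interface WordTimestamp {
  text: string;
  start: number; // same unit as `end`; the unit is not stated on this page
  end: number;
}

// Decodes the base64 `audio` field into raw bytes (e.g. to write to a file).
function decodeAudio(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Returns the word whose [start, end) interval contains `t`, or undefined if
// `t` falls in a pause. Assumes entries are sorted by `start` and do not overlap.
function wordAt(timestamps: WordTimestamp[], t: number): WordTimestamp | undefined {
  let lo = 0;
  let hi = timestamps.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = timestamps[mid];
    if (t < w.start) hi = mid - 1;
    else if (t >= w.end) lo = mid + 1;
    else return w;
  }
  return undefined;
}
```

Binary search keeps per-frame lookups cheap when highlighting words during playback; a linear scan is also fine for short clips.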
Authorization
`bearerAuth`: an API key sent in the `Authorization` header as `Bearer <API key>`.
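As a sketch, assuming a runtime with `fetch`, the helper below assembles the same request as the curl example on this page: the API key as a bearer token in the `Authorization` header and a JSON body. `buildSpeechRequest` is a hypothetical name, and `https://example.com` is the placeholder host from that example, not a real endpoint.

```typescript
// Hypothetical helper: returns a URL and fetch() options for the timestamps endpoint.
// baseUrl defaults to the placeholder host used on this page; replace it with your API host.
function buildSpeechRequest(
  apiKey: string,
  body: Record<string, unknown>,
  baseUrl = "https://example.com",
) {
  return {
    url: `${baseUrl}/v1/audio/speech/with-timestamps`,
    init: {
      method: "POST",
      headers: {
        // bearerAuth: the API key travels as a bearer token in this header.
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (not executed here):
// const { url, init } = buildSpeechRequest(apiKey, {
//   voiceId: "550e8400-e29b-41d4-a716-446655440000",
//   text: "Hello from a saved voice.",
// });
// const res = await fetch(url, init);
```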
Request Body
application/json
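A sketch of the request body as TypeScript types, modeling the rule that a request uses either a saved voice or an inline `model` + `voice` pair. Only fields named on this page are included: `text` comes from the curl example and is assumed to be required in both forms, and `model` and `voice` are assumed to be strings. Any other body fields are not shown. The type names and the `checkVoiceSource` helper are hypothetical; the helper covers untyped input, such as JSON loaded from a config file, where the compiler cannot enforce the rule.

```typescript
// Saved-voice form: reference a stored Voice by ID.
interface SavedVoiceBody {
  voiceId: string;
  text: string; // assumed required
  model?: never; // `never` makes mixing the two forms a compile error
  voice?: never;
}

// Inline form: model and voice are assumed to be strings.
interface InlineVoiceBody {
  model: string;
  voice: string;
  text: string; // assumed required
  voiceId?: never;
}

type SpeechWithTimestampsBody = SavedVoiceBody | InlineVoiceBody;

// Runtime check: returns an error message, or null if exactly one voice source is given.
function checkVoiceSource(body: Record<string, unknown>): string | null {
  const hasSaved = body.voiceId !== undefined;
  const hasModel = body.model !== undefined;
  const hasVoice = body.voice !== undefined;
  if (hasSaved && (hasModel || hasVoice)) {
    return "Pass either voiceId or model + voice, not both.";
  }
  if (!hasSaved && !(hasModel && hasVoice)) {
    return "Pass voiceId, or both model and voice.";
  }
  return null;
}

// Compiles: a saved-voice body matching the curl example.
const example: SpeechWithTimestampsBody = {
  voiceId: "550e8400-e29b-41d4-a716-446655440000",
  text: "Hello from a saved voice.",
};
```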
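Response Body
The content type depends on the outcome:
- `application/json`: success envelope with `audio`, `mediaType`, `warnings`, and `timestamps`.
- `application/problem+json`: error details (used by two of the error responses).
- `application/json`: content-moderation error whose `error.code` is `content_moderation_blocked`.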
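Because the success envelope and the content-moderation error share the `application/json` content type, a client has to check the body shape as well as the header. The sketch below transcribes the three shapes from the example responses on this page and dispatches on them. `classifyResponse` and `describeProblem` are hypothetical names, and marking the less common problem fields optional is an assumption, since this page does not say which fields are always present.

```typescript
// Shapes transcribed from the example responses; optional markers are assumptions.
interface Problem {
  type: string;
  title: string;
  status: number;
  detail: string;
  code: string;
  validation?: { path: string[]; message: string }[];
  provider?: string;
  upstream_code?: string;
  upstream_status?: number;
  turn_index?: number;
}

interface ModerationError {
  error: { code: string; message: string; reason: { type: string } };
}

interface SpeechEnvelope {
  audio: string; // base64-encoded audio
  mediaType: string;
  warnings: string[];
  timestamps: { text: string; start: number; end: number }[];
}

type Outcome =
  | { kind: "ok"; data: SpeechEnvelope }
  | { kind: "problem"; problem: Problem }
  | { kind: "moderation"; error: ModerationError["error"] };

// Dispatch on Content-Type first, then on body shape: only the moderation
// error has a top-level `error` object.
function classifyResponse(contentType: string, body: unknown): Outcome {
  if (contentType.includes("application/problem+json")) {
    return { kind: "problem", problem: body as Problem };
  }
  if (typeof body === "object" && body !== null && "error" in body) {
    return { kind: "moderation", error: (body as ModerationError).error };
  }
  return { kind: "ok", data: body as SpeechEnvelope };
}

// Flattens a problem document into readable lines, one per validation issue.
function describeProblem(p: Problem): string[] {
  const lines = [`${p.title} (${p.status}): ${p.detail}`];
  for (const v of p.validation ?? []) {
    lines.push(`  ${v.path.join(".")}: ${v.message}`);
  }
  return lines;
}

// Usage (not executed here):
// const outcome = classifyResponse(res.headers.get("Content-Type") ?? "", await res.json());
```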
Example request:
```bash
curl -X POST "https://example.com/v1/audio/speech/with-timestamps" \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voiceId": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Hello from a saved voice."
  }'
```
Success response (`application/json`):
```json
{
  "audio": "string",
  "mediaType": "string",
  "warnings": [
    "string"
  ],
  "timestamps": [
    {
      "text": "string",
      "start": 0,
      "end": 0
    }
  ]
}
```
Error response (`application/problem+json`; both problem responses use this schema):
```json
{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}
```
Content-moderation error (`application/json`):
```json
{
  "error": {
    "code": "content_moderation_blocked",
    "message": "string",
    "reason": {
      "type": "error_fail_closed"
    }
  }
}
```
Stream speech POST
Streams audio from the provider with low latency (provider pass-through). Pass either `voiceId` (to use a saved Voice) or `model` + `voice` (inline), not both. Whole-clip parameters (`volumeDbfs` and `output` format conversion) are not accepted here; use `POST /v1/audio/speech` for those.
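A sketch of consuming the stream incrementally with the standard `ReadableStream` reader API, so playback or file writes can start before the clip finishes. `readAudioStream` is a hypothetical name; this page does not give the streaming endpoint's path, so the usage comment leaves it as a placeholder variable.

```typescript
// Reads a streamed audio body chunk by chunk, passing each chunk to `onChunk`
// as it arrives, and resolves with the total number of bytes received.
async function readAudioStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
): Promise<number> {
  const reader = body.getReader();
  let total = 0;
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(value);
      total += value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
  return total;
}

// Usage (not executed here; streamUrl is a placeholder, the path is not given on this page):
// const res = await fetch(streamUrl, init);
// if (res.body) await readAudioStream(res.body, (chunk) => player.append(chunk));
```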
Generate conversation POST
Synthesizes a multi-turn conversation into a single mixed audio file and returns the raw audio bytes. Set `split: true` to instead receive a JSON envelope with an `audioSegments` array, one entry per turn.
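Because the conversation endpoint returns raw bytes by default and a JSON envelope with `split: true`, a client can branch on the response's `Content-Type`. This is a sketch under that assumption; `readConversation` is a hypothetical name, and since the fields of each `audioSegments` entry are not described here, they are typed as `unknown`.

```typescript
// Mixed mode: one audio file. Split mode: one `audioSegments` entry per turn.
type ConversationResult =
  | { kind: "mixed"; mediaType: string; bytes: Uint8Array }
  | { kind: "split"; audioSegments: unknown[] };

// Assumes the two modes can be told apart by Content-Type (JSON vs. an audio type).
async function readConversation(res: Response): Promise<ConversationResult> {
  // Error bodies are also JSON-like, so reject non-2xx responses first.
  if (!res.ok) {
    throw new Error(`Conversation request failed with status ${res.status}`);
  }
  const contentType = res.headers.get("Content-Type") ?? "";
  if (contentType.includes("application/json")) {
    const envelope = (await res.json()) as { audioSegments: unknown[] };
    return { kind: "split", audioSegments: envelope.audioSegments };
  }
  return {
    kind: "mixed",
    mediaType: contentType,
    bytes: new Uint8Array(await res.arrayBuffer()),
  };
}
```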

