Generate conversation
Synthesizes a multi-turn conversation into a single mixed audio file and returns the raw audio bytes. Set `split: true` to instead receive a JSON envelope with an `audioSegments` array, one entry per turn.
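The two response modes described above can be handled with one small wrapper. The following is a minimal sketch, not an official client: `generateConversation`, `FetchLike`, `ResponseLike`, and `ConversationResult` are hypothetical names, the URL is the placeholder from the sample request on this page, and the segment fields are copied from the sample response. The fetch function is injected so the sketch does not depend on any particular runtime's typings.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ResponseLike>;

// Segment fields mirror the sample response on this page.
interface AudioSegment {
  audio: string;
  mediaType: string;
  durationMs: number;
  turnIndex: number;
}

// Hypothetical result type: one variant per response mode.
type ConversationResult =
  | { kind: "mixed"; bytes: Uint8Array }
  | { kind: "split"; audioSegments: AudioSegment[]; warnings: string[] };

async function generateConversation(
  fetchImpl: FetchLike,
  apiKey: string,
  body: { split?: boolean; [field: string]: unknown },
): Promise<ConversationResult> {
  const res = await fetchImpl("https://example.com/v1/audio/conversation", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Conversation request failed with status ${res.status}`);
  }
  if (body.split === true) {
    // split: true -> JSON envelope with one audioSegments entry per turn.
    const envelope = (await res.json()) as {
      audioSegments: AudioSegment[];
      warnings?: string[];
    };
    return {
      kind: "split",
      audioSegments: envelope.audioSegments,
      warnings: envelope.warnings ?? [],
    };
  }
  // Default mode -> raw bytes of the single mixed audio file.
  return { kind: "mixed", bytes: new Uint8Array(await res.arrayBuffer()) };
}
```

In a runtime that provides a global `fetch`, it can be passed as `fetchImpl`. Callers can then switch on `kind` instead of inspecting the response themselves.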
Authorization
bearerAuth API key
In: header
Request Body
application/json
TypeScript Definitions
Use the request body type in TypeScript.
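As a rough illustration of what such a type might look like, the sketch below derives one from the fields in the sample request on this page, plus the `split` flag from the description. The interface names and the `toRequestJson` helper are hypothetical, and which fields are optional is an assumption; the published definitions remain the authority.

```typescript
// Field names come from the sample request on this page. Which fields are
// optional is an assumption, not something the page states.
interface ConversationTurn {
  voice: string;
  text: string;
}

interface ConversationRequestBody {
  model: string;
  turns: ConversationTurn[];
  gapMs?: number;
  output?: string;
  split?: boolean;
}

// Hypothetical helper: serialises a typed body for the POST request.
function toRequestJson(body: ConversationRequestBody): string {
  return JSON.stringify(body);
}

// Same payload as the curl sample, now checked by the compiler.
const exampleBody: ConversationRequestBody = {
  model: "openai/gpt-4o-mini-tts",
  turns: [
    { voice: "alloy", text: "How was your weekend?" },
    { voice: "shimmer", text: "Good, I finally caught up on sleep." },
  ],
  gapMs: 500,
  output: "mp3",
};

const exampleJson: string = toRequestJson(exampleBody);
```

Typing the body this way lets the compiler catch a misspelled field name or a turn without `text` before the request is sent.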
Response Body
application/problem+json
application/problem+json
application/json
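Because error bodies can arrive in more than one shape, a client may want to tell them apart before reporting a failure. The sketch below assumes the two error shapes shown in the sample bodies on this page: a problem-details object and a moderation envelope with a nested `error` object. The type names and `describeError` are hypothetical, and treating every problem-details field beyond `title` and `status` as optional is an assumption.

```typescript
// Problem-details shape, with field names copied from the sample bodies.
interface ProblemDetails {
  type?: string;
  title: string;
  status: number;
  detail?: string;
  code?: string;
  validation?: { path: string[]; message: string }[];
  provider?: string;
  upstream_code?: string;
  upstream_status?: number;
  turn_index?: number;
}

// Moderation envelope shape, also copied from the sample bodies.
interface ModerationError {
  error: { code: string; message: string; reason: { type: string } };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isModerationError(body: unknown): body is ModerationError {
  if (!isRecord(body)) return false;
  const error = body["error"];
  return isRecord(error) && typeof error["code"] === "string";
}

function isProblemDetails(body: unknown): body is ProblemDetails {
  return (
    isRecord(body) &&
    typeof body["title"] === "string" &&
    typeof body["status"] === "number"
  );
}

// Hypothetical helper: turns a parsed error body into a short log line.
function describeError(body: unknown): string {
  if (isModerationError(body)) {
    return `moderation: ${body.error.code}`;
  }
  if (isProblemDetails(body)) {
    const turn =
      body.turn_index === undefined ? "" : ` (turn ${body.turn_index})`;
    return `${body.status} ${body.title}${turn}`;
  }
  return "unrecognised error body";
}
```

Checking the body's structure instead of relying only on the `Content-Type` header keeps the helper usable when a proxy rewrites or drops that header.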
curl -X POST "https://example.com/v1/audio/conversation" \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini-tts",
    "turns": [
      { "voice": "alloy", "text": "How was your weekend?" },
      { "voice": "shimmer", "text": "Good, I finally caught up on sleep." }
    ],
    "gapMs": 500,
    "output": "mp3"
  }'

{
  "audioSegments": [
    {
      "audio": "string",
      "mediaType": "string",
      "durationMs": 0,
      "turnIndex": 0
    }
  ],
  "warnings": [
    "string"
  ]
}

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}

{
  "error": {
    "code": "content_moderation_blocked",
    "message": "string",
    "reason": {
      "type": "error_fail_closed"
    }
  }
}

Generate speech with timestamps POST
Synthesizes speech and returns a JSON envelope with base64 audio and word-level timestamps. Pass either `voiceId` (to use a saved Voice) or `model` + `voice` (inline) — not both.
Generate conversation with timestamps POST
Synthesizes a multi-turn conversation and returns a JSON envelope with base64 audio and word-level timestamps mapped back to each originating turn. Set `split: true` to instead receive an `audioSegments` array, one entry per turn, each with block-relative timestamps.

