Generate conversation with timestamps

Synthesizes a multi-turn conversation and returns a JSON envelope with base64 audio and word-level timestamps mapped back to each originating turn. Set `split: true` to instead receive an `audioSegments` array, one entry per turn, each with block-relative timestamps.


Authorization

bearerAuth

Authorization: Bearer <token>

API key

In: header

Request Body

application/json


model? string

Provider/model used by every flat turn that doesn't supply its own model. Set either this or a per-turn model, not both.

Match: ^[a-z0-9-]+\/[a-zA-Z0-9._-]+$
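As a quick illustration of the Match rule above, a client could pre-check a model ID before sending the request. This is a minimal sketch: `isValidModelId` and `modelChecks` are hypothetical names, and the API performs its own validation regardless.

```typescript
// Pattern copied from the `model` field's Match rule: "<provider>/<model>".
const MODEL_PATTERN = /^[a-z0-9-]+\/[a-zA-Z0-9._-]+$/;

// Hypothetical client-side helper (not part of the API).
function isValidModelId(model: string): boolean {
  return MODEL_PATTERN.test(model);
}

const modelChecks = {
  // Lowercase provider, slash, then the model name: accepted.
  ok: isValidModelId("openai/gpt-4o-mini-tts"),
  // The provider part only allows lowercase letters, digits, and hyphens.
  upperProvider: isValidModelId("OpenAI/gpt-4o-mini-tts"),
  // A bare model name without the "provider/" prefix is rejected.
  missingSlash: isValidModelId("gpt-4o-mini-tts"),
};
```

Here `modelChecks.ok` is true and the other two entries are false.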

gapMs? integer

Silence inserted between consecutive turns, in milliseconds. Defaults to 0 (no gap).

Range: 0 <= value

volumeDbfs? number

Target peak loudness in dBFS (negative; e.g. -16). Each turn is normalized to this level before stitching. Defaults to -16 dBFS when omitted.

Default: -16

Range: value <= 0
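To make the dBFS figure concrete, the sketch below shows the standard relationship between a dBFS level and linear amplitude (amplitude = 10^(dBFS / 20)) and the gain a peak normalizer would apply. It assumes plain peak normalization; this section does not describe how the service actually measures or adjusts loudness, and both function names are hypothetical.

```typescript
// Convert a dBFS level to a linear amplitude ratio (0 dBFS = full scale = 1.0).
function dbfsToLinear(dbfs: number): number {
  return Math.pow(10, dbfs / 20);
}

// Gain that scales a signal whose current peak amplitude is `peakAmplitude`
// (0 < peak <= 1) so that its peak lands on `targetDbfs`.
// Hypothetical helper for illustration only; not the service's algorithm.
function peakNormalizationGain(peakAmplitude: number, targetDbfs: number = -16): number {
  if (!(peakAmplitude > 0)) {
    throw new RangeError("peakAmplitude must be greater than 0");
  }
  if (!(targetDbfs <= 0)) {
    throw new RangeError("targetDbfs must be <= 0");
  }
  return dbfsToLinear(targetDbfs) / peakAmplitude;
}
```

For example, a turn that already peaks at full scale (1.0) would need a gain of roughly 0.158 to reach the default target of -16 dBFS.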

providerOptions?

Free-form passthrough applied to every turn's upstream provider call. Per-turn providerOptions override these.

output? string | object

Output container format. Accepts either a string shorthand ("wav" | "mp3" | "pcm") or an object ({ format, bitrate? }); bitrate is only valid with mp3 and defaults to 96 kbps when omitted. Defaults to the provider's native stitched format (typically wav). Note: pcm is headerless raw audio — consumers must know the sample rate out-of-band. Response Content-Type reflects the chosen format.
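The two accepted shapes of `output` can be expressed as a union type. The sketch below normalizes the string shorthand to the object form and mirrors the documented rules (bitrate only with mp3, 96 kbps default). The type and function names are hypothetical, and treating `bitrate` as a number in kbps is an assumption based on the documented default.

```typescript
type OutputFormat = "wav" | "mp3" | "pcm";
type OutputOption = OutputFormat | { format: OutputFormat; bitrate?: number };

interface NormalizedOutput {
  format: OutputFormat;
  bitrate?: number; // assumed to be kbps, matching the documented 96 kbps default
}

// Hypothetical client-side helper: expands the shorthand and applies the
// documented bitrate rules. The server remains the source of truth.
function normalizeOutput(output: OutputOption): NormalizedOutput {
  const obj: { format: OutputFormat; bitrate?: number } =
    typeof output === "string" ? { format: output } : output;

  if (obj.bitrate !== undefined && obj.format !== "mp3") {
    throw new Error("bitrate is only valid with mp3");
  }
  if (obj.format === "mp3") {
    // Fall back to the documented default when no bitrate is given.
    return { format: "mp3", bitrate: obj.bitrate !== undefined ? obj.bitrate : 96 };
  }
  return { format: obj.format };
}
```

With this helper, `normalizeOutput("mp3")` and `normalizeOutput({ format: "mp3" })` produce the same object, while `{ format: "wav", bitrate: 128 }` is rejected before the request is sent.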

timestamps? string

Whether to return word-level timestamps. 'on' (default) returns timestamps; 'off' skips timestamp generation entirely.

Value in: "on" | "off"

pronunciations?

Pronunciation overrides applied to every turn's text before synthesis: saved dictionary IDs and/or inline rules.

moderation_ruleset_id? string

UUID of the org moderation ruleset to evaluate this request against. When omitted, the org's default ruleset applies.

Match: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

speed? number

Default playback rate for every turn that doesn't override it. Range 0.75–1.5. Per-turn speed wins.

Range: 0.75 <= value <= 1.5
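The precedence rule ("per-turn speed wins") can be sketched as a small resolver. This assumes the per-turn `speed` field shares the same 0.75 to 1.5 range as the request-level one, which this section does not state explicitly; `effectiveSpeed` is a hypothetical name.

```typescript
const MIN_SPEED = 0.75;
const MAX_SPEED = 1.5;

// Returns the speed that applies to one turn: the turn's own value if set,
// otherwise the request-level default, otherwise undefined (nothing was
// specified at either level).
function effectiveSpeed(turnSpeed?: number, requestSpeed?: number): number | undefined {
  const speed = turnSpeed !== undefined ? turnSpeed : requestSpeed;
  if (speed === undefined) {
    return undefined;
  }
  // Written as a negated range check so that NaN is rejected as well.
  if (!(speed >= MIN_SPEED && speed <= MAX_SPEED)) {
    throw new RangeError("speed must be between 0.75 and 1.5");
  }
  return speed;
}
```

For instance, a turn with `speed: 1.25` inside a request with `speed: 0.9` resolves to 1.25, and a turn without its own value resolves to 0.9.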

split? boolean

Return each turn as its own audio block instead of one stitched file. When true, the response is a JSON envelope with a blocks array (one entry per turn, in order). Defaults to false.

Default: false

enhance? boolean

Currently a no-op: accepted for backward compatibility but has no effect on the returned audio. Studio-sound post-processing has been removed. Defaults to false.

turns* array

Ordered list of speaker turns to synthesize and stitch into one audio file. Must contain at least one turn.

Items: 1 <= items
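A request body could be assembled and pre-checked along these lines. The `Turn` shape here is only what the curl example on this page shows (`voice` and `text`); real turns may accept more fields. The interface and function names are hypothetical, and only a subset of the request fields is modeled.

```typescript
// Partial turn shape, taken from the fields used in the page's curl example.
interface Turn {
  voice: string;
  text: string;
}

// Subset of the documented request body fields.
interface ConversationRequest {
  model?: string;
  turns: Turn[];
  gapMs?: number;
  output?: "wav" | "mp3" | "pcm";
  timestamps?: "on" | "off";
}

// Hypothetical helper: checks the documented constraints on `turns` and
// `gapMs`, then serializes the body for use in a POST request.
function buildConversationBody(request: ConversationRequest): string {
  if (request.turns.length < 1) {
    throw new Error("turns must contain at least one turn");
  }
  if (
    request.gapMs !== undefined &&
    (request.gapMs < 0 || Math.floor(request.gapMs) !== request.gapMs)
  ) {
    throw new Error("gapMs must be a non-negative integer");
  }
  return JSON.stringify(request);
}

const exampleBody = buildConversationBody({
  model: "openai/gpt-4o-mini-tts",
  turns: [
    { voice: "alloy", text: "How was your weekend?" },
    { voice: "shimmer", text: "Good, I finally caught up on sleep." },
  ],
  gapMs: 500,
  output: "mp3",
  timestamps: "on",
});
```

The resulting `exampleBody` string matches the JSON payload used in the curl example and can be sent with any HTTP client alongside the `Authorization` and `Content-Type` headers.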

Response Body


curl -X POST "https://example.com/v1/audio/conversation/with-timestamps" \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "openai/gpt-4o-mini-tts",
  "turns": [
    {
      "voice": "alloy",
      "text": "How was your weekend?"
    },
    {
      "voice": "shimmer",
      "text": "Good, I finally caught up on sleep."
    }
  ],
  "gapMs": 500,
  "output": "mp3",
  "timestamps": "on"
}'

{
  "audio": "string",
  "mediaType": "string",
  "warnings": [
    "string"
  ],
  "timestamps": [
    {
      "text": "string",
      "start": 0,
      "end": 0,
      "turnIndex": 0
    }
  ]
}
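Because every timestamp entry carries a `turnIndex`, a client can regroup the flat `timestamps` array by originating turn. The sketch below does that with plain objects. The helper names and the sample values are invented for illustration, and the unit of `start` and `end` is not stated in this section, so the code treats them as opaque numbers.

```typescript
interface WordTimestamp {
  text: string;
  start: number; // unit not stated in this section
  end: number;
  turnIndex: number;
}

// Group the flat timestamps array by the turn each word came from.
function groupByTurn(timestamps: WordTimestamp[]): { [turnIndex: number]: WordTimestamp[] } {
  const groups: { [turnIndex: number]: WordTimestamp[] } = {};
  for (const word of timestamps) {
    const bucket = groups[word.turnIndex] || (groups[word.turnIndex] = []);
    bucket.push(word);
  }
  return groups;
}

// Rebuild a turn's text from its words, in the order they were returned.
function turnText(words: WordTimestamp[]): string {
  return words.map((w) => w.text).join(" ");
}

// Made-up sample data shaped like the 200 response's `timestamps` field.
const sampleTimestamps: WordTimestamp[] = [
  { text: "How", start: 0, end: 1, turnIndex: 0 },
  { text: "was", start: 1, end: 2, turnIndex: 0 },
  { text: "Good,", start: 3, end: 4, turnIndex: 1 },
];
const wordsByTurn = groupByTurn(sampleTimestamps);
```

With the sample data, `wordsByTurn[0]` holds the two words of the first turn and `wordsByTurn[1]` holds the single word of the second.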

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}

{
  "error": {
    "code": "content_moderation_blocked",
    "message": "string",
    "reason": {
      "type": "error_fail_closed"
    }
  }
}
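The error examples above come in two shapes: a flat problem object (with `title`, `status`, and optionally `turn_index`) and a nested `error` object for moderation blocks. A client could tell them apart roughly as follows. Which fields are optional is an assumption, and the type and function names are hypothetical.

```typescript
// Subset of the problem-style error body shown above; optionality is assumed.
interface ProblemDetails {
  title: string;
  status: number;
  detail?: string;
  code?: string;
  turn_index?: number;
}

// Shape of the moderation error body shown above.
interface ModerationError {
  error: { code: string; message: string; reason?: { type: string } };
}

function isModerationError(body: unknown): body is ModerationError {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: unknown }).error;
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { code?: unknown }).code === "string"
  );
}

// Produce a short human-readable summary for either error shape.
function describeError(body: unknown): string {
  if (isModerationError(body)) {
    return `moderation: ${body.error.code}`;
  }
  if (
    typeof body === "object" &&
    body !== null &&
    typeof (body as { title?: unknown }).title === "string"
  ) {
    const problem = body as ProblemDetails;
    const where = problem.turn_index !== undefined ? ` (turn ${problem.turn_index})` : "";
    return `${problem.status} ${problem.title}${where}`;
  }
  return "unknown error shape";
}
```

Checking for the nested `error` object first keeps the moderation case separate from the problem-style bodies, which have no `error` key in the examples on this page.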

200 application/json

400 application/problem+json

401 application/problem+json

422 application/json