Generate speech with timestamps

Synthesizes speech and returns a JSON envelope with base64 audio and word-level timestamps. Pass either `voiceId` (to use a saved Voice) or `model` + `voice` (inline) — not both.

Authorization

bearerAuth

Authorization: Bearer <token>

API key

In: header

Request Body

application/json

voiceId* string

Speechbase Voice UUID. The gateway resolves this to the underlying provider/model/voice at request time. Pass this OR (model + voice), never both.

Match: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i
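The either/or rule can be checked on the client before a request is sent. The following is a minimal sketch, not the gateway's own validation: `check_voice_selection` is a hypothetical helper name, and the pattern is copied from the voiceId field above.

```python
import re

# Pattern copied from the voiceId field above (case-insensitive UUID).
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def check_voice_selection(body: dict) -> str:
    """Return "saved" or "inline"; raise ValueError if the body mixes both forms."""
    has_saved = "voiceId" in body
    has_inline = "model" in body or "voice" in body
    if has_saved and has_inline:
        raise ValueError("pass voiceId OR model + voice, never both")
    if has_saved:
        if not UUID_RE.match(body["voiceId"]):
            raise ValueError("voiceId is not a UUID")
        return "saved"
    if "model" in body and "voice" in body:
        return "inline"
    raise ValueError("pass voiceId, or both model and voice")
```

Running a check like this locally surfaces a mixed body as a clear error instead of a rejected request.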

text* string

Length: 1 <= length

providerOptions?

pronunciations?

moderation_ruleset_id? string

Match: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

volumeDbfs? number

Target peak level in dBFS (negative = quieter). Defaults to -16 dBFS on the buffered POST /v1/audio/speech and /v1/audio/speech/with-timestamps when omitted; not accepted on POST /v1/audio/speech/stream.

Default: -16

Range: value <= 0
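For intuition about what a dBFS target means, the standard amplitude relation is ratio = 10^(dBFS / 20), where 1.0 is full scale. The sketch below applies that general formula; `dbfs_to_linear` is a hypothetical helper, and nothing here describes how the gateway normalizes audio internally.

```python
def dbfs_to_linear(dbfs: float) -> float:
    """Convert a peak level in dBFS to a linear amplitude ratio (1.0 = full scale)."""
    if dbfs > 0:
        # Mirrors the documented range: value <= 0.
        raise ValueError("volumeDbfs must be <= 0")
    return 10 ** (dbfs / 20)


# The documented default of -16 dBFS is roughly 16% of full-scale amplitude.
print(round(dbfs_to_linear(-16), 3))  # 0.158
```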

output? string | object

Output container format. Accepts a string shorthand ("wav" | "mp3" | "pcm") or an object ({ format, bitrate? }); bitrate is only valid with mp3 and defaults to 96 kbps when omitted. Format conversion is a whole-clip operation, so it applies on the buffered POST /v1/audio/speech (Response Content-Type reflects the chosen format) and is not accepted on POST /v1/audio/speech/stream. When omitted the provider's native format is returned.
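The two accepted shapes of `output` can be normalized on the client. This is a sketch under stated assumptions: `normalize_output` is a hypothetical helper, the bitrate value in the usage line is illustrative (the field's unit is not stated here), and no default bitrate is filled in because the server applies its own when the field is omitted.

```python
def normalize_output(output):
    """Expand the string shorthand into the object form and check bitrate usage."""
    if isinstance(output, str):
        output = {"format": output}
    fmt = output.get("format")
    if fmt not in ("wav", "mp3", "pcm"):
        raise ValueError(f"unsupported format: {fmt!r}")
    if "bitrate" in output and fmt != "mp3":
        raise ValueError("bitrate is only valid with mp3")
    return dict(output)


# Shorthand and object form end up in the same shape.
print(normalize_output("wav"))
print(normalize_output({"format": "mp3", "bitrate": 128}))
```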

speed? number

Playback rate multiplier (1.0 = normal). Range 0.75 to 1.5. Applied on the buffered POST /v1/audio/speech and POST /v1/audio/speech/with-timestamps; not accepted on POST /v1/audio/speech/stream.

Range: 0.75 <= value <= 1.5
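Because `speed` is a rate multiplier, a clip that lasts d seconds at 1.0 should last about d / speed seconds after scaling. The sketch below only illustrates that arithmetic and the documented range check; `scaled_duration` is a hypothetical helper, and the exact output length may differ slightly in practice.

```python
def scaled_duration(duration_s: float, speed: float = 1.0) -> float:
    """Estimate clip length after applying a playback rate multiplier."""
    if not 0.75 <= speed <= 1.5:
        raise ValueError("speed must be between 0.75 and 1.5")
    return duration_s / speed


print(scaled_duration(6.0, 1.5))   # 4.0
print(scaled_duration(6.0, 0.75))  # 8.0
```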

enhance? boolean

Currently a no-op: accepted for backward compatibility but has no effect on the returned audio. Studio-sound post-processing has been removed. Defaults to false.

timestamps? string

Whether to return word-level timestamps. 'on' returns timestamps (default), using native provider timing when available and gateway timestamp fallback otherwise. 'off' skips timestamp generation entirely.

Value in: "on" | "off"

model* string

Provider and model in "<provider>/<model>" form. Required for inline calls. Pass this with voice, or pass voiceId alone; never both.

Match: /^[a-z0-9-]+\/[a-zA-Z0-9._-]+$/
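The model pattern implies a lowercase provider segment, one slash, and a model name. A minimal sketch of validating and splitting such a string follows; `split_model` is a hypothetical helper and `acme/voice-model.v1` is a made-up value, not a real provider or model.

```python
import re

# Pattern copied from the model field above.
MODEL_RE = re.compile(r"^[a-z0-9-]+/[a-zA-Z0-9._-]+$")


def split_model(model: str) -> tuple:
    """Validate a model string and return (provider, model_name)."""
    if not MODEL_RE.match(model):
        raise ValueError(f"model does not match the documented pattern: {model!r}")
    provider, name = model.split("/", 1)
    return provider, name


print(split_model("acme/voice-model.v1"))  # ('acme', 'voice-model.v1')
```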

voice* string

Provider-native voice identifier (e.g. an ElevenLabs voice ID). Pass with model.

Length: 1 <= length

text* string

Length: 1 <= length

providerOptions?

pronunciations?

moderation_ruleset_id? string

Match: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

volumeDbfs? number

Default: -16

Range: value <= 0

output? string | object

speed? number

Playback rate multiplier (1.0 = normal). Range 0.75 to 1.5. Applied on the buffered POST /v1/audio/speech and POST /v1/audio/speech/with-timestamps; not accepted on POST /v1/audio/speech/stream.

Range: 0.75 <= value <= 1.5

enhance? boolean

Currently a no-op: accepted for backward compatibility but has no effect on the returned audio. Studio-sound post-processing has been removed. Defaults to false.

timestamps? string

Value in: "on" | "off"

Response Body

`application/json`

`application/problem+json`

`application/json`

curl -X POST "https://example.com/v1/audio/speech/with-timestamps" \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voiceId": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Hello from a saved voice."
  }'
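The same request can be assembled with Python's standard library. This is a sketch that mirrors the curl call: the host `example.com` is the placeholder from that example, and `build_request` is a hypothetical helper. The request is only constructed here; sending it is left as a comment so the snippet runs without network access.

```python
import json
import os
import urllib.request

# Placeholder host copied from the curl example; substitute the real gateway URL.
URL = "https://example.com/v1/audio/speech/with-timestamps"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request with bearer auth and a JSON body."""
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    {"voiceId": "550e8400-e29b-41d4-a716-446655440000", "text": "Hello from a saved voice."},
    os.environ.get("SPEECHBASE_API_KEY", ""),
)
# To send it:
#   with urllib.request.urlopen(req) as resp:
#       envelope = json.load(resp)
```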

{
  "audio": "string",
  "mediaType": "string",
  "warnings": [
    "string"
  ],
  "timestamps": [
    {
      "text": "string",
      "start": 0,
      "end": 0
    }
  ]
}
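A success envelope shaped like the one above can be unpacked as follows. This is a sketch: `decode_envelope` is a hypothetical helper, the sample values are invented for illustration, and the unit of `start` and `end` is not stated here, so the values are passed through unchanged. The `timestamps` key is treated as optional on the assumption that it may be missing or empty when timestamps are off.

```python
import base64


def decode_envelope(envelope: dict):
    """Return (audio_bytes, media_type, [(text, start, end), ...])."""
    audio = base64.b64decode(envelope["audio"])
    words = [
        (t["text"], t["start"], t["end"])
        for t in envelope.get("timestamps") or []
    ]
    return audio, envelope["mediaType"], words


# Invented sample data in the documented shape.
sample = {
    "audio": base64.b64encode(b"\x00\x01\x02").decode("ascii"),
    "mediaType": "audio/wav",
    "warnings": [],
    "timestamps": [{"text": "Hello", "start": 0, "end": 1}],
}
audio, media_type, words = decode_envelope(sample)
print(len(audio), media_type, words)
```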

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}
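A problem body shaped like the one above can be flattened into a readable message. The sketch below uses only field names that appear in that example; `describe_problem` is a hypothetical helper and the sample values are invented.

```python
def describe_problem(problem: dict) -> str:
    """Summarize a problem body, listing one line per validation issue."""
    lines = [f'{problem.get("status")} {problem.get("title")}: {problem.get("detail")}']
    for issue in problem.get("validation") or []:
        # Each path is a list of segments; join them into a dotted field name.
        path = ".".join(str(p) for p in issue.get("path", []))
        lines.append(f"  {path}: {issue.get('message')}")
    return "\n".join(lines)


# Invented sample data in the documented shape.
print(describe_problem({
    "status": 400,
    "title": "Bad Request",
    "detail": "Invalid body",
    "validation": [{"path": ["output", "bitrate"], "message": "only valid with mp3"}],
}))
```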

{
  "type": "string",
  "title": "string",
  "status": 0,
  "detail": "string",
  "code": "string",
  "validation": [
    {
      "path": [
        "string"
      ],
      "message": "string"
    }
  ],
  "provider": "string",
  "upstream_code": "string",
  "upstream_status": 0,
  "turn_index": 0
}

{
  "error": {
    "code": "content_moderation_blocked",
    "message": "string",
    "reason": {
      "type": "error_fail_closed"
    }
  }
}
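The moderation error uses a different envelope from the problem bodies, so a client may want to detect it separately. A minimal sketch, with `moderation_block_reason` as a hypothetical helper:

```python
def moderation_block_reason(body: dict):
    """Return the reason type if the body is a moderation block, else None."""
    error = body.get("error") or {}
    if error.get("code") != "content_moderation_blocked":
        return None
    return (error.get("reason") or {}).get("type")


blocked = {
    "error": {
        "code": "content_moderation_blocked",
        "message": "string",
        "reason": {"type": "error_fail_closed"},
    }
}
print(moderation_block_reason(blocked))  # error_fail_closed
```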

Response status codes

200 application/json

400 application/problem+json

401 application/problem+json

422 application/json