Speechbase: Universal Text-to-Speech Gateway & Voice Management

Cartesia Sonic text-to-speech through the Speechbase gateway, with audio tags, voice cloning, and native timestamps on sonic-3.


Prefix	`cartesia`
Default model	`sonic-3`
Provider key	Connect under Provider Keys

Route requests to Cartesia by prefixing the model name with cartesia/ (for example, cartesia/sonic-3). sonic-3 is the flagship model: it is fast and multilingual, and it is the only model in the table below that supports audio tags and voice cloning.

Models

Model	Streaming	Audio tags	Voice cloning	Timestamps	Languages
`sonic-3`	Yes	Yes	Yes	Native	42
`sonic-2`	Yes	—	—	Native	1

Usage

curl -X POST https://api.speechbase.ai/v1/audio/speech \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "inline",
    "model": "cartesia/sonic-3",
    "voice": "a0e99841-438c-4a64-b679-ae501e7d6091",
    "text": "Hello from Speechbase!",
    "output": "mp3"
  }' --output hello.mp3

The voice field takes a Cartesia voice UUID.

Voice cloning

sonic-3 supports voice cloning. Clone a reference into a saved Voice, then address it by voiceId with mode: "voice". Inline requests accept only a plain voice string, so they cannot reference a saved Voice.
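
As a rough illustration of the saved-voice flow described above, the sketch below builds a request body with mode set to "voice" and a voiceId field, then posts it to the same endpoint as the inline example. Only mode: "voice" and voiceId come from this page. The helper names, the placeholder voice ID, and the assumption that model, text, and output are passed the same way as in inline mode are all hypothetical, so check the API reference for the exact request shape.

```shell
# Sketch only: the request shape for saved voices is inferred from the text above.

# Build the JSON body for a saved-voice request (hypothetical helper).
# $1 = saved voice ID (placeholder), $2 = text to speak.
# No JSON escaping is done, so keep quotes and backslashes out of the arguments.
build_saved_voice_payload() {
  printf '{"mode": "voice", "model": "cartesia/sonic-3", "voiceId": "%s", "text": "%s", "output": "mp3"}' "$1" "$2"
}

# Send the request to the same endpoint used by the inline example.
# $1 = saved voice ID, $2 = text, $3 = output file.
speak_with_saved_voice() {
  curl -X POST https://api.speechbase.ai/v1/audio/speech \
    -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_saved_voice_payload "$1" "$2")" \
    --output "$3"
}
```

A call might look like speak_with_saved_voice "YOUR_SAVED_VOICE_ID" "Hello from a cloned voice." cloned.mp3, where YOUR_SAVED_VOICE_ID stands in for the ID of the Voice you created.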

Provider options

Anything in providerOptions is forwarded to the Cartesia API unchanged (for example speed or emotion controls).
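
The pass-through behavior above could be exercised with a sketch like the following, which attaches a providerOptions object to the inline request from the Usage section. The helper names are hypothetical, and the option key and value in the usage note are placeholders: this page names speed and emotion controls only as examples, so the exact keys and accepted values should be taken from Cartesia's own API reference.

```shell
# Sketch only: shows where providerOptions sits in the request body.

# Build an inline request body with a providerOptions object attached.
# $1 = raw JSON object to forward to Cartesia (not validated here).
build_payload_with_options() {
  printf '{"mode": "inline", "model": "cartesia/sonic-3", "voice": "a0e99841-438c-4a64-b679-ae501e7d6091", "text": "Hello from Speechbase!", "output": "mp3", "providerOptions": %s}' "$1"
}

# Post the request and save the audio.
# $1 = providerOptions JSON object, $2 = output file.
speak_with_options() {
  curl -X POST https://api.speechbase.ai/v1/audio/speech \
    -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_payload_with_options "$1")" \
    --output "$2"
}
```

For instance, speak_with_options '{"speed": "fast"}' fast.mp3 would forward a speed setting. Both the key name and the value "fast" are illustrative and may not match what Cartesia accepts for a given model.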
