Cartesia
Cartesia Sonic text-to-speech through the Speechbase gateway, with audio tags, voice cloning, and native timestamps on sonic-3.
| Setting | Value |
|---|---|
| Prefix | cartesia |
| Default model | sonic-3 |
| Provider key | Connect under Provider Keys |
Route to Cartesia by prefixing the model with cartesia/, for example
cartesia/sonic-3. sonic-3 is the flagship: it is fast and multilingual, and it
is the only model listed here that supports audio tags and voice cloning.
Models
| Model | Streaming | Audio tags | Voice cloning | Timestamps | Languages |
|---|---|---|---|---|---|
| sonic-3 | Yes | Yes | Yes | Native | 42 |
| sonic-2 | Yes | — | — | Native | 1 |
Usage
curl -X POST https://api.speechbase.ai/v1/audio/speech \
-H "Authorization: Bearer $SPEECHBASE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"mode": "inline",
"model": "cartesia/sonic-3",
"voice": "a0e99841-438c-4a64-b679-ae501e7d6091",
"text": "Hello from Speechbase!",
"output": "mp3"
}' --output hello.mp3

voice is a Cartesia voice UUID.
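The same inline request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are taken from the curl call above. The helper names build_inline_request and synthesize are hypothetical, and the sketch assumes the response body is the raw audio, as the --output flag in the curl example suggests.

```python
import json
import urllib.request

API_URL = "https://api.speechbase.ai/v1/audio/speech"


def build_inline_request(text, voice, model="cartesia/sonic-3", output="mp3"):
    """Return the JSON body for an inline speech request (hypothetical helper)."""
    return {
        "mode": "inline",
        "model": model,    # provider prefix + model name
        "voice": voice,    # Cartesia voice UUID
        "text": text,
        "output": output,
    }


def synthesize(body, api_key, path):
    """POST the body and write the response bytes to `path`.

    Assumes the response body is the audio file itself.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


body = build_inline_request(
    "Hello from Speechbase!",
    "a0e99841-438c-4a64-b679-ae501e7d6091",
)
```

To send it, call synthesize(body, os.environ["SPEECHBASE_API_KEY"], "hello.mp3") after importing os. Building the body separately from sending it keeps the payload easy to inspect or log before any network call is made.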
Voice cloning
sonic-3 supports voice cloning. Clone a reference recording into a saved
Voice, then address it by voiceId in a request with mode: "voice". Inline
requests take only a plain voice string.
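As a rough illustration, a request body that addresses a saved Voice might look like the sketch below. Only mode: "voice" and the voiceId field come from the text above. The remaining fields are assumed to carry over from the inline example, whether model is also required is not stated here, and the helper name and the placeholder ID are hypothetical.

```python
import json


def build_voice_request(text, voice_id, output="mp3"):
    """Return a JSON body that addresses a saved Voice (assumed shape)."""
    return {
        "mode": "voice",       # saved-Voice mode, as opposed to "inline"
        "voiceId": voice_id,   # ID of the saved (cloned) Voice
        "text": text,          # assumption: same field as inline requests
        "output": output,      # assumption: same field as inline requests
    }


# Placeholder ID for illustration only; substitute the ID of a real saved Voice.
body = build_voice_request("Hello in a cloned voice.", "YOUR_SAVED_VOICE_ID")
print(json.dumps(body, indent=2))
```

The body would be posted to the same endpoint and with the same headers as the inline request.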
Provider options
Anything in providerOptions is forwarded to the Cartesia API unchanged
(for example speed or emotion controls).
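A minimal sketch of attaching providerOptions to a request body follows. The with_provider_options helper is hypothetical, and the speed key with the value "slow" is purely illustrative. Because the options are forwarded unchanged, the accepted keys and values are whatever the Cartesia API defines, so check its reference before relying on them.

```python
import json


def with_provider_options(body, **options):
    """Return a copy of `body` with `options` merged into providerOptions."""
    merged = dict(body)  # shallow copy so the caller's dict is not mutated
    merged["providerOptions"] = {**body.get("providerOptions", {}), **options}
    return merged


base = {
    "mode": "inline",
    "model": "cartesia/sonic-3",
    "voice": "a0e99841-438c-4a64-b679-ae501e7d6091",
    "text": "Hello from Speechbase!",
    "output": "mp3",
}

# Illustrative option only; valid keys and values are defined by Cartesia.
body = with_provider_options(base, speed="slow")
print(json.dumps(body, indent=2))
```

Returning a copy leaves the base request reusable, which is convenient when the same text is sent with several different option sets.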

