Providers
The catalog of upstream TTS providers Speechbase routes to, the models each exposes, and the workspace state attached to each provider.
A provider is an upstream TTS vendor: OpenAI, ElevenLabs, Cartesia, Hume, Google, Deepgram, Inworld, MiniMax, Fish Audio, Murf, Resemble, fal, Mistral, xAI, and others as the catalog grows. Speechbase ships an integration with each provider and exposes their models through one API.
For the end-to-end routing model, including BYOK and Managed Routing, start with Providers and routing.
Browse providers
Pick a provider for its models, voices, output quirks, and per-model capabilities.
| Provider | Prefix | Default model |
|---|---|---|
| OpenAI | openai | gpt-4o-mini-tts |
| ElevenLabs | elevenlabs | eleven_multilingual_v2 |
| Deepgram | deepgram | aura-2 |
| Cartesia | cartesia | sonic-3 |
| Hume | hume | octave-2 |
| Google | google | gemini-2.5-flash-preview-tts |
| Fish Audio | fish-audio | s2-pro |
| Inworld | inworld | inworld-tts-1.5-max |
| MiniMax | minimax | speech-2.8-hd |
| Murf | murf | GEN2 |
| Resemble | resemble | default |
| Smallest AI | smallest-ai | lightning_v3.1 |
| fal | fal-ai | (specify a model) |
| Mistral | mistral | voxtral-mini-tts-2603 |
| xAI | xai | grok-tts |
Capability matrix
| Provider | Streaming | Audio tags | Voice cloning | Timestamps | Open source |
|---|---|---|---|---|---|
| OpenAI | Yes | Yes | — | Gateway-generated | — |
| ElevenLabs | Yes | Yes (eleven_v3) | — | Native | — |
| Deepgram | Yes | — | — | Gateway-generated | — |
| Cartesia | Yes | Yes (sonic-3) | Yes (sonic-3) | Native | — |
| Hume | Yes | — | Yes (octave-2) | Native (octave-2) | — |
| Google | Yes | Yes (gemini-3.1) | — | Gateway-generated | — |
| Fish Audio | Yes | Yes | Yes | Gateway-generated | Yes |
| Inworld | Yes | — | — | Native | — |
| MiniMax | — | — | — | Gateway-generated | — |
| Murf | Yes | — | — | Native (GEN2) | — |
| Resemble | Yes | — | Yes | Native | Yes |
| Smallest AI | — | — | — | Gateway-generated | — |
| fal | — | — | Yes (select models) | Gateway-generated | Varies |
| Mistral | Yes | — | Yes | Gateway-generated | Yes |
| xAI | Yes | Yes | — | Gateway-generated | — |
Support is per-model — check each provider page for the breakdown. "Gateway-generated" timestamps are explained under Fallbacks and timestamps; cloning is configured through saved Voices, not inline.
How a synthesis call gets to a provider
Every inline synthesis request specifies a model string of the form
<provider_id>/<model_id>, e.g. openai/gpt-4o-mini-tts or
elevenlabs/eleven_v3. Speechbase reads the prefix, looks up the provider
integration, resolves provider access for your workspace, and dispatches the
call. The string takes exactly one slash — fal-ai/f5-tts, never a doubled
prefix.
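The one-slash rule can be checked on the client before a request is sent. The following is a minimal sketch, not Speechbase's own parser; the function name parse_model_string is hypothetical, and it assumes that neither half of the string may be empty.

```python
def parse_model_string(model: str) -> tuple[str, str]:
    """Split '<provider_id>/<model_id>' into (provider_id, model_id).

    Illustrative client-side check only; the gateway's real validation
    may differ.
    """
    parts = model.split("/")
    # Exactly one slash means exactly two parts, both non-empty.
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"expected '<provider_id>/<model_id>' with one slash, got {model!r}"
        )
    provider_id, model_id = parts
    return provider_id, model_id
```

With this check, openai/gpt-4o-mini-tts and fal-ai/f5-tts pass, while a doubled prefix such as fal-ai/fal-ai/f5-tts raises ValueError.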
If the request does not pin a provider (for instance, because you passed a
voice_id that already encodes one), Speechbase uses the provider that the voice
was registered against.
Listing what's available
GET /v1/audio/providers returns the full catalog with three pieces of
state per provider:
- enabled: whether the provider is currently switched on for your org. You can toggle this in the dashboard at Speechbase → Model Providers.
- byok: whether you've stored a key for this provider yet. In BYOK mode, synthesis calls targeting a provider without a key fail with no_api_key.
- models: the prefixed model IDs you can pass in a synthesis request.
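As a rough illustration of how those three fields combine in BYOK mode, the sketch below filters a provider list down to the model IDs that should be callable. The response envelope is an assumption: it treats the catalog as a list of objects carrying enabled, byok, and models keys, and the helper name usable_models is hypothetical.

```python
def usable_models(providers: list[dict]) -> list[str]:
    """Return model IDs from providers that are enabled and have a stored key.

    Assumes each provider entry is a dict with 'enabled', 'byok', and
    'models' keys; the real response shape may wrap these differently.
    """
    models: list[str] = []
    for provider in providers:
        # In BYOK mode a provider needs both flags before a call can succeed.
        if provider.get("enabled") and provider.get("byok"):
            models.extend(provider.get("models", []))
    return models
```

A provider that is enabled but has no stored key is skipped here, which mirrors the no_api_key failure described above.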
Provider access
Speechbase supports BYOK for self-serve provider access and Managed Routing for workspaces where Speechbase manages provider relationships, billing, and quotas.
With BYOK, calls to the provider are made with your own key, and the provider bills you directly rather than through Speechbase.
Mechanically: when you store a key via PUT /v1/api-keys/{providerId},
Speechbase stores it encrypted in a secure key store and writes a metadata row
recording the last four characters and key_updated_at. We can't view or
recover the full key. The plaintext key never lives on disk and is never logged.
At request time the gateway decrypts the key in-memory, instantiates the
provider client, and discards the plaintext when the request completes.
To rotate, just PUT the new value over the old one. To stop using a provider
entirely, DELETE /v1/api-keys/{providerId} — both the encrypted key and the
metadata row are cleared atomically.
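The store, rotate, and delete calls can be sketched with the Python standard library. Only the paths and HTTP methods come from this page; the host in BASE_URL, the Bearer authorization header, and the JSON body field "key" are all assumptions made for illustration, so check the API reference before relying on them. The functions build requests without sending them.

```python
import json
import urllib.request

# Hypothetical host; substitute the real API base URL.
BASE_URL = "https://api.speechbase.example"


def store_key_request(provider_id: str, provider_key: str, token: str) -> urllib.request.Request:
    """Build the PUT that stores (or rotates) a provider key.

    The body field name 'key' and Bearer auth are assumptions.
    """
    body = json.dumps({"key": provider_key}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/api-keys/{provider_id}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def delete_key_request(provider_id: str, token: str) -> urllib.request.Request:
    """Build the DELETE that removes a provider key and its metadata row."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/api-keys/{provider_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Rotation is the same PUT with a new value; passing either request to urllib.request.urlopen would send it.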
For the step-by-step setup flow, see BYOK guide. For the comparison between BYOK and Managed Routing, see Providers and routing.
Fallbacks and timestamps
Not every provider exposes word-level timestamps natively. Speechbase wraps each
provider with a timestamp fallback pass for the with-timestamps endpoints, so
any provider can produce timestamps even when the upstream API doesn't. The
gateway aligns the rendered audio with the exact source text for that pass.
Use the capability matrix above to see which models currently expose native
timestamps. The choice is transparent: the gateway uses native timestamps when
available and the timestamp fallback when needed.
If both native timestamps and the timestamp fallback pass fail, the
/v1/audio/speech/with-timestamps endpoint returns 503 timestamps_unavailable.
The conversation variant is more forgiving — it can return audio with an empty
timestamps array and a warnings entry instead, so you still get the audio
to play. See Word-level timestamps.
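The order of attempts described in this section can be modeled as follows. This is an illustrative sketch of the behavior, not the gateway's code: resolve_timestamps, TimestampsUnavailable, the lenient flag, and the text of the warnings entry are all hypothetical names chosen for the example.

```python
from typing import Callable, Optional

Words = list[dict]


class TimestampsUnavailable(Exception):
    """Stands in for the 503 timestamps_unavailable response."""


def resolve_timestamps(
    native: Optional[Callable[[], Words]],
    fallback: Callable[[], Words],
    lenient: bool = False,
) -> dict:
    """Try native timestamps first, then the fallback pass.

    lenient=True models the conversation variant, which returns an empty
    timestamps list plus a warning instead of failing.
    """
    for attempt in (native, fallback):
        if attempt is None:  # provider has no native timestamps
            continue
        try:
            return {"timestamps": attempt(), "warnings": []}
        except Exception:
            continue  # fall through to the next source
    if lenient:
        return {"timestamps": [], "warnings": ["timestamps_unavailable"]}
    raise TimestampsUnavailable("timestamps_unavailable")
```

The caller never selects a source; the only difference between the two endpoints in this model is what happens after both sources fail.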
Multi-provider conversations
A single conversation request can dispatch different turns to different providers. Speechbase validates that every provider referenced in the turn list has a stored key and is enabled before any call is made — partial dispatches don't happen. See Conversations.
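The all-or-nothing check can be pictured as a pass over the turn list that collects every problem before anything is dispatched. The sketch below is an assumption-laden model of that idea: the turn shape (a dict with a "model" key), the provider-state mapping, and the function name validate_turns are hypothetical, not the actual request schema.

```python
def validate_turns(turns: list[dict], providers: dict[str, dict]) -> list[str]:
    """Return a list of problems; an empty list means dispatch may begin.

    turns: dicts with a 'model' key of the form '<provider_id>/<model_id>'.
    providers: provider_id -> {'enabled': bool, 'byok': bool}.
    """
    problems: list[str] = []
    # Check each referenced provider once, in a stable order.
    referenced = sorted({turn["model"].split("/", 1)[0] for turn in turns})
    for provider_id in referenced:
        state = providers.get(provider_id)
        if state is None:
            problems.append(f"{provider_id}: unknown provider")
        elif not state["enabled"]:
            problems.append(f"{provider_id}: provider disabled")
        elif not state["byok"]:
            problems.append(f"{provider_id}: no_api_key")
    return problems
```

Because every provider is checked up front, a conversation that mixes a ready provider with an unready one is rejected as a whole rather than half-rendered.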

