Providers
The catalog of upstream TTS providers Speechbase routes to, and the workspace state attached to each provider.
A provider is an upstream TTS vendor: OpenAI, ElevenLabs, Cartesia, Hume, Google, Deepgram, Inworld, Fish Audio, Murf, Resemble, fal, Mistral, xAI, and others as the catalog grows. Speechbase ships an integration with each provider and exposes their models through one API.
For the end-to-end routing model, including BYOK and Managed Routing, start with Providers and routing.
How a synthesis call gets to a provider
Every inline synthesis request specifies a model string of the form
<provider_id>/<model_id>, e.g. openai/gpt-4o-mini-tts or
elevenlabs/eleven_v3. Speechbase reads the prefix, looks up the provider
integration, resolves provider access for your workspace, and dispatches the
call.
If the request doesn't pin a provider, for instance because you passed a
voice_id that already encodes one, Speechbase uses the provider that the voice
was registered against.
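The lookup described above can be sketched in a few lines. This is only an illustration of the documented model-string format, not the gateway's actual code: the names ModelRef, parse_model, and resolve_provider are hypothetical, and splitting on the first slash only is an assumption made so that a model ID containing its own slash would survive intact.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    provider_id: str
    model_id: str


def parse_model(model: str) -> ModelRef:
    """Split '<provider_id>/<model_id>' on the first slash (assumed behaviour)."""
    provider_id, sep, model_id = model.partition("/")
    if not sep or not provider_id or not model_id:
        raise ValueError(f"expected '<provider_id>/<model_id>', got {model!r}")
    return ModelRef(provider_id, model_id)


def resolve_provider(model: Optional[str], voice_provider: Optional[str]) -> str:
    """Prefer the model prefix; otherwise use the provider the voice was registered against."""
    if model:
        return parse_model(model).provider_id
    if voice_provider:
        return voice_provider
    raise ValueError("request names neither a model nor a voice with a known provider")
```

For example, parse_model("elevenlabs/eleven_v3") yields the provider "elevenlabs" and the model "eleven_v3", while resolve_provider(None, "openai") falls through to the voice's provider.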
Listing what's available
GET /v1/audio/providers returns the full catalog with three pieces of
state per provider:
- enabled: whether the provider is currently switched on for your org. You can toggle this in the dashboard at Speechbase → Model Providers.
- byok: whether you've stored a key for this provider yet. In BYOK mode, synthesis calls targeting a provider without a key fail with no_api_key.
- models: the prefixed model IDs you can pass in a synthesis request.
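As a rough illustration of how a client might use these three fields in BYOK mode, the sketch below filters an already-parsed catalog down to the models that should be callable. The response envelope is an assumption: the code takes a plain list of provider objects, and the "id" key in the sample data is hypothetical. Only enabled, byok, and models come from the description above.

```python
from typing import Any, Dict, List


def usable_models(catalog: List[Dict[str, Any]]) -> List[str]:
    """Return model IDs from providers that are enabled and have a stored key."""
    models: List[str] = []
    for provider in catalog:
        if provider.get("enabled") and provider.get("byok"):
            models.extend(provider.get("models", []))
    return models


# Hypothetical sample data; the "id" field and the list envelope are assumed.
sample = [
    {"id": "openai", "enabled": True, "byok": True,
     "models": ["openai/gpt-4o-mini-tts"]},
    {"id": "elevenlabs", "enabled": True, "byok": False,
     "models": ["elevenlabs/eleven_v3"]},
]
print(usable_models(sample))
```

With the sample above, only the OpenAI model is listed, because the ElevenLabs entry has no stored key and would fail with no_api_key in BYOK mode.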
Provider access
Speechbase supports BYOK for self-serve provider access and Managed Routing for workspaces where Speechbase manages provider relationships, billing, and quotas.
With BYOK, requests to the provider are made with your own key, and the provider bills you directly; the charges do not pass through Speechbase.
Mechanically: when you store a key via PUT /v1/api-keys/{providerId},
Speechbase stores it encrypted in a secure key store and writes a metadata row
recording the last four characters and key_updated_at. We can't view or
recover the full key. The plaintext key never lives on disk and is never logged.
At request time the gateway decrypts the key in memory, instantiates the
provider client, and discards the plaintext when the request completes.
To rotate, just PUT the new value over the old one. To stop using a provider
entirely, DELETE /v1/api-keys/{providerId} — both the encrypted key and the
metadata row are cleared atomically.
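A minimal sketch of the store, rotate, and delete calls, built with the standard library and deliberately not sent. Several details are assumptions and not taken from this page: the host in BASE_URL is a placeholder, the Bearer authorization header is a guess at the auth scheme, and the "key" field in the JSON body is a hypothetical name. Only the two paths and HTTP methods are documented above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://speechbase.example"  # placeholder host, not the real one


def key_request(provider_id: str, token: str,
                provider_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT to store or rotate a key,
    or a DELETE when provider_key is None."""
    path = urllib.parse.quote(provider_id, safe="")
    url = f"{BASE_URL}/v1/api-keys/{path}"
    headers = {"Authorization": f"Bearer {token}"}  # assumed auth scheme
    if provider_key is None:
        return urllib.request.Request(url, method="DELETE", headers=headers)
    # Assumed body shape; the real field name may differ.
    body = json.dumps({"key": provider_key}).encode("utf-8")
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, method="PUT", headers=headers)
```

Rotation is the same PUT with a new value, so key_request("openai", token, new_key) covers both the first store and later rotations, and key_request("openai", token) builds the DELETE. Passing the result to urllib.request.urlopen would send it.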
For the step-by-step setup flow, see the BYOK guide. For the comparison between BYOK and Managed Routing, see Providers and routing.
Fallbacks and timestamps
Not every provider exposes word-level alignment natively. Speechbase wraps each
provider with a fallback STT pass (currently OpenAI Whisper) for the
with-timestamps endpoints, so any provider can produce timestamps even when
the upstream API doesn't. This is transparent — you don't choose; the gateway
picks the best available source per request and only falls back when needed.
If both native alignment and the STT fallback fail, the
/v1/audio/speech/with-timestamps endpoint returns 503 timestamps_unavailable.
The conversation variant is more forgiving — it can return audio with an empty
timestamps array and a warnings entry instead, so you still get the audio
to play. See Word-level timestamps.
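The difference between the two endpoints can be pictured as one selection routine with a strictness switch. This is a sketch under stated assumptions, not the gateway's implementation: pick_timestamps, Source, and TimestampsUnavailable are hypothetical names, trying native alignment before the STT pass is a simplification of "best available source", and the warning string is illustrative.

```python
from typing import Callable, List, Optional, Tuple

Words = List[dict]
# A source returns None when it cannot produce alignment (assumed convention).
Source = Callable[[], Optional[Words]]


class TimestampsUnavailable(Exception):
    """Stands in for the 503 timestamps_unavailable response."""


def pick_timestamps(native: Source, stt_fallback: Source,
                    strict: bool) -> Tuple[Words, List[str]]:
    """Try native alignment, then the STT fallback; return (timestamps, warnings)."""
    for source in (native, stt_fallback):
        words = source()
        if words is not None:
            return words, []
    if strict:
        # Mirrors the speech endpoint: no timestamps means the request fails.
        raise TimestampsUnavailable("timestamps_unavailable")
    # Mirrors the conversation variant: audio still plays, timestamps are empty.
    return [], ["timestamps_unavailable"]
```

Here strict=True corresponds to /v1/audio/speech/with-timestamps and strict=False to the conversation variant, where a client should check the warnings list before relying on the timestamps.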
Multi-provider conversations
A single conversation request can dispatch different turns to different providers. Speechbase validates that every provider referenced in the turn list has a stored key and is enabled before any call is made — partial dispatches don't happen. See Conversations.
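The all-or-nothing check can be mirrored on the client before sending a conversation. The sketch below is an approximation under assumptions: it supposes each turn carries a prefixed model string, that provider state is available as a mapping from provider ID to the enabled and byok flags, and the preflight name and message strings are hypothetical.

```python
from typing import Dict, Iterable, List


def preflight(turn_models: Iterable[str],
              providers: Dict[str, dict]) -> List[str]:
    """Return a list of problems; dispatch only when the list is empty."""
    problems: List[str] = []
    seen = set()
    for model in turn_models:
        provider_id = model.partition("/")[0]
        if provider_id in seen:
            continue  # each provider is checked once
        seen.add(provider_id)
        state = providers.get(provider_id)
        if state is None:
            problems.append(f"{provider_id}: unknown provider")
            continue
        if not state.get("enabled"):
            problems.append(f"{provider_id}: disabled")
        if not state.get("byok"):
            problems.append(f"{provider_id}: no_api_key")
    return problems
```

Because every referenced provider is checked before anything is sent, a single missing key blocks the whole conversation instead of producing audio for only some turns.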