Speechbase

Providers

The catalog of upstream TTS providers Speechbase routes to, and the workspace state attached to each provider.

A provider is an upstream TTS vendor: OpenAI, ElevenLabs, Cartesia, Hume, Google, Deepgram, Inworld, Fish Audio, Murf, Resemble, fal, Mistral, xAI, and others as the catalog grows. Speechbase ships an integration with each provider and exposes their models through one API.

For the end-to-end routing model, including BYOK and Managed Routing, start with Providers and routing.

How a synthesis call gets to a provider

Every inline synthesis request specifies a model string of the form <provider_id>/<model_id>, e.g. openai/gpt-4o-mini-tts or elevenlabs/eleven_v3. Speechbase reads the prefix, looks up the provider integration, resolves provider access for your workspace, and dispatches the call.
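The prefix lookup can be sketched in Python as follows. This is a minimal illustration, not Speechbase's implementation. `parse_model`, `ModelRef`, and the `INTEGRATIONS` table are hypothetical. Splitting on the first `/`, so that any later slashes stay in the model ID, is also an assumption.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """A model string split into its provider prefix and provider-side model ID."""
    provider_id: str
    model_id: str


def parse_model(model: str) -> ModelRef:
    """Split '<provider_id>/<model_id>' on the first '/' (assumed splitting rule)."""
    provider_id, sep, model_id = model.partition("/")
    if not sep or not provider_id or not model_id:
        raise ValueError(f"expected '<provider_id>/<model_id>', got {model!r}")
    return ModelRef(provider_id, model_id)


# Hypothetical registry standing in for Speechbase's provider integrations.
INTEGRATIONS = {
    "openai": "OpenAI integration",
    "elevenlabs": "ElevenLabs integration",
}


def route(model: str) -> str:
    """Read the prefix and look up the integration that will handle the call."""
    ref = parse_model(model)
    integration = INTEGRATIONS.get(ref.provider_id)
    if integration is None:
        raise LookupError(f"unknown provider: {ref.provider_id}")
    return integration
```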

If the request doesn't pin a provider (for instance, because you passed a voice_id that already encodes one), Speechbase uses the provider the voice was registered against.
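The following sketch shows that precedence. It assumes a hypothetical `voice_registry` that maps each voice_id to the provider it was registered against. If the request includes an explicit model prefix, that prefix is used. Otherwise the voice's provider is used. Treating an unprefixed model string as "not pinned" is an assumption of this sketch.

```python
from typing import Mapping, Optional


def resolve_provider(
    model: Optional[str],
    voice_id: Optional[str],
    voice_registry: Mapping[str, str],  # hypothetical: voice_id -> provider_id
) -> str:
    """Return the provider ID a synthesis request should be dispatched to."""
    # 1. An explicit '<provider_id>/<model_id>' prefix pins the provider.
    if model:
        provider_id, sep, _ = model.partition("/")
        if sep and provider_id:
            return provider_id
    # 2. Otherwise, fall back to the provider the voice was registered against.
    if voice_id is not None and voice_id in voice_registry:
        return voice_registry[voice_id]
    raise ValueError("request pins no provider and the voice is not registered")
```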

Listing what's available

GET /v1/audio/providers returns the full catalog with three pieces of state per provider:

  • enabled — whether the provider is currently switched on for your org. You can toggle this in the dashboard at Speechbase → Model Providers.
  • byok — whether you've stored a key for this provider yet. In BYOK mode, synthesis calls targeting a provider without a key fail with no_api_key.
  • models — the prefixed model IDs you can pass in a synthesis request.
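The following client sketch reads the catalog and picks out models that won't fail with no_api_key in BYOK mode. Several details are assumptions, not the documented response shape:

  • the base URL;
  • Bearer authentication;
  • a response that is a top-level JSON list of objects with `enabled`, `byok`, and `models` fields.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "https://api.speechbase.example"  # hypothetical base URL


def fetch_providers(api_key: str) -> List[Dict[str, Any]]:
    """GET /v1/audio/providers (Bearer auth and list-shaped body are assumptions)."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/audio/providers",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def usable_models(providers: List[Dict[str, Any]]) -> List[str]:
    """Model IDs whose provider is enabled and has a stored key (BYOK mode)."""
    return [
        model
        for p in providers
        if p.get("enabled") and p.get("byok")
        for model in p.get("models", [])
    ]
```

In use, `usable_models(fetch_providers("YOUR_SPEECHBASE_KEY"))` yields the prefixed IDs you can safely pass as `model`.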

Provider access

Speechbase supports BYOK for self-serve provider access and Managed Routing for workspaces where Speechbase manages provider relationships, billing, and quotas.

With BYOK, Speechbase calls the provider using your own key, so the provider bills your account directly rather than billing through Speechbase.

Mechanically: when you store a key via PUT /v1/api-keys/{providerId}, Speechbase stores it encrypted in a secure key store and writes a metadata row recording the key's last four characters and its key_updated_at timestamp. We can't view or recover the full key. The plaintext key is never written to disk and never logged. At request time, the gateway decrypts the key in memory, instantiates the provider client, and discards the plaintext when the request completes.
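The following sketch makes the split between the key store and the metadata row concrete. It is a hypothetical model of the row. The `KeyMetadata` class, `metadata_for`, and the `last4` field name are illustrative only. The key point is that nothing outside the encrypted store keeps more than four characters of the key.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class KeyMetadata:
    """Hypothetical metadata row stored alongside an encrypted provider key."""
    provider_id: str
    last4: str                  # only the last four characters are retained
    key_updated_at: datetime    # refreshed on every PUT (store or rotate)


def metadata_for(provider_id: str, plaintext_key: str) -> KeyMetadata:
    """Derive the metadata row; the full key itself would go to the encrypted store."""
    if len(plaintext_key) < 4:
        raise ValueError("key is shorter than four characters")
    return KeyMetadata(
        provider_id=provider_id,
        last4=plaintext_key[-4:],
        key_updated_at=datetime.now(timezone.utc),
    )
```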

To rotate, just PUT the new value over the old one. To stop using a provider entirely, DELETE /v1/api-keys/{providerId} — both the encrypted key and the metadata row are cleared atomically.
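The following sketch builds the store, rotate, and delete calls with Python's standard library. Only the paths and methods come from this page. The base URL, Bearer authentication, and the `{"key": ...}` request body are assumptions.

```python
import json
import urllib.request

BASE_URL = "https://api.speechbase.example"  # hypothetical base URL


def put_key_request(api_key: str, provider_id: str, provider_key: str) -> urllib.request.Request:
    """Store or rotate a provider key; a second PUT overwrites the first."""
    body = json.dumps({"key": provider_key}).encode("utf-8")  # body shape assumed
    return urllib.request.Request(
        f"{BASE_URL}/v1/api-keys/{provider_id}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def delete_key_request(api_key: str, provider_id: str) -> urllib.request.Request:
    """Clear the encrypted key and its metadata row for one provider."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/api-keys/{provider_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing either request to `urllib.request.urlopen` sends it. Rotation is the same PUT with the new value.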

For the step-by-step setup flow, see the BYOK guide. For a comparison of BYOK and Managed Routing, see Providers and routing.

Fallbacks and timestamps

Not every provider exposes word-level alignment natively. For the with-timestamps endpoints, Speechbase wraps each provider with a fallback STT pass (currently OpenAI Whisper), so any provider can produce timestamps even when the upstream API doesn't. This is transparent: you don't choose a source. The gateway picks the best available source per request and falls back to STT only when needed.

If both native alignment and the STT fallback fail, the /v1/audio/speech/with-timestamps endpoint returns 503 timestamps_unavailable. The conversation variant is more forgiving — it can return audio with an empty timestamps array and a warnings entry instead, so you still get the audio to play. See Word-level timestamps.
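The selection order above can be sketched as follows. This sketch rests on several assumptions:

  • `align`, `TimedAudio`, and `TimestampsUnavailable` are hypothetical names.
  • Each aligner signals failure by returning `None`. Real failures would more likely raise.
  • The warning string is a placeholder, not the documented warnings format.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Timestamps = List[dict]  # e.g. [{"word": "hi", "start": 0.0, "end": 0.2}]; shape assumed
Aligner = Callable[[bytes], Optional[Timestamps]]  # returns None on failure (assumption)


class TimestampsUnavailable(Exception):
    """Maps to 503 timestamps_unavailable on /v1/audio/speech/with-timestamps."""
    status = 503
    code = "timestamps_unavailable"


@dataclass
class TimedAudio:
    audio: bytes
    timestamps: Timestamps
    warnings: List[str] = field(default_factory=list)


def align(
    audio: bytes,
    native: Optional[Aligner],
    stt_fallback: Aligner,
    conversation: bool = False,
) -> TimedAudio:
    """Prefer native alignment, then the STT pass, then fail or degrade."""
    if native is not None:
        ts = native(audio)
        if ts is not None:
            return TimedAudio(audio, ts)  # STT fallback is never invoked
    ts = stt_fallback(audio)
    if ts is not None:
        return TimedAudio(audio, ts)
    if conversation:
        # Conversation variant: keep the playable audio, flag the missing timestamps.
        return TimedAudio(audio, [], ["timestamps_unavailable"])  # placeholder warning
    raise TimestampsUnavailable()
```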

Multi-provider conversations

A single conversation request can dispatch different turns to different providers. Before making any call, Speechbase validates that every provider referenced in the turn list is enabled and has a stored key, so a conversation is never partially dispatched. See Conversations.
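The following sketch shows that all-or-nothing pre-flight check. It assumes a hypothetical `ProviderState` for each provider. The check collects every problem before raising, so no turn is dispatched unless all providers pass. `preflight`, `run_conversation`, and the `synthesize` callback are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ProviderState:
    enabled: bool
    has_key: bool


class PreflightError(Exception):
    pass


def preflight(turn_models: Sequence[str], state: Dict[str, ProviderState]) -> List[str]:
    """Validate every provider referenced by the turns; raise listing all problems."""
    providers: List[str] = []
    for model in turn_models:
        provider_id = model.partition("/")[0]
        if provider_id not in providers:
            providers.append(provider_id)

    problems = []
    for provider_id in providers:
        s = state.get(provider_id)
        if s is None or not s.enabled:
            problems.append(f"{provider_id}: not enabled")
        elif not s.has_key:
            problems.append(f"{provider_id}: no_api_key")
    if problems:
        raise PreflightError("; ".join(problems))
    return providers


def run_conversation(
    turns: Sequence[Tuple[str, str]],  # (model, text) pairs
    state: Dict[str, ProviderState],
    synthesize: Callable[[str, str], bytes],
) -> List[bytes]:
    # Validate first: if this raises, synthesize() is never called for any turn.
    preflight([model for model, _ in turns], state)
    return [synthesize(model, text) for model, text in turns]
```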
