What is Speechbase?
The production control plane for AI speech: routing, voices, conversations, timestamps, guardrails, playgrounds, and observability.
Speechbase gives you one API for 14 TTS providers (OpenAI, ElevenLabs, Deepgram, Cartesia, Hume, Google, Inworld, Fish Audio, Murf, Resemble, Smallest AI, fal, Mistral, xAI), plus provider access, voice management, conversations, timestamps, pronunciation dictionaries, moderation, playgrounds, and request observability.
Speechbase is more than a speech provider router. It is the place where your team connects providers, defines reusable voices, tests models, generates single-speaker and multi-speaker audio, applies pronunciation and moderation policy, and traces what happened after every request.
Use the platform when you want the flexibility of many TTS providers without building the surrounding production system yourself.
What you can do
| Capability | What Speechbase handles | Start here |
|---|---|---|
| Speech Gateway | One REST API and TypeScript SDK across OpenAI, ElevenLabs, Deepgram, Cartesia, Google, Hume, Inworld, Fish Audio, Murf, Resemble, Smallest AI, fal, Mistral, and xAI. | Providers and routing |
| Provider access | Bring your own provider keys for self-serve routing, or use Managed Routing when Speechbase is managing provider relationships, billing, and quotas for your workspace. | BYOK |
| Voice management | Save provider/model/voice combinations once, add metadata and provider options, and reference them by a stable Speechbase voice ID. | Voice management |
| Audio Playground | Compare models and voices side by side, stream audio when supported, normalize loudness, save voices, and copy integration code. | Audio Playground |
| Conversations | Render multi-turn, multi-speaker scripts into one stitched file with gaps, volume leveling, and optional mixed providers per turn. | Conversations |
| Word-level timestamps | Get word start/end times for captions, karaoke highlighting, lip-sync, transcripts, and speaker-attributed conversation timing. | Word-level timestamps |
| Pronunciation dictionaries | Fix brand names, acronyms, people, products, and domain terms before synthesis with org-wide dictionaries, per-request dictionaries, and inline rules. | Pronunciations |
| Moderation and guardrails | Run configurable rulesets before audio generation so blocked text never reaches a provider or creates a TTS bill. | Moderation |
| Observability | Inspect request logs, providers, models, voices, latency, status, moderation outcomes, and conversation child events. | Observability |
| Output handling | Stream speech where providers support it, or return buffered wav, mp3, or pcm; conversations can normalize and re-encode all turns to one format. | Output formats |
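As one illustration of the word-level timestamps row, the sketch below turns a list of word timings into WebVTT captions. The `WordTiming` shape (a word plus start and end times in seconds) is an assumption made for this example, not the documented response format; check the API Reference for the exact fields.

```typescript
// Hypothetical shape: the real timestamp payload may use different field names or units.
type WordTiming = { word: string; start: number; end: number }; // times in seconds

// Format seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function formatVttTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Group words into fixed-size cues; each cue spans from its first word's start
// to its last word's end.
function wordsToVtt(words: WordTiming[], wordsPerCue = 4): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const group = words.slice(i, i + wordsPerCue);
    const start = formatVttTime(group[0].start);
    const end = formatVttTime(group[group.length - 1].end);
    cues.push(`${start} --> ${end}\n${group.map((w) => w.word).join(" ")}`);
  }
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

Fixed-size cues keep the example short; a production captioner would more likely break cues on punctuation or pauses.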
How a request moves through Speechbase
- Your app calls https://api.speechbase.ai with a Speechbase API key, either directly or through @speech-sdk/core.
- The gateway resolves the request into a provider, model, voice, and provider-specific options.
- Speechbase applies pronunciation substitutions from the org default dictionary, selected dictionaries, and inline rules.
- The substituted text is evaluated against the selected moderation ruleset, or your org default ruleset.
- Speechbase routes the request through BYOK or Managed Routing provider access, depending on how your workspace is configured.
- The provider returns audio. Speechbase streams it back when possible, or buffers it when the endpoint needs stitching, timestamps, or format conversion.
- Speechbase writes an operational log entry without storing the input text, output audio, provider keys, or unsafe provider error bodies.
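The ordering above matters: pronunciation substitutions run before moderation, and moderation runs before any provider call. The sketch below is a local model of that ordering, not Speechbase's implementation. Every type and function name is hypothetical, and the plain string substitution and blocklist stand in for the real dictionaries and rulesets.

```typescript
// Illustrative only: all names here are hypothetical.
type PronunciationRule = { match: string; replace: string };
type ResolvedRequest = { provider: string; model: string; text: string };

type PipelineResult =
  | { status: "blocked"; reason: string }
  | { status: "sent"; provider: string; model: string; text: string };

// Apply substitutions in the order given (for example org default,
// then selected dictionaries, then inline rules).
function applyPronunciations(text: string, rules: PronunciationRule[]): string {
  return rules.reduce((acc, rule) => acc.split(rule.match).join(rule.replace), text);
}

// Stand-in for a moderation ruleset: a case-insensitive blocklist.
function findBlockedTerm(text: string, blockedTerms: string[]): string | undefined {
  const lower = text.toLowerCase();
  return blockedTerms.find((term) => lower.includes(term.toLowerCase()));
}

// Substitute, then moderate the substituted text, then hand off to the provider.
function handleRequest(
  request: ResolvedRequest,
  rules: PronunciationRule[],
  blockedTerms: string[],
): PipelineResult {
  const text = applyPronunciations(request.text, rules);
  const blocked = findBlockedTerm(text, blockedTerms);
  if (blocked !== undefined) {
    // Blocked text stops here, so no provider call is made and no TTS cost is incurred.
    return { status: "blocked", reason: `matched "${blocked}"` };
  }
  return { status: "sent", provider: request.provider, model: request.model, text };
}
```

Because moderation sees the substituted text, a rule is evaluated against what the provider would actually receive rather than against the raw input.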
Provider access models
Speechbase supports two ways to fund upstream provider calls:
| Model | Best for | How billing works |
|---|---|---|
| BYOK | Self-serve teams that already have provider accounts or want provider charges to stay on the provider invoice. | You store encrypted provider keys in Speechbase. The provider bills you directly; Speechbase charges the gateway fee described on pricing. |
| Managed Routing | Teams that want Speechbase to manage provider relationships, billing, quotas, and provider access. | Speechbase manages the provider relationship and bills usage through Speechbase. Availability depends on your plan and workspace enablement. |
The API shape is the same either way: you still call Speechbase with a
Speechbase API key and select models with strings like
openai/gpt-4o-mini-tts. The difference is who owns the upstream provider key
and bill.
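A minimal sketch of what "the same API shape" can look like from the client side. The `/v1/speech` path, the Bearer authorization header, and the `model` and `text` body fields are assumptions for illustration only; the API Reference has the real endpoint shapes. The only detail taken from this page is the base URL and the `provider/model` string format.

```typescript
const SPEECHBASE_BASE_URL = "https://api.speechbase.ai";

type ModelRef = { provider: string; model: string };

// Split "provider/model" on the first slash, so model names that
// themselves contain slashes stay intact.
function parseModelId(modelId: string): ModelRef {
  const slash = modelId.indexOf("/");
  if (slash <= 0 || slash === modelId.length - 1) {
    throw new Error(`Expected "provider/model", got "${modelId}"`);
  }
  return { provider: modelId.slice(0, slash), model: modelId.slice(slash + 1) };
}

// Build the request without sending it. Nothing here depends on whether the
// workspace uses BYOK or Managed Routing: only the Speechbase API key appears.
function buildSpeechRequest(apiKey: string, modelId: string, text: string) {
  parseModelId(modelId); // fail early on malformed model strings
  return {
    url: `${SPEECHBASE_BASE_URL}/v1/speech`, // hypothetical path
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: modelId, text }), // assumed field names
    },
  };
}
```

To send the request, pass the result to `fetch(request.url, request.init)`. The provider key never appears in client code under either access model.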
Where to go next
| If you want to... | Read this |
|---|---|
| Make your first synthesis call | Quickstart |
| Understand provider selection, BYOK, and Managed Routing | Providers and routing |
| Save voices and reuse them in production | Voice management |
| Compare providers before choosing one | Audio Playground |
| Build multi-speaker audio | Multi-speaker conversations |
| Add captions, transcripts, or lip-sync timing | Word-level timestamps |
| Configure content policy | Moderation |
| Look up exact endpoint shapes | API Reference |