What is Speechbase?

The production control plane for AI speech: routing, voices, conversations, timestamps, guardrails, playgrounds, and observability.

Speechbase gives you one API for 14 TTS providers (OpenAI, ElevenLabs, Deepgram, Cartesia, Hume, Google, Inworld, Fish Audio, Murf, Resemble, Smallest AI, fal, Mistral, xAI), along with provider access, voice management, conversations, timestamps, pronunciation dictionaries, moderation, playgrounds, and request observability.

Speechbase is more than a speech provider router. It is where your team connects providers, defines reusable voices, tests models, generates single-speaker and multi-speaker audio, applies pronunciation and moderation policy, and traces what happened on every request.

Use the platform when you want the flexibility of many TTS providers without building the surrounding production system yourself.

What you can do

Capability	What Speechbase handles	Start here
Speech Gateway	One REST API and TypeScript SDK across OpenAI, ElevenLabs, Deepgram, Cartesia, Google, Hume, Inworld, Fish Audio, Murf, Resemble, Smallest AI, fal, Mistral, and xAI.	Providers and routing
Provider access	Bring your own provider keys for self-serve routing, or use Managed Routing when you want Speechbase to manage provider relationships, billing, and quotas for your workspace.	BYOK
Voice management	Save provider/model/voice combinations once, add metadata and provider options, and reference them by a stable Speechbase voice ID.	Voice management
Audio Playground	Compare models and voices side by side, stream audio when supported, normalize loudness, save voices, and copy integration code.	Audio Playground
Conversations	Render multi-turn, multi-speaker scripts into one stitched file with gaps, volume leveling, and optional mixed providers per turn.	Conversations
Word-level timestamps	Get word start/end times for captions, karaoke highlighting, lip-sync, transcripts, and speaker-attributed conversation timing.	Word-level timestamps
Pronunciation dictionaries	Fix brand names, acronyms, people, products, and domain terms before synthesis with org-wide dictionaries, per-request dictionaries, and inline rules.	Pronunciations
Moderation and guardrails	Run configurable rulesets before audio generation so blocked text never reaches a provider or creates a TTS bill.	Moderation
Observability	Inspect request logs, providers, models, voices, latency, status, moderation outcomes, and conversation child events.	Observability
Output handling	Stream speech where providers support it, or return buffered `wav`, `mp3`, or `pcm`; conversations can normalize and re-encode all turns to one format.	Output formats
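As an illustration of the word-level timestamps capability listed above, the sketch below groups word timings into WebVTT caption cues. The `WordTiming` shape (a word with `start` and `end` in seconds) and the helper names are assumptions made for this example, not the documented Speechbase response format; check the Word-level timestamps page for the real field names.

```typescript
// Hypothetical shape for one timed word. Field names and units (seconds)
// are assumptions for this sketch, not the documented response format.
interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Format seconds as a WebVTT timestamp: hh:mm:ss.ttt
function formatTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const frac = ms % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(frac, 3)}`;
}

// Group consecutive words into cues of at most maxWordsPerCue words.
// Each cue spans from its first word's start to its last word's end.
function toWebVtt(words: WordTiming[], maxWordsPerCue = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWordsPerCue) {
    const chunk = words.slice(i, i + maxWordsPerCue);
    const first = chunk[0]!;
    const last = chunk[chunk.length - 1]!;
    const text = chunk.map((w) => w.word).join(" ");
    cues.push(`${formatTime(first.start)} --> ${formatTime(last.end)}\n${text}`);
  }
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

Grouping by a fixed word count keeps the sketch short; a production captioner would more likely break cues on punctuation or pauses between words.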

How a request moves through Speechbase

1. Your app calls https://api.speechbase.ai with a Speechbase API key, either directly or through @speech-sdk/core.
2. The gateway resolves the request into a provider, model, voice, and provider-specific options.
3. Speechbase applies pronunciation substitutions from the org default dictionary, selected dictionaries, and inline rules.
4. The substituted text is evaluated against the selected moderation ruleset, or your org default ruleset.
5. Speechbase routes the request through BYOK or Managed Routing provider access, depending on how your workspace is configured.
6. The provider returns audio. Speechbase streams it back when possible, or buffers it when the endpoint needs stitching, timestamps, or format conversion.
7. Speechbase writes an operational log entry without storing the input text, output audio, provider keys, or unsafe provider error bodies.
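The ordering of these steps matters: pronunciation substitutions run before moderation, and moderation runs before any provider call, so blocked text never reaches a provider. The sketch below is a local simulation of that ordering, not Speechbase's implementation. Every type and function name in it is hypothetical, the blocked-term check is a stand-in for a real moderation ruleset, and the provider is an injected function rather than a network call.

```typescript
// Local simulation only. Every name here is hypothetical and is not part of
// the Speechbase API or SDK.
interface PronunciationRule {
  match: string;
  replace: string;
}

interface SpeechRequest {
  model: string; // "provider/model", e.g. "openai/gpt-4o-mini-tts"
  voice: string;
  text: string;
  inlineRules?: PronunciationRule[];
}

// The log entry holds operational metadata only: no input text, no audio.
interface LogEntry {
  provider: string;
  model: string;
  voice: string;
  status: "ok" | "blocked";
  latencyMs: number;
}

type GatewayResult =
  | { status: "ok"; audio: Uint8Array; log: LogEntry }
  | { status: "blocked"; log: LogEntry };

type ProviderCall = (provider: string, model: string, voice: string, text: string) => Uint8Array;

// Apply each rule in order as a plain substring replacement.
function applyPronunciations(text: string, rules: PronunciationRule[]): string {
  return rules.reduce((acc, rule) => acc.split(rule.match).join(rule.replace), text);
}

function handleRequest(
  req: SpeechRequest,
  orgRules: PronunciationRule[],
  blockedTerms: string[],
  callProvider: ProviderCall,
): GatewayResult {
  const started = Date.now();

  // Resolve the model string into provider and model.
  const slash = req.model.indexOf("/");
  if (slash <= 0) throw new Error(`Expected "provider/model", got "${req.model}"`);
  const provider = req.model.slice(0, slash);
  const model = req.model.slice(slash + 1);
  const baseLog = { provider, model, voice: req.voice };

  // Pronunciation: org-wide rules first, then inline rules.
  const text = applyPronunciations(req.text, [...orgRules, ...(req.inlineRules ?? [])]);

  // Moderation: evaluate the substituted text before any provider call.
  const lowered = text.toLowerCase();
  if (blockedTerms.some((term) => lowered.includes(term.toLowerCase()))) {
    return { status: "blocked", log: { ...baseLog, status: "blocked", latencyMs: Date.now() - started } };
  }

  // Routing and synthesis: only reached when moderation passes.
  const audio = callProvider(provider, model, req.voice, text);

  // Logging: metadata only.
  return { status: "ok", audio, log: { ...baseLog, status: "ok", latencyMs: Date.now() - started } };
}
```

Passing the provider call in as a function makes the guarantee easy to check in a test: a blocked request returns without the function ever being invoked.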

Provider access models

Speechbase supports two ways to fund upstream provider calls:

Model	Best for	How billing works
BYOK	Self-serve teams that already have provider accounts or want provider charges to stay on the provider invoice.	You store encrypted provider keys in Speechbase. The provider bills you directly; Speechbase charges the gateway fee described on the pricing page.
Managed Routing	Teams that want Speechbase to manage provider relationships, billing, quotas, and provider access.	Speechbase manages the provider relationship and bills usage through Speechbase. Availability depends on your plan and workspace enablement.

The API shape is the same either way: you still call Speechbase with a Speechbase API key and select models with strings like openai/gpt-4o-mini-tts. The difference is who owns the upstream provider key and bill.
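For illustration, a model string such as openai/gpt-4o-mini-tts can be read as a provider prefix followed by a model name. The helper below is a hypothetical sketch of that reading. It assumes the first slash separates the two parts and that anything after it, including further slashes, belongs to the model name; the gateway's actual parsing rules may differ.

```typescript
// Hypothetical helper, not part of the Speechbase SDK.
interface ModelRef {
  provider: string;
  model: string;
}

// Split on the first "/" only, so a model name may itself contain slashes.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}
```

Under these assumptions, `parseModelRef("openai/gpt-4o-mini-tts")` yields provider `openai` and model `gpt-4o-mini-tts`, while a string with no slash is rejected.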

Where to go next

If you want to...	Read this
Make your first synthesis call	Quickstart
Understand provider selection, BYOK, and Managed Routing	Providers and routing
Save voices and reuse them in production	Voice management
Compare providers before choosing one	Audio Playground
Build multi-speaker audio	Multi-speaker conversations
Add captions, transcripts, or lip-sync timing	Word-level timestamps
Configure content policy	Moderation
Look up exact endpoint shapes	API Reference
