Speechbase

What is Speechbase?

The production control plane for AI speech: routing, voices, conversations, timestamps, guardrails, playgrounds, and observability.

Speechbase is the production control plane for AI speech: one API for 14 TTS providers (OpenAI, ElevenLabs, Deepgram, Cartesia, Hume, Google, Inworld, Fish Audio, Murf, Resemble, Smallest AI, fal, Mistral, xAI), plus provider access, voice management, conversations, timestamps, pronunciation dictionaries, moderation, playgrounds, and request observability.

Speechbase is more than a speech provider router. It is the place where your team connects providers, defines reusable voices, tests models, generates single-speaker and multi-speaker audio, applies pronunciation and moderation policy, and traces what happened after every request.

Use the platform when you want the flexibility of many TTS providers without building the surrounding production system yourself.
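As a first taste of the gateway, here is a minimal sketch of constructing a synthesis request with plain `fetch`. The endpoint path `/v1/speech` and the `model`/`voice`/`text`/`format` field names are illustrative assumptions for this sketch, not the documented Speechbase schema; see the Quickstart and API Reference for the real shapes.

```typescript
// Hypothetical request builder. The path and body fields below are
// assumptions for illustration, not the documented Speechbase API.
interface SpeechRequest {
  model: string;  // provider/model string, e.g. "openai/gpt-4o-mini-tts"
  voice: string;  // a provider voice name or a saved Speechbase voice ID
  text: string;   // the text to synthesize
  format?: "wav" | "mp3" | "pcm";
}

function buildSpeechCall(apiKey: string, req: SpeechRequest) {
  return {
    url: "https://api.speechbase.ai/v1/speech", // assumed path
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(req),
    },
  };
}

// Usage sketch:
//   const { url, init } = buildSpeechCall(process.env.SPEECHBASE_API_KEY!, {
//     model: "openai/gpt-4o-mini-tts", voice: "alloy", text: "Hello!" });
//   const audio = await (await fetch(url, init)).arrayBuffer();
```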

What you can do

| Capability | What Speechbase handles | Start here |
| --- | --- | --- |
| Speech Gateway | One REST API and TypeScript SDK across OpenAI, ElevenLabs, Deepgram, Cartesia, Google, Hume, Inworld, Fish Audio, Murf, Resemble, Smallest AI, fal, Mistral, and xAI. | Providers and routing |
| Provider access | Bring your own provider keys for self-serve routing, or use Managed Routing when Speechbase manages provider relationships, billing, and quotas for your workspace. | BYOK |
| Voice management | Save provider/model/voice combinations once, add metadata and provider options, and reference them by a stable Speechbase voice ID. | Voice management |
| Audio Playground | Compare models and voices side by side, stream audio when supported, normalize loudness, save voices, and copy integration code. | Audio Playground |
| Conversations | Render multi-turn, multi-speaker scripts into one stitched file with gaps, volume leveling, and optional mixed providers per turn. | Conversations |
| Word-level timestamps | Get word start/end times for captions, karaoke highlighting, lip-sync, transcripts, and speaker-attributed conversation timing. | Word-level timestamps |
| Pronunciation dictionaries | Fix brand names, acronyms, people, products, and domain terms before synthesis with org-wide dictionaries, per-request dictionaries, and inline rules. | Pronunciations |
| Moderation and guardrails | Run configurable rulesets before audio generation so blocked text never reaches a provider or creates a TTS bill. | Moderation |
| Observability | Inspect request logs, providers, models, voices, latency, status, moderation outcomes, and conversation child events. | Observability |
| Output handling | Stream speech where providers support it, or return buffered wav, mp3, or pcm; conversations can normalize and re-encode all turns to one format. | Output formats |
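To make the Conversations capability concrete, here is a sketch of what a multi-speaker script might look like as data. The `speaker`, `voice`, `text`, and `gapMs` field names, and the `vox_*` voice IDs, are assumptions for this sketch; the documented conversation schema lives in the Conversations guide.

```typescript
// Illustrative shape for a multi-speaker conversation script.
// Field names and voice IDs are assumptions, not the documented schema.
interface ConversationTurn {
  speaker: string; // label used for speaker-attributed timestamps
  voice: string;   // saved Speechbase voice ID; may differ per turn
  text: string;    // the line this speaker says
  gapMs?: number;  // optional silence inserted before this turn
}

const script: ConversationTurn[] = [
  { speaker: "host",  voice: "vox_host_01",  text: "Welcome back to the show." },
  { speaker: "guest", voice: "vox_guest_02", text: "Thanks for having me.", gapMs: 400 },
];
```

Because each turn carries its own voice ID, turns can in principle resolve to different providers while still being stitched, leveled, and re-encoded into one output file.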

How a request moves through Speechbase

  1. Your app calls https://api.speechbase.ai with a Speechbase API key, either directly or through @speech-sdk/core.
  2. The gateway resolves the request into a provider, model, voice, and provider-specific options.
  3. Speechbase applies pronunciation substitutions from the org default dictionary, selected dictionaries, and inline rules.
  4. The substituted text is evaluated against the selected moderation ruleset, or your org default ruleset.
  5. Speechbase routes the request through BYOK or Managed Routing provider access, depending on how your workspace is configured.
  6. The provider returns audio. Speechbase streams it back when possible, or buffers it when the endpoint needs stitching, timestamps, or format conversion.
  7. Speechbase writes an operational log entry without storing the input text, output audio, provider keys, or unsafe provider error bodies.
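Steps 3 and 4 above can be condensed into a small sketch. The function names and rule shapes here are stand-ins for the behavior the steps describe (substitute first, then moderate the substituted text), not real SDK calls.

```typescript
// Hypothetical condensed view of pipeline steps 3 and 4.
type PronunciationRule = { match: string; replace: string };

// Step 3: apply pronunciation substitutions before synthesis.
function applyPronunciations(text: string, rules: PronunciationRule[]): string {
  return rules.reduce((t, r) => t.split(r.match).join(r.replace), text);
}

// Step 4: evaluate the *substituted* text against a moderation ruleset,
// so blocked text never reaches a provider or creates a TTS bill.
function moderate(text: string, blockedTerms: string[]): { allowed: boolean } {
  const lower = text.toLowerCase();
  return { allowed: !blockedTerms.some((term) => lower.includes(term.toLowerCase())) };
}
```

The ordering matters: moderation sees the text as it will actually be spoken, after dictionary substitutions are applied.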

Provider access models

Speechbase supports two ways to fund upstream provider calls:

| Model | Best for | How billing works |
| --- | --- | --- |
| BYOK | Self-serve teams that already have provider accounts or want provider charges to stay on the provider invoice. | You store encrypted provider keys in Speechbase. The provider bills you directly; Speechbase charges the gateway fee described on the pricing page. |
| Managed Routing | Teams that want Speechbase to manage provider relationships, billing, quotas, and provider access. | Speechbase manages the provider relationship and bills usage through Speechbase. Availability depends on your plan and workspace enablement. |

The API shape is the same either way: you still call Speechbase with a Speechbase API key and select models with strings like `openai/gpt-4o-mini-tts`. The difference is who owns the upstream provider key and bill.
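The `provider/model` convention in those strings is easy to work with client-side. This small helper, written for illustration (it is not part of any Speechbase SDK), splits an identifier into its two parts:

```typescript
// Split a "provider/model" string. Illustrative helper only -- the
// convention comes from the docs, but this function is not an SDK API.
function parseModel(id: string): { provider: string; model: string } {
  const slash = id.indexOf("/");
  if (slash < 0) throw new Error(`expected "provider/model", got "${id}"`);
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// parseModel("openai/gpt-4o-mini-tts")
//   → { provider: "openai", model: "gpt-4o-mini-tts" }
```

Splitting on the first `/` only keeps model names that themselves contain slashes intact.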

Where to go next

| If you want to... | Read this |
| --- | --- |
| Make your first synthesis call | Quickstart |
| Understand provider selection, BYOK, and Managed Routing | Providers and routing |
| Save voices and reuse them in production | Voice management |
| Compare providers before choosing one | Audio Playground |
| Build multi-speaker audio | Multi-speaker conversations |
| Add captions, transcripts, or lip-sync timing | Word-level timestamps |
| Configure content policy | Moderation |
| Look up exact endpoint shapes | API Reference |
