The Prod Stack for AI Audio

Speechbase enables AI teams to build faster with generative audio — Speech Gateway, Observability, Pronunciations, and Voice Management, all in one platform.


End-to-end Audio Orchestration and Provider Management

The Stack
Speech Gateway

One API to connect your app to every TTS provider.

Speechbase gives AI teams a single, universal API across 14 text-to-speech providers — plus the observability, voice tooling, and governance that applications running in production actually need.

Your Application
app.tts
Agents · IVR · Podcasts · Audiobooks · Avatars · Voice UX
Speechbase
  • Speech Gateway
  • Observability
  • Pronunciations
  • Voice Management
  • Moderation
14 TTS Providers
OpenAI
ElevenLabs
Google
Cartesia
Deepgram
Hume
xAI
Mistral
+6
Bring your own keys
Or managed routing
One invoice
The Platform

The pillars of a production audio stack.

A single voice library across multiple platforms.

Save voices once, reference them by name, and reuse them across providers and models. No more juggling opaque voice IDs across 14 dashboards.

  • Cross-provider voice library
  • Preview voices in the playground
  • Reference voices globally by alias or ID
  • Create voice clones (coming soon)
Read the voice management docs
Open by design

Open SDK. No lock-in.

Swap providers with a string change. Apache 2.0, runs anywhere, and pairs with the hosted Speech Gateway, Observability, Pronunciations, and Voice Management when production catches up.

$ npm install @speech-sdk/core
generate-conversation.ts
import { generateConversation } from "@speech-sdk/core";

const result = await generateConversation({
  turns: [
    {
      model: "elevenlabs/eleven_v3",
      voice: "EXAVITQu4vr4xnSDxMaL",
      text: "Hello from the SDK.",
    },
    {
      model: "google/gemini-3.1-flash-tts-preview",
      voice: "Kore",
      text: "One call. Multiple voices. Auto-leveled.",
    },
  ],
});

result.audio.uint8Array; // Uint8Array
result.audio.mediaType;  // "audio/mpeg"
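
The model strings in the sample look like a provider prefix, a slash, and a model name, which is what makes a provider swap a string change. A minimal sketch of splitting such a string, assuming the first slash is the separator. `parseModelId` and `ModelRef` are hypothetical names for illustration, not part of @speech-sdk/core.

```typescript
// Hypothetical helper for illustration; not part of @speech-sdk/core.
interface ModelRef {
  provider: string;
  model: string;
}

// Splits "provider/model" on the FIRST slash, so model names that
// contain further slashes or dots are kept intact.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```

With this shape, moving a turn from one provider to another only changes the string passed as `model`.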
Multi-speaker dialogue
Conversation

One call returns the full multi-turn script as a single volume-leveled file. Mix providers per turn, get per-turn timestamps, skip the stitching code.
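
For a sense of the stitching code this replaces, here is a rough sketch that levels each turn to a common RMS, concatenates the turns, and records per-turn timestamps. It assumes every turn is already mono Float32 PCM at the same sample rate. `stitchTurns`, `TurnSpan`, and the RMS-based leveling are illustrative assumptions, not the SDK's actual method.

```typescript
// Illustrative only: assumes mono Float32 PCM turns at one sample rate.
interface TurnSpan {
  startMs: number;
  endMs: number;
}

// Root-mean-square level of a buffer (0 for an empty buffer).
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

function stitchTurns(
  turns: Float32Array[],
  sampleRate: number,
  targetRms = 0.1,
): { audio: Float32Array; spans: TurnSpan[] } {
  const total = turns.reduce((n, t) => n + t.length, 0);
  const audio = new Float32Array(total);
  const spans: TurnSpan[] = [];
  let offset = 0;
  for (const turn of turns) {
    const level = rms(turn);
    // Silent turns are copied unchanged to avoid dividing by zero.
    const gain = level > 0 ? targetRms / level : 1;
    for (let i = 0; i < turn.length; i++) {
      // Clamp to [-1, 1] so the gain cannot push samples into clipping.
      audio[offset + i] = Math.max(-1, Math.min(1, turn[i] * gain));
    }
    spans.push({
      startMs: (offset / sampleRate) * 1000,
      endMs: ((offset + turn.length) / sampleRate) * 1000,
    });
    offset += turn.length;
  }
  return { audio, spans };
}
```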

Streaming by default
streamSpeech

Audio streams as it generates via a standard Web ReadableStream — pipe straight into a Response for low-latency playback in Node, Edge, or browser.
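
A minimal sketch of the piping step, using only the standard Web `ReadableStream` and `Response` APIs. The page does not show what `streamSpeech` returns, so `fakeAudioStream` below is a stand-in for the SDK's stream, and `toAudioResponse` is a hypothetical helper name.

```typescript
// Stand-in for the SDK stream: emits the given byte chunks one per pull.
// Assumption: the real stream is a ReadableStream of Uint8Array chunks.
function fakeAudioStream(chunks: Uint8Array[]): ReadableStream<Uint8Array> {
  let i = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (i < chunks.length) {
        controller.enqueue(chunks[i++]);
      } else {
        controller.close();
      }
    },
  });
}

// Hypothetical helper: wraps an audio byte stream in a Response without
// buffering, so the client can start playback before generation finishes.
function toAudioResponse(
  stream: ReadableStream<Uint8Array>,
  mediaType = "audio/mpeg",
): Response {
  return new Response(stream, {
    headers: { "Content-Type": mediaType },
  });
}
```

In a route handler this would be a single `return toAudioResponse(stream)`.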

Universal audio tags
[laugh]

Write [laugh] once. The SDK passes it through, translates it to SSML, or strips it with a warning, so the syntax stays the same across every provider.
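
The three outcomes can be sketched as one function keyed on what a provider supports. Everything here is an assumption for illustration: the `TagSupport` levels, the `resolveTags` name, the `[pause]` tag, and the one-entry SSML table are hypothetical, not the SDK's internals.

```typescript
// Hypothetical capability levels for a provider.
type TagSupport = "native" | "ssml" | "none";

interface TagResult {
  text: string;
  warnings: string[];
}

// Illustrative mapping only: [pause] is an assumed tag name.
const SSML_FOR_TAG: Record<string, string> = {
  pause: '<break time="500ms"/>',
};

function resolveTags(text: string, support: TagSupport): TagResult {
  const warnings: string[] = [];
  const out = text.replace(/\[([a-z]+)\]/g, (match: string, tag: string) => {
    // 1. Provider understands bracket tags: pass through untouched.
    if (support === "native") return match;
    // 2. Provider accepts SSML and a mapping exists: translate.
    const ssml = SSML_FOR_TAG[tag];
    if (support === "ssml" && ssml !== undefined) return ssml;
    // 3. Otherwise strip the tag and record a warning.
    warnings.push(`Tag [${tag}] is not supported and was removed`);
    return "";
  });
  // Tidy the double spaces left behind only when something was stripped.
  const cleaned = warnings.length > 0 ? out.replace(/ {2,}/g, " ").trim() : out;
  return { text: cleaned, warnings };
}
```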

Speed without pitch shift
0.75 → 1.5×

Pitch-preserving WSOLA time-stretch on mono PCM. Timestamps and audioDurationMs auto-scale by 1/speed so alignment stays accurate.
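
The 1/speed rule is plain arithmetic: audio played at 1.5× finishes in two thirds of the time, so every timestamp shrinks by the same factor. A small sketch of that rescaling follows. `audioDurationMs` is the field named above, while `WordTimestamp` and `scaleTimestamps` are hypothetical names, and the time-stretch itself is not shown.

```typescript
// Hypothetical timestamp shape for illustration.
interface WordTimestamp {
  text: string;
  startMs: number;
  endMs: number;
}

// Rescales timing metadata to match audio that was time-stretched by `speed`.
// Dividing by speed is the same as multiplying by 1/speed.
function scaleTimestamps(
  words: WordTimestamp[],
  audioDurationMs: number,
  speed: number,
): { words: WordTimestamp[]; audioDurationMs: number } {
  if (!(speed > 0)) throw new RangeError("speed must be a positive number");
  return {
    audioDurationMs: audioDurationMs / speed,
    words: words.map((w) => ({
      text: w.text,
      startMs: w.startMs / speed,
      endMs: w.endMs / speed,
    })),
  };
}
```

For example, a word spanning 600 to 1200 ms in a 3000 ms clip lands at 400 to 800 ms in a 2000 ms clip at 1.5×.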

Unicode-aware auto-chunking
Long-form

Long inputs split at sentence boundaries (ASCII, CJK, Devanagari, Arabic) into evenly sized chunks, then stitch back into one file, so no chunk ends mid-sentence and prosody stays continuous.
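
A simplified sketch of the idea: split on sentence terminators from several scripts, then pack sentences toward an even target size instead of filling each chunk to the limit. The terminator set, the `chunkText` name, and the packing rule are assumptions for illustration. Lengths are counted in UTF-16 code units, decimals such as "3.14" would be split, and a single sentence longer than `maxChars` is kept whole.

```typescript
// Terminators: ASCII . ! ?, CJK \u3002 \uFF01 \uFF1F, Devanagari danda \u0964,
// Arabic question mark \u061F and Arabic full stop \u06D4.
const T = ".!?\\u3002\\uFF01\\uFF1F\\u0964\\u061F\\u06D4";
const SENTENCE = new RegExp(`[^${T}]*[${T}]+\\s*|[^${T}]+$`, "gu");

// Lossless split: joining the pieces reproduces the input.
function splitSentences(text: string): string[] {
  return text.match(SENTENCE) ?? [];
}

function chunkText(text: string, maxChars: number): string[] {
  const sentences = splitSentences(text);
  const total = sentences.reduce((n, s) => n + s.length, 0);
  if (total === 0) return [];
  // Aim for even chunks: the fewest chunks that fit, each near total / count.
  const chunkCount = Math.ceil(total / maxChars);
  const target = total / chunkCount;
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    const wouldBe = current.length + s.length;
    // Close the chunk if adding this sentence lands farther from the
    // target than stopping here, or if it would exceed the hard limit.
    const overshoots = wouldBe - target > target - current.length;
    if (current.length > 0 && (wouldBe > maxChars || overshoots)) {
      chunks.push(current);
      current = "";
    }
    current += s;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

With four 10-character sentences and a 35-character limit, filling greedily would give chunks of 30 and 10, while this packing gives 20 and 20.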

Reliable by default
Auto-retry

Jittered backoff on 5xx + 429. Retry-After honored. RFC 7807 errors with stable codes — retry logic stays a one-liner.
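
A sketch of the retry policy described above: retry on 5xx and 429, prefer the server's Retry-After value, and otherwise use capped exponential backoff with full jitter. The function names, option names, and the full-jitter variant are assumptions for illustration, not the SDK's implementation.

```typescript
// Hypothetical options for illustration.
interface BackoffOptions {
  baseDelayMs: number;
  maxDelayMs: number;
}

// Retry only on rate limiting (429) and server errors (5xx).
function shouldRetry(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// `attempt` is zero-based. `random` is injectable so the jitter is testable.
function retryDelayMs(
  attempt: number,
  retryAfterHeader: string | null,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  if (retryAfterHeader !== null && retryAfterHeader.trim() !== "") {
    // Retry-After is either delay-seconds or an HTTP date.
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
    const date = Date.parse(retryAfterHeader);
    if (!Number.isNaN(date)) return Math.max(0, date - Date.now());
  }
  // Full jitter: a uniform random delay in [0, min(max, base * 2^attempt)).
  const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt));
  return random() * cap;
}
```

Randomizing the delay keeps many clients that failed at the same moment from retrying in lockstep.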

Ready when you are

Promote your speech stack to production.

10 million free characters a month. Every TTS provider. No credit card to start.