Building the Universal Speech SDK

April 8, 2026
Speechbase Team

engineering
sdk
speech

Speech is eating software. Every product team we talk to wants to add voice, whether that's text-to-speech narration, real-time transcription, or voice-driven agents. The problem is that the speech ecosystem is fragmented across dozens of providers, each with its own SDK, auth scheme, streaming protocol, and quirks.

We built the Universal Speech SDK to fix that.

One API for every voice model

The core idea is simple: give developers one consistent TypeScript interface that works across every major speech provider. Swap providers with a single config change. No rewrites. No migrations. No vendor lock-in.

import { speech } from "@jellypod/speech-sdk"

const audio = await speech.tts({
  provider: "elevenlabs",
  voice: "rachel",
  text: "Hello from the Universal Speech SDK",
})

Want to switch to Cartesia? Change provider: "elevenlabs" to provider: "cartesia". That's it.
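
To keep that swap to one line in practice, the provider can live in a single config object that every call reads from. The sketch below is illustrative only: SpeechConfig, TtsRequest, and ttsRequest are hypothetical helpers, not part of the SDK, and the request shape simply mirrors the three fields used in the example above.

```typescript
// Hypothetical types: they mirror the fields shown in the example above,
// not a published SDK type definition.
type SpeechConfig = { provider: string; voice: string }
type TtsRequest = SpeechConfig & { text: string }

// Build a request from shared config so call sites never name a provider.
function ttsRequest(config: SpeechConfig, text: string): TtsRequest {
  return { provider: config.provider, voice: config.voice, text }
}

// The only place a provider name appears. Changing it here changes every call.
const speechConfig: SpeechConfig = { provider: "elevenlabs", voice: "rachel" }

const request = ttsRequest(speechConfig, "Hello from the Universal Speech SDK")
```

The resulting object could then be handed to speech.tts, assuming the call accepts the same shape as in the example above.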

Automatic quality processing

Raw TTS output from different providers varies wildly in loudness, noise floor, and format. The SDK normalizes everything to a consistent, production-ready audio stream so you don't have to run a mastering pipeline to ship.
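
To give a feel for what loudness normalization involves, here is a minimal sketch, not the SDK's actual pipeline. It scales mono PCM samples to a common RMS level and caps the gain so peaks never clip. The function name normalizeRms and the -20 dBFS default are assumptions chosen for illustration; production systems more often target LUFS as defined in ITU-R BS.1770 rather than plain RMS.

```typescript
// Scale mono PCM samples (floats in [-1, 1]) to a target RMS level in dBFS.
// Hypothetical helper for illustration; not an SDK function.
function normalizeRms(samples: Float32Array, targetDbfs: number = -20): Float32Array {
  if (samples.length === 0) return new Float32Array(0)

  // Measure the current RMS level and the absolute peak in one pass.
  let sumSquares = 0
  let peak = 0
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i]
    peak = Math.max(peak, Math.abs(samples[i]))
  }
  const rms = Math.sqrt(sumSquares / samples.length)

  // Silence has no level to correct, so return an unchanged copy.
  if (rms === 0) return samples.slice()

  // Convert the dBFS target to linear amplitude, then derive the gain.
  const targetRms = Math.pow(10, targetDbfs / 20)
  const wantedGain = targetRms / rms

  // Never amplify past full scale: the loudest sample may reach 1.0 at most.
  const gain = Math.min(wantedGain, 1 / peak)

  return samples.map((s) => s * gain)
}
```

Capping the gain at 1 / peak means a clip with one sharp transient may end up quieter than the target, which is the usual trade-off of peak-safe normalization without a limiter.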

Infrastructure that never drops a request

The Gateway sits behind the SDK and handles retries, failover, rate limits, and observability. When one provider goes down, your product doesn't.
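
The general pattern behind retries and failover can be sketched in a few lines. This is an illustration under stated assumptions, not the Gateway's implementation: ProviderClient, synthesizeWithFailover, the retry count, and the exponential backoff delays are all hypothetical. Each provider is tried in order, retried a few times with growing delays, and skipped once it keeps failing.

```typescript
// Hypothetical provider wrapper: a name plus a function that returns audio bytes.
interface ProviderClient {
  provider: string
  synthesize: (text: string) => Promise<Uint8Array>
}

// Try each provider in order. Retry a failing provider with exponential
// backoff, then fail over to the next one. Throws only if every provider fails.
async function synthesizeWithFailover(
  providers: ProviderClient[],
  text: string,
  retriesPerProvider: number = 2,
  baseDelayMs: number = 100,
): Promise<{ provider: string; audio: Uint8Array }> {
  const errors: string[] = []

  for (const client of providers) {
    for (let attempt = 0; attempt <= retriesPerProvider; attempt++) {
      try {
        const audio = await client.synthesize(text)
        return { provider: client.provider, audio }
      } catch (err) {
        errors.push(`${client.provider} attempt ${attempt + 1}: ${String(err)}`)
        // Back off before retrying the same provider: 100 ms, 200 ms, 400 ms, ...
        if (attempt < retriesPerProvider) {
          const delayMs = baseDelayMs * Math.pow(2, attempt)
          await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
        }
      }
    }
  }

  throw new Error(`All providers failed:\n${errors.join("\n")}`)
}
```

A real gateway would also need to distinguish retryable errors (timeouts, rate limits) from permanent ones (invalid input), which this sketch deliberately leaves out.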

Open source at the core

The SDK itself is fully open source under the Apache-2.0 license. The Gateway is the paid layer that gives you production infrastructure on top.

Check it out on GitHub or read the docs.