Building the Universal Speech SDK
- engineering
- sdk
- speech
Speech is eating software. Every product team we talk to wants to add voice, whether that's text-to-speech narration, real-time transcription, or voice-driven agents. The problem is that the speech ecosystem is fragmented across dozens of providers, each with its own SDK, auth scheme, streaming protocol, and quirks.
We built the Universal Speech SDK to fix that.
One API for every voice model
The core idea is simple: give developers one consistent TypeScript interface that works across every major speech provider. Swap providers with a single config change. No rewrites. No migrations. No vendor lock-in.
import { speech } from "@jellypod/speech-sdk"

const audio = await speech.tts({
  provider: "elevenlabs",
  voice: "rachel",
  text: "Hello from the Universal Speech SDK",
})
Want to switch to Cartesia? Change provider: "elevenlabs" to provider: "cartesia". That's it.
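To make the single-config swap concrete, here is a minimal sketch of one way a unified interface can route a request by its provider field. It is not the SDK's actual source: TtsRequest, TtsAdapter, stubAdapter, and the local tts function are hypothetical names, and the adapters are stubs that return placeholder bytes instead of calling a real provider.

```typescript
// Hypothetical sketch: none of these names come from @jellypod/speech-sdk.
interface TtsRequest {
  provider: string
  voice: string
  text: string
}

// Every provider is wrapped in an adapter that exposes the same method.
interface TtsAdapter {
  synthesize(voice: string, text: string): Promise<Uint8Array>
}

// Stub adapter standing in for a real provider client.
function stubAdapter(name: string): TtsAdapter {
  return {
    async synthesize(voice, text) {
      // A real adapter would call the provider's API here.
      return new TextEncoder().encode(`${name}:${voice}:${text}`)
    },
  }
}

// Registry keyed by the provider string used in the request.
const adapters = new Map<string, TtsAdapter>([
  ["elevenlabs", stubAdapter("elevenlabs")],
  ["cartesia", stubAdapter("cartesia")],
])

// Look up the adapter for the requested provider and delegate to it.
async function tts(req: TtsRequest): Promise<Uint8Array> {
  const adapter = adapters.get(req.provider)
  if (!adapter) throw new Error(`Unknown provider: ${req.provider}`)
  return adapter.synthesize(req.voice, req.text)
}
```

Because callers only ever touch the shared request shape, switching providers in this sketch means changing one string. The provider-specific code stays inside its adapter.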
Automatic quality processing
Raw TTS output from different providers varies wildly in loudness, noise floor, and format. The SDK normalizes everything to a consistent, production-ready audio stream so you don't have to run a mastering pipeline to ship.
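As an illustration of what loudness normalization involves, the sketch below scales mono PCM samples to a target RMS level. It is a deliberately simple stand-in, not the SDK's actual pipeline: the function names and the -20 dBFS target are assumptions, and production systems often use perceptual loudness measures rather than plain RMS.

```typescript
// Illustrative only: a basic RMS normalizer, not the SDK's processing chain.

// RMS level of the samples in dBFS (0 dB = full scale, samples in [-1, 1]).
function rmsDb(samples: Float32Array): number {
  let sumSquares = 0
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i]
  return 20 * Math.log10(Math.sqrt(sumSquares / samples.length))
}

// Scale samples so their RMS level matches targetDb, without clipping.
function normalizeRms(samples: Float32Array, targetDb = -20): Float32Array {
  if (samples.length === 0) return samples
  const current = rmsDb(samples)
  if (!Number.isFinite(current)) return samples // silent input: nothing to scale

  let gain = Math.pow(10, (targetDb - current) / 20)

  // Cap the gain so the loudest sample never exceeds full scale.
  let peak = 0
  for (let i = 0; i < samples.length; i++) peak = Math.max(peak, Math.abs(samples[i]))
  gain = Math.min(gain, 1 / peak)

  return samples.map((s) => s * gain)
}
```

Run over output from two providers with very different levels, a function like this brings both close to the same RMS level, which is the property a downstream player or mixer cares about.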
Infrastructure that never drops a request
The Gateway sits behind the SDK and handles retries, failover, rate limits, and observability. When one provider goes down, your product doesn't.
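The retry and failover behavior can be sketched in a few lines. This is a hedged illustration of the general pattern, not the Gateway's implementation: withFailover, its parameters, and the backoff schedule are all assumptions made for the example.

```typescript
// Hypothetical sketch of retry-then-failover; not the Gateway's actual code.
type Attempt<T> = () => Promise<T>

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Try each provider in order. Retry a failing provider with exponential
// backoff, then move on to the next one once its retries are exhausted.
async function withFailover<T>(
  providers: Array<{ name: string; call: Attempt<T> }>,
  retriesPerProvider = 2,
  baseDelayMs = 100,
): Promise<{ provider: string; result: T }> {
  let lastError: unknown
  for (const { name, call } of providers) {
    for (let attempt = 0; attempt <= retriesPerProvider; attempt++) {
      try {
        return { provider: name, result: await call() }
      } catch (err) {
        lastError = err
        // Back off before retrying the same provider: 100ms, 200ms, ...
        if (attempt < retriesPerProvider) await sleep(baseDelayMs * 2 ** attempt)
      }
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`)
}
```

The caller passes an ordered list of providers and gets back both the result and the name of the provider that served it, which is useful for observability. A production version would also need to distinguish retryable errors (timeouts, rate limits) from permanent ones such as invalid credentials.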
Open source at the core
The SDK itself is fully open source under the MIT license. The Gateway is the paid layer that gives you production infrastructure on top.