Product Updates

What's new in Speechbase

Every feature, improvement, and fix we ship to our speech infrastructure, all in one place.

June 18, 2026

Split conversations into per-turn audio

Pass split: true on POST /v1/audio/conversation to receive each turn as its own audio segment instead of one stitched file. The response is a JSON envelope with an audioSegments array, one entry per turn in order, each with its own base64 audio, media type, and duration. Use it when you need to play, caption, or edit speakers independently, for example dropping each turn onto its own track.

On POST /v1/audio/conversation/with-timestamps, each segment also carries segment-relative word timestamps, measured from the start of that segment. Omit split (or set it to false) for the existing single-clip responses: the base endpoint returns raw audio bytes, and the with-timestamps endpoint returns the same JSON it always has.
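Here is a minimal sketch of unpacking a split response into one file per turn. The audioSegments array comes from the description above; the per-segment key names ("audio" and "mediaType") and the extension mapping are assumptions for illustration, so check them against a real response before relying on them.

```python
import base64
from typing import Dict, List, Tuple

# Hypothetical key names: the entry above names `audioSegments` but not the
# fields inside each segment, so these two are assumptions.
AUDIO_KEY = "audio"
MEDIA_TYPE_KEY = "mediaType"

# Assumed mapping from media type to file extension.
EXTENSIONS: Dict[str, str] = {"audio/mpeg": "mp3", "audio/wav": "wav"}


def decode_segments(envelope: dict) -> List[Tuple[str, bytes]]:
    """Return (filename, raw_bytes) for each turn, in response order."""
    decoded = []
    for index, segment in enumerate(envelope["audioSegments"]):
        raw = base64.b64decode(segment[AUDIO_KEY])
        # Fall back to a generic extension when the media type is unknown.
        ext = EXTENSIONS.get(segment.get(MEDIA_TYPE_KEY), "bin")
        decoded.append((f"turn-{index:03d}.{ext}", raw))
    return decoded
```

From there, writing each pair to disk (for example with pathlib.Path(name).write_bytes(raw)) gives you one file per turn, ready to drop onto separate tracks.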

June 16, 2026

Consistent default loudness

Buffered synthesis now targets -16 dBFS peak by default. POST /v1/audio/speech, POST /v1/audio/speech/with-timestamps, and conversation requests normalize to that level without any extra options, so a clip from OpenAI and a clip from ElevenLabs come back at a consistent loudness instead of jumping between providers.

Pass your own volumeDbfs to target a different peak, for example -23 for more headroom. This is a peak-level target in dBFS, not an integrated-loudness (LUFS) measurement. The streaming endpoint POST /v1/audio/speech/stream is unchanged: it is a provider pass-through and does not accept volumeDbfs.
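To make the peak-level target concrete, here is a small sketch of the standard dBFS arithmetic: a level in dBFS maps to a linear amplitude of 10^(dBFS/20), where 1.0 is full scale. It illustrates what a peak target means and is not Speechbase's normalization code; the function names are made up for this example.

```python
import math
from typing import Sequence


def dbfs_to_amplitude(dbfs: float) -> float:
    """Convert a peak level in dBFS to linear amplitude (1.0 = full scale)."""
    return 10 ** (dbfs / 20)


def gain_db_for_peak(samples: Sequence[float], target_dbfs: float = -16.0) -> float:
    """Gain in dB that moves the loudest sample to the target peak level.

    `samples` are floats in the range [-1.0, 1.0]. Silence needs no gain.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 0.0
    current_dbfs = 20 * math.log10(peak)
    return target_dbfs - current_dbfs
```

A -16 dBFS peak sits at roughly 16% of full-scale amplitude and -23 dBFS at roughly 7%, which is the extra headroom mentioned above. Because this looks only at the single loudest sample, two clips with the same peak can still differ in perceived loudness.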

June 12, 2026

MiniMax Speech 2.8

You can now route to MiniMax Speech 2.8 through the gateway in two variants: minimax/speech-2.8-hd for the highest audio quality and minimax/speech-2.8-turbo for faster, lower-cost synthesis.

2.8 is built for long-form work. It holds voice timbre and emotional tone steady across long passages, the point where most models start to drift, which makes it a strong fit for audiobooks, narrated articles, and long documentation read-alouds. It speaks 40+ languages, clones a voice from about 10 seconds of reference audio, and renders natural pauses and sound tags like laughs and sighs inline.

Reach for HD when you're producing finished audio and Turbo when latency and cost matter, such as voice agents.

June 4, 2026

billing

Per-model pricing

Pricing is now per model and shown up front across the dashboard and the pricing page, so you can compare providers without digging through anyone's rate card.

Bringing your own provider keys? You pay a flat 3.4% platform fee on what runs through the gateway, and nothing more.
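As a rough worked example of the flat fee, the sketch below applies 3.4% to a spend figure. It assumes the fee is computed on your provider-side spend in a single currency and rounded to cents; neither detail is stated above, so treat both as assumptions.

```python
from decimal import Decimal

PLATFORM_FEE_RATE = Decimal("0.034")  # the flat 3.4% from the entry above


def platform_fee(gateway_spend: Decimal) -> Decimal:
    """Fee on spend routed through the gateway, rounded to cents (assumed)."""
    return (gateway_spend * PLATFORM_FEE_RATE).quantize(Decimal("0.01"))
```

Under those assumptions, 100.00 of provider spend carries a 3.40 fee.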

June 2, 2026

Feedback API

Collect quality feedback from your users on the audio you generate. POST /v1/feedback records a score from 0 to 100 plus an optional comment for any generation, so you can see how listeners reacted, generation by generation.

Every speech and conversation response returns an x-speechbase-request-id header. Attach that ID to a rating to pin it to the exact output, giving you a qualitative read on how your application's audio is landing.
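A minimal sketch of preparing a feedback body follows. The 0 to 100 range, the optional comment, and the request ID come from the entry above; the JSON field names ("requestId", "score", "comment") are hypothetical, so confirm them in the API reference.

```python
import json
from typing import Optional


def build_feedback_payload(
    request_id: str, score: int, comment: Optional[str] = None
) -> str:
    """Serialize a rating for POST /v1/feedback.

    `request_id` is the value read from the x-speechbase-request-id header.
    Field names in the payload are assumptions for illustration.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    payload = {"requestId": request_id, "score": score}
    if comment:
        payload["comment"] = comment
    return json.dumps(payload)
```

Validating the range on the client keeps an out-of-range rating from ever leaving your application.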

May 29, 2026

Streaming speech synthesis

Need audio to start playing before the whole clip is ready? POST /v1/audio/speech/stream streams synthesized audio straight through from the provider for the lowest possible latency.

The standard POST /v1/audio/speech now returns the complete clip in one shot, so volume and output-format options reliably apply to the finished audio.
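On the client side, consuming the streaming endpoint usually comes down to reading the response in chunks and handing each one to a player or file as it arrives. The sketch below works with any object that has a read(n) method, which includes the response returned by urllib.request.urlopen; the chunk size is an arbitrary choice, not a Speechbase recommendation.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they become available instead of buffering it all."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        yield chunk
```

Feed each chunk to your audio sink as it arrives, so playback can begin before the clip has finished generating.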

May 20, 2026

billing

Plans, credits, and usage analytics

Speechbase now runs on credits. Start free with 5,000 credits, upgrade to Pro ($30/mo or $300/yr) for 30,000 credits a month, and buy one-off top-ups whenever you need more.

Billing shows your monthly allocation and top-up balance separately, and the usage charts break spend down by model over time, so you can see exactly where your credits go.

May 2, 2026

moderation

Content moderation

You can now screen text against configurable moderation rulesets before it's synthesized. Set an org-wide default, or pass a moderation_ruleset_id on any speech or conversation request to apply a specific policy just for that call.

The same ruleset governs single-shot synthesis and multi-turn conversations, so your safety rules stay consistent everywhere.

May 1, 2026

logs

Request logs

Every request now shows up in the dashboard with its provider, model, status, and credits used.

For automation, GET /v1/logs returns the same history. Filter by provider, status, and time range with cursor pagination, or fetch one request with GET /v1/logs/{id}.
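Here is a sketch of walking the log history with cursor pagination. The query parameter names ("provider", "status", "cursor") and the response keys ("data", "nextCursor") are assumptions made for this example; only the endpoint path and the idea of cursor pagination come from the entry above. The page fetcher is passed in so you can plug in whatever HTTP client you use.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def logs_url(
    base_url: str,
    provider: Optional[str] = None,
    status: Optional[str] = None,
    cursor: Optional[str] = None,
) -> str:
    """Build a GET /v1/logs URL, leaving out any filter that is not set."""
    candidates = {"provider": provider, "status": status, "cursor": cursor}
    params = {key: value for key, value in candidates.items() if value is not None}
    query = urlencode(params)
    return f"{base_url}/v1/logs" + (f"?{query}" if query else "")


def iter_logs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every log entry, following cursors until none is returned."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]  # assumed key holding the entries
        cursor = page.get("nextCursor")  # assumed key holding the next cursor
        if not cursor:
            break
```

Keeping the fetcher separate from the loop also makes the pagination logic easy to test with canned pages.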

May 1, 2026

Pronunciation dictionaries

Create reusable pronunciation rules that rewrite tricky words before synthesis, so names, acronyms, and product terms come out the way you intend. Manage them in the dashboard or via the /v1/pronunciation-dictionaries API.

Word-level timestamps still line up with your original text, so substitutions never throw off your captions.
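One way to picture how substitutions and captions can coexist: rewrite each word for the synthesizer, but remember which original word it came from, then report timings against the original. The sketch below is an illustration of that idea under simplified whitespace tokenization, not a description of how Speechbase implements it.

```python
from typing import Dict, List, Tuple

PUNCTUATION = ".,!?;:"


def apply_dictionary(
    text: str, rules: Dict[str, str]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Rewrite whole words using `rules`.

    Returns the text to synthesize plus (original, spoken) pairs, so timing
    for each spoken token can be attached to the original word.
    """
    pairs = []
    for word in text.split():
        core = word.strip(PUNCTUATION)  # look up the word without punctuation
        spoken = word.replace(core, rules[core]) if core in rules else word
        pairs.append((word, spoken))
    return " ".join(spoken for _, spoken in pairs), pairs
```

With rules like {"SQL": "sequel"}, the synthesizer hears "sequel" while the caption for that time span still reads "SQL".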

April 28, 2026

Multi-speaker conversations

POST /v1/audio/conversation takes a multi-turn script, each turn with its own voice, and renders it into one mixed audio file. You can mix voices from different providers in the same conversation and choose your output format (wav, mp3, or pcm).
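A minimal sketch of assembling a conversation request body follows. The three output formats come from the entry above; the field names ("turns", "voice", "text", "format") and the example voice IDs are hypothetical, so check the API reference for the real request shape.

```python
import json
from typing import List, Tuple

ALLOWED_FORMATS = {"wav", "mp3", "pcm"}  # formats listed in the entry above


def build_conversation_body(
    turns: List[Tuple[str, str]], output_format: str = "mp3"
) -> str:
    """Serialize (voice, text) turns for POST /v1/audio/conversation.

    Field names are assumptions for illustration.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    body = {
        "turns": [{"voice": voice, "text": text} for voice, text in turns],
        "format": output_format,
    }
    return json.dumps(body)
```

For example, build_conversation_body([("voice-a", "Hi there."), ("voice-b", "Hello!")], "wav") describes a two-turn exchange in which each turn uses a different voice.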

Great for dialogue, interviews, and any back-and-forth you'd otherwise have to stitch together yourself.

April 25, 2026

Word-level timestamps

The /with-timestamps endpoints return word-level timing alongside your audio, so you can build synced captions, karaoke-style highlighting, or anything that needs to know exactly when each word is spoken.

It works across providers. Speechbase uses native timestamps where the provider offers them and fills the gap where it doesn't.
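As an example of what word timing makes possible, the sketch below groups words into WebVTT caption cues. It assumes each word arrives as a dict with "word", "start", and "end" keys, with times in seconds; those key names and units are guesses, so adjust them to the actual response.

```python
from typing import Dict, List


def format_vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_vtt(words: List[Dict], max_words: int = 7) -> str:
    """Group word timings into cues of at most `max_words` words each."""
    lines = ["WEBVTT", ""]
    for i in range(0, len(words), max_words):
        group = words[i : i + max_words]
        start = format_vtt_time(group[0]["start"])
        end = format_vtt_time(group[-1]["end"])
        lines.append(f"{start} --> {end}")
        lines.append(" ".join(w["word"] for w in group))
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)
```

Smaller groups move you toward karaoke-style highlighting; larger ones read more like conventional subtitles.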

April 23, 2026

voices

Voice library

The /voices page now leads with trending voices from a curated catalog of 26 options across eight providers (ElevenLabs, OpenAI, Cartesia, Deepgram, Hume, Google, Inworld, and Murf), so it's easy to find a voice and hear it instantly.

Your saved voices and imports live together at /voices/library.

April 20, 2026

The Speechbase text-to-speech API

POST /v1/audio/speech gives you a single, OpenAI-style endpoint to generate speech across every supported provider. Swap models and voices without rewriting your integration.

Point it at a stored voice, a saved character, or an inline model and voice, and get the audio straight back. Bring your own provider API keys and Speechbase routes your requests through them, encrypted and isolated per organization.
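Here is a rough sketch of preparing a request with an inline model and voice using only the Python standard library. The base URL is a placeholder, and the body fields ("model", "voice", "input") and the bearer-token header are assumptions based on the endpoint being described as OpenAI-style; verify all of them against the API reference.

```python
import json
import urllib.request

# Placeholder host: substitute the real API base URL.
BASE_URL = "https://api.speechbase.example"


def build_speech_request(
    api_key: str, model: str, voice: str, text: str
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/audio/speech request.

    Body field names and the auth scheme are assumptions for illustration.
    """
    body = json.dumps({"model": model, "voice": voice, "input": text})
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it. Switching providers then comes down to changing the model and voice arguments, with the rest of the integration left alone.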