Speechbase

Multi-speaker conversations

Generate stitched, multi-turn dialogue audio in one API call — including across providers.

If you've read Conversations, you know the shape. This guide is the practical recipe: when to use the conversation endpoint, how to choose between shared and per-turn models, and how to manage gaps and volume.

When to use it

Use POST /v1/audio/conversation whenever you need:

  • More than one voice in a single piece of audio,
  • Server-side stitching (no client-side audio mixing),
  • Volume normalisation across mixed providers,
  • Per-turn timing control.

If you only need a single voice reading a long passage, use /v1/audio/speech instead: it streams, so audio starts playing sooner.

Shared model recipe (single provider)

The simplest case. Pin the provider/model at the top level (or per-turn in the SDK) and let every turn just pick a voice:

import { generateConversation } from "@speech-sdk/core";

const result = await generateConversation({
  apiKey: process.env.SPEECHBASE_API_KEY,
  turns: [
    { model: "openai/gpt-4o-mini-tts", voice: "alloy",   text: "Welcome to the show." },
    { model: "openai/gpt-4o-mini-tts", voice: "shimmer", text: "Glad to be here." },
  ],
  gapMs: 400,
  volumeDbfs: -16,
  output: { format: "mp3" },
});

The equivalent JSON request body:

{
  "model": "openai/gpt-4o-mini-tts",
  "turns": [
    { "voice": "alloy",   "text": "Welcome to the show." },
    { "voice": "shimmer", "text": "Glad to be here." }
  ],
  "gapMs": 400,
  "volumeDbfs": -16,
  "output": "mp3"
}

Use this when one provider has the voices you want and you don't need cross-provider mixing.
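
The two shapes above differ only in where the model lives: per turn in the SDK call, hoisted to the top level in the JSON body. A minimal sketch of that conversion follows. The helper `toSharedModelBody` and both interfaces are hypothetical; they mirror the examples in this guide and are not an official schema.

```typescript
// Shapes mirror the examples in this guide; they are assumptions, not an official schema.
interface SdkTurn {
  model: string; // "provider/model", as in the SDK example
  voice: string;
  text: string;
}

interface SharedModelBody {
  model: string;
  turns: { voice: string; text: string }[];
  gapMs?: number;
  volumeDbfs?: number;
  output: string;
}

// Hypothetical helper: hoists the model to the top level when every turn shares it.
function toSharedModelBody(
  turns: SdkTurn[],
  options: { gapMs?: number; volumeDbfs?: number; output: string },
): SharedModelBody {
  const first = turns[0];
  if (first === undefined) {
    throw new Error("At least one turn is required");
  }
  // A differing model means the request belongs in the per-turn shape instead.
  const mismatch = turns.find((turn) => turn.model !== first.model);
  if (mismatch !== undefined) {
    throw new Error(`Mixed models (${first.model}, ${mismatch.model}): use the per-turn shape`);
  }
  return {
    model: first.model,
    turns: turns.map(({ voice, text }) => ({ voice, text })),
    ...options,
  };
}
```

Throwing on a mismatch, rather than silently picking the first model, keeps a mixed-provider request from being sent in the single-provider shape by accident.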

Per-turn recipe (mixed providers)

Specify the model per turn and mix providers freely:

import { generateConversation } from "@speech-sdk/core";

const result = await generateConversation({
  apiKey: process.env.SPEECHBASE_API_KEY,
  turns: [
    {
      model: "elevenlabs/eleven_v3",
      voice: "EXAV...",
      text: "I'm running narration on ElevenLabs.",
    },
    {
      model: "openai/gpt-4o-mini-tts",
      voice: "alloy",
      text: "And I'm replying on OpenAI.",
    },
  ],
  gapMs: 500,
  volumeDbfs: -16,
  output: { format: "mp3" },
});

The equivalent JSON request body:

{
  "turns": [
    {
      "provider": "elevenlabs",
      "model": "eleven_v3",
      "voice": "EXAV...",
      "text": "I'm running narration on ElevenLabs."
    },
    {
      "provider": "openai",
      "model": "gpt-4o-mini-tts",
      "voice": "alloy",
      "text": "And I'm replying on OpenAI."
    }
  ],
  "gapMs": 500,
  "volumeDbfs": -16,
  "output": "mp3"
}

Use this when:

  • Different voices live with different providers (custom-cloned voices, niche providers, etc.),
  • You want to A/B different providers in the same piece,
  • You're cost-optimising — fast/cheap provider for short turns, premium provider for hero turns.

Speechbase validates BYOK availability for every provider you reference before dispatching anything; you won't get a half-rendered conversation.
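
The server performs this check for you, but it can be useful to know up front which providers a request touches, for example to warn a user before they submit. The sketch below assumes SDK-style "provider/model" strings as used in the examples above. Both function names are hypothetical, and the list of configured providers is something your own application would have to supply.

```typescript
// Hypothetical helper: distinct providers referenced by a list of "provider/model" strings,
// in first-seen order.
function referencedProviders(models: string[]): string[] {
  const seen = new Set<string>();
  for (const model of models) {
    const slash = model.indexOf("/");
    if (slash <= 0) {
      throw new Error(`Expected "provider/model", got "${model}"`);
    }
    seen.add(model.slice(0, slash));
  }
  return Array.from(seen);
}

// Hypothetical helper: providers the request needs that are not in `configured`.
// `configured` is an assumption: a list your application maintains of providers with keys set up.
function missingProviders(models: string[], configured: string[]): string[] {
  const have = new Set(configured);
  return referencedProviders(models).filter((provider) => !have.has(provider));
}
```

This is only a client-side convenience; the validation described above still runs on the server and remains the source of truth.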

Tuning gap and volume

What you want                              | Settings
Tight, podcast-like back-and-forth         | gapMs: 250–400, volumeDbfs: -16
Thoughtful, narrated dialogue              | gapMs: 600–900, volumeDbfs: -18
Compliant streaming loudness (LUFS-style)  | volumeDbfs: -23
Audiobook-style pacing                     | gapMs: 800+

If you skip volumeDbfs, Speechbase passes audio through without normalisation. That works when every turn comes from the same provider with the same voice settings; with mixed providers you almost certainly want it set.
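
One way to apply this guidance in code is to keep the presets in a small lookup and only add volumeDbfs when a preset specifies it or the turns span more than one provider. The sketch below is illustrative: the preset names, the choice of the lower bound of each gap range, and the -16 fallback for mixed providers (borrowed from the recipes above) are all assumptions, not Speechbase defaults.

```typescript
type Style = "podcast" | "narrated" | "audiobook";

// Values taken from the tuning guidance in this section; the keys are hypothetical names.
const PRESETS: Record<Style, { gapMs: [number, number]; volumeDbfs?: number }> = {
  podcast: { gapMs: [250, 400], volumeDbfs: -16 },
  narrated: { gapMs: [600, 900], volumeDbfs: -18 },
  audiobook: { gapMs: [800, 800] }, // guidance says "800+", with no volume given
};

// Hypothetical helper: picks settings for a style and a list of "provider/model" strings.
function settingsFor(style: Style, models: string[]): { gapMs: number; volumeDbfs?: number } {
  const preset = PRESETS[style];
  const providers = new Set(models.map((model) => model.split("/")[0]));
  // Assumption: fall back to -16 when providers are mixed and the preset sets no volume.
  const volumeDbfs = preset.volumeDbfs ?? (providers.size > 1 ? -16 : undefined);
  const gapMs = preset.gapMs[0]; // lower bound of the suggested range
  return volumeDbfs === undefined ? { gapMs } : { gapMs, volumeDbfs };
}
```

Omitting the key entirely, rather than sending an undefined value, matches the pass-through behaviour described above for requests that skip volumeDbfs.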

Word-level timestamps

POST /v1/audio/conversation/with-timestamps returns the same envelope plus a timestamps array. Each entry includes a turnIndex so you can attribute words back to their turn. See Word-level timestamps.
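
Attributing words back to turns amounts to grouping the timestamps array by turnIndex. A minimal sketch, relying only on the turnIndex field mentioned above; any other fields on an entry (such as a word or start time) are passed through untouched, and the helper name is hypothetical.

```typescript
// Hypothetical helper: buckets timestamp entries by the turn they belong to.
// Only `turnIndex` is assumed; all other fields are preserved as-is.
function groupByTurn<T extends { turnIndex: number }>(timestamps: T[]): Map<number, T[]> {
  const byTurn = new Map<number, T[]>();
  for (const entry of timestamps) {
    const bucket = byTurn.get(entry.turnIndex);
    if (bucket !== undefined) {
      bucket.push(entry);
    } else {
      byTurn.set(entry.turnIndex, [entry]);
    }
  }
  return byTurn;
}
```

Entries keep their original order within each bucket, so a per-turn transcript can be rebuilt by walking each bucket front to back.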

Limits and behaviour

  • Buffered, not streamed. The response is returned only once every turn has been rendered.
  • All-or-nothing. A failure on any turn (moderation, provider error) fails the whole request.
  • Moderation runs per-turn. Each turn's text is checked; one bad turn blocks the conversation.
  • Logged as one request. The log shows a single parent entry for the request, with per-turn metadata recorded as child entries for analytics.
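
Because a single failing turn fails the whole request, it can pay to catch obviously malformed turns locally before sending anything. The sketch below is a hypothetical pre-flight check, not part of the SDK; it only looks for empty text and missing voices, and it does not replace server-side moderation or provider validation.

```typescript
// Loose turn shape for validation; fields are optional so incomplete input can be inspected.
interface DraftTurn {
  model?: string;
  voice?: string;
  text?: string;
}

// Hypothetical helper: returns human-readable problems, or an empty array if none were found.
function findTurnProblems(turns: DraftTurn[]): string[] {
  const problems: string[] = [];
  if (turns.length === 0) {
    problems.push("no turns supplied");
  }
  turns.forEach((turn, index) => {
    if (turn.text === undefined || turn.text.trim() === "") {
      problems.push(`turn ${index}: empty text`);
    }
    if (turn.voice === undefined || turn.voice === "") {
      problems.push(`turn ${index}: missing voice`);
    }
  });
  return problems;
}
```

Collecting every problem, instead of stopping at the first, lets you report all bad turns in one pass before retrying.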
