Speechbase

Using MiniMax Speech 2.8 for Multi-Speaker Conversations

Pierson Marks
  • engineering
  • sdk
  • speech

MiniMax Speech 2.8 is now available in Speechbase and ships as an official provider in our open-source SpeechSDK. It's a strong choice for long-form work: it holds voice timbre and emotional tone steady across long passages, supports 40+ languages, and can clone a voice from about 10 seconds of reference audio. If you're already using the SpeechSDK for text-to-speech generation, switching to MiniMax 2.8 is a one-line change.

Using MiniMax 2.8

The SpeechSDK identifies models with provider/model strings. To switch to MiniMax, change the model string and choose a MiniMax voice.

import { generateSpeech } from "@speech-sdk/core"

const { audio } = await generateSpeech({
  model: "minimax/speech-2.8-hd",
  voice: "Wise_Woman",
  text: "Chapter one. The house at the end of the lane had been empty for years.",
})

Whichever provider you were using before, the API and return type stay the same. You can generate MP3, WAV, or PCM, adjust the speed of the generated audio (0.75x to 1.5x), and use our built-in volume leveling to make the audio louder or quieter without FFmpeg.
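To give a feel for what adjusting volume without FFmpeg involves, here is a minimal sketch of applying a gain factor to raw 16-bit PCM samples. The `applyGain` helper is hypothetical and is not part of the SpeechSDK. The SDK's built-in leveling may work differently, so treat this as an illustration of the underlying idea only.

```typescript
// Hypothetical helper, not a SpeechSDK API: scales signed 16-bit PCM samples by a gain factor.
// A gain above 1 makes the audio louder, a gain below 1 makes it quieter.
function applyGain(samples: Int16Array, gain: number): Int16Array {
  const out = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    const scaled = Math.round(samples[i] * gain)
    // Clamp to the signed 16-bit range so loud samples clip instead of wrapping around.
    out[i] = Math.max(-32768, Math.min(32767, scaled))
  }
  return out
}

// Usage: double the level of three samples; the last one clips at the 16-bit maximum.
const louder = applyGain(new Int16Array([1000, -1000, 30000]), 2)
```

Hard clipping like this distorts loud passages, which is one reason production leveling usually measures loudness first rather than applying a fixed gain.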

Mix MiniMax with other voices in one conversation

If you're generating multi-speaker conversations, such as a podcast or an audiobook, you're not limited to a single provider or voice for the whole project. The SDK's generateConversation() API accepts a list of turns, each with its own model and voice, so you can pair a MiniMax voice with a voice from any other provider in the same dialogue. The SDK stitches the per-turn audio into a single multi-speaker file, so you don't need FFmpeg or other audio-processing libraries that are often unavailable in serverless environments.

import { generateConversation } from "@speech-sdk/core"

const { audio } = await generateConversation({
  turns: [
    {
      model: "elevenlabs/eleven_v3",
      voice: "EXAVITQu4vr4xnSDxMaL",
      text: "Hello from ElevenLabs.",
    },
    {
      model: "minimax/speech-2.8-hd",
      voice: "Wise_Woman",
      text: "And hello from MiniMax!",
    },
  ],
})
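For longer scripts, writing every turn by hand gets repetitive. One possible approach is to keep a speaker-to-voice mapping and build the turns from it. The sketch below assumes only the turn shape shown above (model, voice, text). The `Cast` and `Line` types and the `buildTurns` helper are hypothetical names for illustration, not part of the SDK.

```typescript
// Turn shape as used in the example above.
type Turn = { model: string; voice: string; text: string }

// Hypothetical types: a cast maps each speaker to a model and voice; a line is one utterance.
type Cast = Record<string, { model: string; voice: string }>
type Line = { speaker: string; text: string }

// Hypothetical helper: expands a script into a turns array using the cast.
function buildTurns(script: Line[], cast: Cast): Turn[] {
  return script.map(({ speaker, text }) => {
    const entry = cast[speaker]
    if (!entry) {
      throw new Error(`No voice configured for speaker "${speaker}"`)
    }
    return { model: entry.model, voice: entry.voice, text }
  })
}

const cast: Cast = {
  host: { model: "elevenlabs/eleven_v3", voice: "EXAVITQu4vr4xnSDxMaL" },
  guest: { model: "minimax/speech-2.8-hd", voice: "Wise_Woman" },
}

const turns = buildTurns(
  [
    { speaker: "host", text: "Hello from ElevenLabs." },
    { speaker: "guest", text: "And hello from MiniMax!" },
  ],
  cast,
)
```

The resulting array can then be passed as the turns value to generateConversation().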

Route through Speechbase

The examples above use Speechbase's managed router with BYOK (bring your own key), so your application holds a single Speechbase key instead of juggling a separate secret API key for each provider. That one key gives you access to every provider, including MiniMax, and the gateway handles retries and failover so that a single bad upstream response doesn't surface to your users.
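The gateway's internals aren't described here, but the general retry-then-failover pattern can be sketched in a few lines. The `withFailover` helper below is hypothetical: it takes an ordered list of attempts (for example, one per upstream provider), retries each a fixed number of times, and moves on to the next one only when those retries are exhausted. A real gateway would likely add backoff and distinguish retryable from non-retryable errors.

```typescript
type Attempt<T> = () => Promise<T>

// Hypothetical helper, not a Speechbase API: tries each attempt in order.
// Each attempt runs once plus up to `retries` more times before failing over to the next.
async function withFailover<T>(attempts: Attempt<T>[], retries = 1): Promise<T> {
  let lastError: unknown = new Error("no attempts provided")
  for (const attempt of attempts) {
    for (let i = 0; i <= retries; i++) {
      try {
        return await attempt()
      } catch (err) {
        // Remember the failure and either retry or fall through to the next attempt.
        lastError = err
      }
    }
  }
  // Every attempt failed; surface the most recent error to the caller.
  throw lastError
}
```

The caller sees an error only when every upstream has failed, which is the behavior the paragraph above describes from the user's side.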

Speechbase also handles very long inputs (10,000+ characters) by chunking the text, generating the chunks in parallel, and stitching the results into one file, so you can hand it a full chapter instead of paginating by hand.
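As a rough illustration of the chunk-and-stitch idea, the sketch below splits text at sentence boundaries, generates every chunk in parallel, and joins the results in order. Both `chunkText` and `synthesizeLong` are hypothetical helpers, and the `synthesize` callback stands in for whatever generation call you use. Speechbase's actual chunking strategy may differ.

```typescript
// Hypothetical helper: groups whole sentences into chunks of roughly maxChars.
// A single sentence longer than maxChars is kept intact as its own oversized chunk.
function chunkText(text: string, maxChars: number): string[] {
  // A sentence is a run of text ending in ., !, or ? plus trailing whitespace;
  // a final run without closing punctuation is kept as well.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? []
  const chunks: string[] = []
  let current = ""
  for (const sentence of sentences) {
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim())
      current = ""
    }
    current += sentence
  }
  if (current.trim().length > 0) chunks.push(current.trim())
  return chunks
}

// Hypothetical helper: synthesizes all chunks in parallel and concatenates the bytes.
// Plain concatenation is only valid for headerless audio such as raw PCM.
async function synthesizeLong(
  text: string,
  maxChars: number,
  synthesize: (chunk: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  // Promise.all preserves input order, so the parts line up with the chunks.
  const parts = await Promise.all(chunkText(text, maxChars).map(synthesize))
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0))
  let offset = 0
  for (const part of parts) {
    out.set(part, offset)
    offset += part.length
  }
  return out
}
```

Splitting on sentence boundaries keeps each chunk's prosody natural, since a model reading half a sentence tends to produce an audible seam at the join.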

SpeechSDK is open source under the Apache-2.0 license. Browse the docs or grab a key at speechbase.ai to try MiniMax on your own text.