ElevenLabs

ElevenLabs text-to-speech through the Speechbase gateway: eleven_v3, eleven_multilingual_v2, and the flash models (eleven_flash_v2_5 and eleven_flash_v2).

Prefix: elevenlabs
Default model: eleven_multilingual_v2
Provider key: Connect under Provider Keys

Route to ElevenLabs by prefixing the model with elevenlabs/. All models return native word-level timestamps.

Models

Model                    Streaming   Audio tags   Timestamps   Max input (characters)
eleven_v3                Yes         Yes          Native       5000
eleven_multilingual_v2   Yes         No           Native       10000
eleven_flash_v2_5        Yes         No           Native       40000
eleven_flash_v2          Yes         No           Native       30000

The flash models trade some quality for low latency and large input windows. eleven_v3 is the most expressive and the only one that interprets inline audio tags.
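The trade-offs above can be turned into a small selection rule. The following is a minimal sketch, not part of the Speechbase API: pick_model is a hypothetical helper, the limits are copied from the models table and treated here as character counts, and the fallback order (default model first, then the flash models) is an assumption about what a caller might prefer.

```python
# Limits copied from the models table; treated here as character counts (assumption).
MAX_INPUT = {
    "eleven_v3": 5000,
    "eleven_multilingual_v2": 10000,
    "eleven_flash_v2_5": 40000,
    "eleven_flash_v2": 30000,
}


def pick_model(text: str, needs_audio_tags: bool = False) -> str:
    """Return a prefixed model ID whose input limit fits the text (hypothetical helper)."""
    if needs_audio_tags:
        # eleven_v3 is the only model that interprets inline audio tags.
        candidates = ["eleven_v3"]
    else:
        # Assumed preference: the default model first, then the flash models for long input.
        candidates = ["eleven_multilingual_v2", "eleven_flash_v2", "eleven_flash_v2_5"]
    for name in candidates:
        if len(text) <= MAX_INPUT[name]:
            # Routing to ElevenLabs is done with the elevenlabs/ prefix.
            return f"elevenlabs/{name}"
    raise ValueError(f"text of {len(text)} characters exceeds every candidate model's limit")
```

A caller that needs audio tags on text longer than the eleven_v3 limit gets a ValueError here; splitting the text into smaller requests is one way to handle that case.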

Usage

curl -X POST https://api.speechbase.ai/v1/audio/speech \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "inline",
    "model": "elevenlabs/eleven_multilingual_v2",
    "voice": "JBFqnCBsd6RMkjVDRZzb",
    "text": "Hello from Speechbase!",
    "output": "mp3"
  }' --output hello.mp3

voice is an ElevenLabs voice ID. Find IDs in your ElevenLabs voice library, or register them as saved Voices.
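For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The helper names build_speech_request and synthesize_to_file are hypothetical, and the sketch assumes the inline response body is the raw audio, as the curl --output flag suggests.

```python
import json
import os
import urllib.request

API_URL = "https://api.speechbase.ai/v1/audio/speech"


def build_speech_request(text, voice, model="elevenlabs/eleven_multilingual_v2", api_key=None):
    """Build (but do not send) the same POST request as the curl example."""
    api_key = api_key or os.environ.get("SPEECHBASE_API_KEY", "")
    payload = {
        "mode": "inline",
        "model": model,
        "voice": voice,  # an ElevenLabs voice ID
        "text": text,
        "output": "mp3",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the response body to path.

    Assumes the response body is the audio itself (not confirmed by the source).
    """
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling synthesize_to_file(build_speech_request("Hello from Speechbase!", "JBFqnCBsd6RMkjVDRZzb"), "hello.mp3") would mirror the curl command. Keeping request construction separate from sending makes the payload easy to inspect without network access.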

Audio tags

With eleven_v3, inline tags like [whispers] or [laughs] are interpreted as delivery cues rather than spoken aloud:

{
  "mode": "inline",
  "model": "elevenlabs/eleven_v3",
  "voice": "JBFqnCBsd6RMkjVDRZzb",
  "text": "[whispers] I have a secret. [laughs] Just kidding!"
}
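Because only eleven_v3 interprets these tags, it can help to check text before sending it to another model. The sketch below is a hypothetical client-side check, not a Speechbase feature. It assumes an audio tag is a short bracketed cue made of letters and spaces, which covers [whispers] and [laughs] but may not match every tag ElevenLabs accepts.

```python
import re

# Assumption: an audio tag is a bracketed cue of letters and spaces, e.g. [whispers].
AUDIO_TAG = re.compile(r"\[[a-z][a-z ]*\]", re.IGNORECASE)


def find_audio_tags(text: str) -> list[str]:
    """Return every bracketed cue found in the text, in order of appearance."""
    return AUDIO_TAG.findall(text)


def uninterpreted_tags(model: str, text: str) -> list[str]:
    """Return the tags that the given model is not expected to treat as delivery cues."""
    if model == "elevenlabs/eleven_v3":
        return []  # eleven_v3 interprets inline audio tags
    return find_audio_tags(text)
```

A non-empty result from uninterpreted_tags is a hint to switch to elevenlabs/eleven_v3 or to remove the tags from the text first.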

Provider options

Anything in providerOptions is forwarded to the ElevenLabs API unchanged (for example, voice_settings with stability and similarity_boost).
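As an illustration, a request body carrying voice_settings might be assembled as follows. This is a sketch under two assumptions: that providerOptions sits at the top level of the request body next to model and voice, and that the hypothetical helper with_voice_settings is how a caller chooses to build it. The numeric values in the usage note are placeholders, not recommendations.

```python
def with_voice_settings(payload: dict, stability: float, similarity_boost: float) -> dict:
    """Return a copy of payload with voice_settings added under providerOptions.

    Assumes providerOptions is a top-level field of the request body.
    """
    # Copy any existing provider options so the caller's dict is not mutated.
    options = dict(payload.get("providerOptions", {}))
    options["voice_settings"] = {
        "stability": stability,
        "similarity_boost": similarity_boost,
    }
    return {**payload, "providerOptions": options}
```

For example, with_voice_settings({"mode": "inline", "model": "elevenlabs/eleven_multilingual_v2", "voice": "JBFqnCBsd6RMkjVDRZzb", "text": "Hello from Speechbase!"}, 0.5, 0.75) returns a new body with a providerOptions field and leaves the original untouched.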
