ElevenLabs
ElevenLabs text-to-speech through the Speechbase gateway — eleven_v3, multilingual v2, and the flash models.
| Prefix | elevenlabs |
| Default model | eleven_multilingual_v2 |
| Provider key | Connect under Provider Keys |
Route to ElevenLabs by prefixing the model with elevenlabs/. All models return
native word-level timestamps.
Models
| Model | Streaming | Audio tags | Timestamps | Max input (characters) |
|---|---|---|---|---|
| eleven_v3 | Yes | Yes | Native | 5000 |
| eleven_multilingual_v2 | Yes | — | Native | 10000 |
| eleven_flash_v2_5 | Yes | — | Native | 40000 |
| eleven_flash_v2 | Yes | — | Native | 30000 |
The flash models trade some quality for low latency and large input windows.
eleven_v3 is the most expressive and the only one that interprets inline
audio tags.
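The trade-offs above can be encoded as a small client-side check. The following is a minimal sketch, not part of any Speechbase SDK: `MAX_INPUT`, `AUDIO_TAG`, and `pick_model` are hypothetical names, the limits are treated as character counts, and an audio tag is assumed to be a short lowercase cue in square brackets.

```python
import re

# Limits copied from the models table; treated here as character counts.
MAX_INPUT = {
    "eleven_v3": 5000,
    "eleven_multilingual_v2": 10000,
    "eleven_flash_v2_5": 40000,
    "eleven_flash_v2": 30000,
}

# Assumption: an audio tag is a short bracketed cue such as [whispers].
AUDIO_TAG = re.compile(r"\[[a-z][a-z ]*\]")


def pick_model(text: str, low_latency: bool = False) -> str:
    """Return a prefixed model ID suited to the text (hypothetical helper)."""
    if AUDIO_TAG.search(text):
        # eleven_v3 is the only model that interprets inline audio tags.
        model = "eleven_v3"
    elif low_latency or len(text) > MAX_INPUT["eleven_multilingual_v2"]:
        # Flash models offer lower latency and larger input windows.
        model = "eleven_flash_v2_5"
    else:
        # Fall back to the provider's default model.
        model = "eleven_multilingual_v2"
    if len(text) > MAX_INPUT[model]:
        raise ValueError(f"{len(text)} characters exceeds the {model} limit")
    return f"elevenlabs/{model}"
```

For example, `pick_model("[whispers] I have a secret.")` selects `elevenlabs/eleven_v3`, while plain text longer than 10000 characters is routed to a flash model. Tagged text over 5000 characters raises an error rather than silently dropping the tags.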
Usage
curl -X POST https://api.speechbase.ai/v1/audio/speech \
-H "Authorization: Bearer $SPEECHBASE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"mode": "inline",
"model": "elevenlabs/eleven_multilingual_v2",
"voice": "JBFqnCBsd6RMkjVDRZzb",
"text": "Hello from Speechbase!",
"output": "mp3"
}' --output hello.mp3
voice is an ElevenLabs voice ID. Find IDs in your ElevenLabs voice library, or
register them as saved Voices.
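The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: `build_request` and `synthesize` are hypothetical helper names, and the response body is assumed to be the raw MP3 bytes, as the `--output` flag in the curl example suggests.

```python
import json
import os
import urllib.request

API_URL = "https://api.speechbase.ai/v1/audio/speech"


def build_request(text: str, voice: str, api_key: str,
                  model: str = "elevenlabs/eleven_multilingual_v2") -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "mode": "inline",
        "model": model,
        "voice": voice,  # an ElevenLabs voice ID
        "text": text,
        "output": "mp3",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, voice: str, path: str) -> None:
    """Send the request and write the response body to `path`."""
    request = build_request(text, voice, os.environ["SPEECHBASE_API_KEY"])
    with urllib.request.urlopen(request) as response:
        # Assumption: the body is the audio itself, not a JSON envelope.
        with open(path, "wb") as audio_file:
            audio_file.write(response.read())
```

Calling `synthesize("Hello from Speechbase!", "JBFqnCBsd6RMkjVDRZzb", "hello.mp3")` mirrors the curl command. Keeping `build_request` separate makes the payload easy to inspect without touching the network.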
Audio tags
With eleven_v3, inline tags like [whispers] or [laughs] are interpreted as
delivery cues rather than spoken aloud:
{
"mode": "inline",
"model": "elevenlabs/eleven_v3",
"voice": "JBFqnCBsd6RMkjVDRZzb",
"text": "[whispers] I have a secret. [laughs] Just kidding!"
}
Provider options
Anything in providerOptions is forwarded to the ElevenLabs API unchanged
(for example voice_settings with stability and similarity_boost).
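As an illustration, a request body carrying voice settings might be assembled as below. This is a sketch only: `with_voice_settings` is a hypothetical helper, placing `voice_settings` directly under `providerOptions` is an assumption based on the sentence above, and the 0 to 1 range check reflects how ElevenLabs documents these two settings rather than anything the gateway is known to enforce.

```python
def with_voice_settings(payload: dict, stability: float,
                        similarity_boost: float) -> dict:
    """Return a copy of `payload` with voice_settings under providerOptions."""
    settings = {"stability": stability, "similarity_boost": similarity_boost}
    for name, value in settings.items():
        # Assumption: both settings are fractions between 0 and 1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Preserve any provider options already present on the payload.
    options = {**payload.get("providerOptions", {}), "voice_settings": settings}
    return {**payload, "providerOptions": options}


request_body = with_voice_settings(
    {
        "mode": "inline",
        "model": "elevenlabs/eleven_multilingual_v2",
        "voice": "JBFqnCBsd6RMkjVDRZzb",
        "text": "Hello from Speechbase!",
    },
    stability=0.5,          # illustrative value
    similarity_boost=0.75,  # illustrative value
)
```

The helper returns a new dictionary instead of mutating its input, so a base payload can be reused across requests with different settings.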

