Speechbase: Universal Text-to-Speech Gateway & Voice Management

ElevenLabs text-to-speech through the Speechbase gateway — eleven_v3, multilingual v2, and the flash models.


Prefix	`elevenlabs`
Default model	`eleven_multilingual_v2`
Provider key	Connect under Provider Keys

Route a request to ElevenLabs by prefixing the model name with elevenlabs/, for example elevenlabs/eleven_v3. All models return native word-level timestamps.

Models

Model	Streaming	Audio tags	Timestamps	Max input (characters)
`eleven_v3`	Yes	Yes	Native	5000
`eleven_multilingual_v2`	Yes	—	Native	10000
`eleven_flash_v2_5`	Yes	—	Native	40000
`eleven_flash_v2`	Yes	—	Native	30000

The flash models trade some quality for lower latency and larger input windows. eleven_v3 is the most expressive model and the only one that interprets inline audio tags.
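
The limits in the table can be checked client-side before a request is sent. The following is a minimal sketch, assuming the Max input column counts characters; `MAX_INPUT` and `check_input` are hypothetical names and not part of any Speechbase SDK.

```python
# Limits copied from the Models table, treated here as character counts.
MAX_INPUT = {
    "eleven_v3": 5000,
    "eleven_multilingual_v2": 10000,
    "eleven_flash_v2_5": 40000,
    "eleven_flash_v2": 30000,
}

PREFIX = "elevenlabs/"


def check_input(model: str, text: str) -> str:
    """Return the prefixed model ID, or raise if the text exceeds the table limit."""
    # Accept both bare ("eleven_v3") and prefixed ("elevenlabs/eleven_v3") names.
    name = model[len(PREFIX):] if model.startswith(PREFIX) else model
    if name not in MAX_INPUT:
        raise ValueError(f"unknown ElevenLabs model: {name}")
    if len(text) > MAX_INPUT[name]:
        raise ValueError(
            f"{name} accepts at most {MAX_INPUT[name]} characters, got {len(text)}"
        )
    return PREFIX + name
```

Failing early like this avoids a round trip for text that is too long, though the gateway's own validation remains the authority on what is accepted.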

Usage

curl -X POST https://api.speechbase.ai/v1/audio/speech \
  -H "Authorization: Bearer $SPEECHBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "inline",
    "model": "elevenlabs/eleven_multilingual_v2",
    "voice": "JBFqnCBsd6RMkjVDRZzb",
    "text": "Hello from Speechbase!",
    "output": "mp3"
  }' --output hello.mp3

voice is an ElevenLabs voice ID. Find IDs in your ElevenLabs voice library, or register them as saved Voices.
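
The same request can be made from Python using only the standard library. This is a sketch that mirrors the curl call above; `build_speech_request` and `synthesize` are hypothetical helper names, and it assumes the response body is the raw audio, as the curl example's --output hello.mp3 suggests.

```python
import json
import os
import urllib.request

API_URL = "https://api.speechbase.ai/v1/audio/speech"  # endpoint from the curl example


def build_speech_request(
    api_key: str,
    text: str,
    voice: str,
    model: str = "elevenlabs/eleven_multilingual_v2",
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"mode": "inline", "model": model, "voice": voice, "text": text, "output": "mp3"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, voice: str, path: str) -> None:
    """Send the request and write the response bytes to `path` (assumed to be audio)."""
    req = build_speech_request(os.environ["SPEECHBASE_API_KEY"], text, voice)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

With SPEECHBASE_API_KEY set, calling synthesize("Hello from Speechbase!", "JBFqnCBsd6RMkjVDRZzb", "hello.mp3") should correspond to the curl command. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.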

Audio tags

With eleven_v3, inline tags like [whispers] or [laughs] are interpreted as delivery cues rather than spoken aloud:

{
  "mode": "inline",
  "model": "elevenlabs/eleven_v3",
  "voice": "JBFqnCBsd6RMkjVDRZzb",
  "text": "[whispers] I have a secret. [laughs] Just kidding!"
}
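
Because only eleven_v3 interprets audio tags, text written for it may need the tags removed before it is sent to another model, which may otherwise treat them as ordinary text. The following is a minimal sketch; `strip_audio_tags` is a hypothetical helper, and the pattern assumes tags are bracketed cues like the examples above, so it also removes any other bracketed text.

```python
import re

# Matches bracketed cues such as [whispers] or [laughs], plus trailing whitespace.
_TAG = re.compile(r"\[[^\[\]]+\]\s*")


def strip_audio_tags(text: str) -> str:
    """Remove bracketed audio tags, for models that do not interpret them."""
    return _TAG.sub("", text).strip()
```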

Provider options

Anything in providerOptions is forwarded to the ElevenLabs API unchanged (for example voice_settings with stability and similarity_boost).
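
As an illustration, a request body that passes voice_settings might be built as below. This is a sketch: it assumes providerOptions is a top-level field of the request body, the numeric values are placeholders rather than recommended settings, and `with_provider_options` is a hypothetical helper.

```python
import json


def with_provider_options(body: dict, **options) -> dict:
    """Return a copy of `body` with `options` merged into providerOptions."""
    merged = dict(body)
    merged["providerOptions"] = {**body.get("providerOptions", {}), **options}
    return merged


base = {
    "mode": "inline",
    "model": "elevenlabs/eleven_multilingual_v2",
    "voice": "JBFqnCBsd6RMkjVDRZzb",
    "text": "Hello from Speechbase!",
}

# Placeholder values; top-level placement of providerOptions is an assumption.
payload = with_provider_options(
    base, voice_settings={"stability": 0.5, "similarity_boost": 0.75}
)
print(json.dumps(payload, indent=2))
```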
