Split conversations into per-turn audio
- api
Pass split: true on POST /v1/audio/conversation to receive each turn as its own audio segment instead of one stitched file. The response is a JSON envelope with an audioSegments array, one entry per turn in order, each with its own base64 audio, media type, and duration. Use it when you need to play, caption, or edit speakers independently, for example dropping each turn onto its own track.
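As a rough illustration of consuming the split response, the sketch below decodes an audioSegments array into raw bytes, one item per turn. Only audioSegments is named in this entry; the per-segment field names used here (audio, mediaType, duration) are assumptions, as is the hand-built sample envelope that stands in for a real response, so check them against the API reference before relying on them.

```python
import base64
from typing import NamedTuple


class Turn(NamedTuple):
    index: int
    audio: bytes
    media_type: str
    duration: float


def decode_segments(envelope: dict) -> list[Turn]:
    """Decode every entry of audioSegments into raw bytes, preserving turn order."""
    turns = []
    for index, segment in enumerate(envelope["audioSegments"]):
        turns.append(
            Turn(
                index=index,
                # Assumed field names: "audio", "mediaType", "duration".
                audio=base64.b64decode(segment["audio"]),
                media_type=segment["mediaType"],
                duration=segment["duration"],
            )
        )
    return turns


# Hand-built envelope standing in for a real split: true response.
sample = {
    "audioSegments": [
        {
            "audio": base64.b64encode(b"turn-one").decode("ascii"),
            "mediaType": "audio/mpeg",
            "duration": 1.5,
        },
        {
            "audio": base64.b64encode(b"turn-two").decode("ascii"),
            "mediaType": "audio/mpeg",
            "duration": 2.0,
        },
    ]
}
turns = decode_segments(sample)
```

Each Turn can then be written to its own file or placed on its own track; the index field keeps the original turn order if the items are later processed out of sequence.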
On POST /v1/audio/conversation/with-timestamps, each segment also carries block-relative word timestamps, that is, timestamps measured from the start of that segment rather than from the start of the conversation. Omit split (or set it to false) to keep the existing single-clip responses: the base endpoint returns raw audio bytes, and the with-timestamps endpoint returns its existing JSON response unchanged.
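Because the word timestamps restart at zero for every segment, placing words on a single conversation timeline means adding the combined duration of all earlier segments. The sketch below shows one way to do that, assuming the segments are played back to back with no gap. The field names (words, word, start, end, duration) and the sample values are hypothetical and are not taken from the API.

```python
def to_absolute_timestamps(segments: list[dict]) -> list[dict]:
    """Shift each segment's word timings by the total duration of the segments before it."""
    offset = 0.0
    timeline = []
    for segment in segments:
        for word in segment["words"]:
            timeline.append(
                {
                    "word": word["word"],
                    # Segment-relative time plus the running offset.
                    "start": offset + word["start"],
                    "end": offset + word["end"],
                }
            )
        # The next segment starts where this one ends (assumes no gap between turns).
        offset += segment["duration"]
    return timeline


# Hypothetical segments with made-up timings, in seconds.
segments = [
    {"duration": 1.5, "words": [{"word": "Hi", "start": 0.0, "end": 0.5}]},
    {"duration": 2.0, "words": [{"word": "Hello", "start": 0.25, "end": 0.75}]},
]
timeline = to_absolute_timestamps(segments)
```

If turns are spaced apart or overlapped in an editor, replace the running offset with each turn's actual start position on the track.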

