Moderation
How Speechbase screens synthesis requests with category-based moderation and org-defined custom rules, with fail-open and fail-closed semantics.
Speechbase runs every synthesis request through content moderation before
any provider is called. If moderation blocks the request, no upstream call is
made, no audio is produced, no provider charges are incurred, and the API
returns a `422` with error code `content_moderation_blocked` and a structured reason.
This is configured per organisation under Moderation in the dashboard.
Two evaluators, run in parallel
Speechbase runs two independent evaluators on the request text. If either one returns "blocked," the request is denied.
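As a mental model, the gate behaves like the sketch below. This is purely illustrative, not Speechbase source: the `Verdict` and `Evaluator` types and the function names are ours. The point is that the two evaluators run concurrently and a block from either denies the request.

```ts
// Illustrative sketch of the moderation gate; names and types are hypothetical.
type Verdict = { blocked: boolean; reason?: string };
type Evaluator = (text: string) => Promise<Verdict>;

async function moderationGate(
  text: string,
  categoryEval: Evaluator,
  customRulesEval: Evaluator
): Promise<Verdict> {
  // Both evaluators run in parallel; neither waits on the other.
  const [category, custom] = await Promise.all([
    categoryEval(text),
    customRulesEval(text),
  ]);
  // A block from either evaluator denies the whole request.
  if (category.blocked) return category;
  if (custom.blocked) return custom;
  return { blocked: false };
}
```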
1. Category moderation
Backed by OpenAI's omni-moderation API. Classifies text into a fixed
taxonomy:
| Category | What it catches |
|---|---|
| `sexual` | Sexually explicit content. |
| `sexual/minors` | Sexual content involving minors. Always-on. |
| `harassment` | Targeted insults or bullying. |
| `hate` | Hate speech against protected groups. |
| `self-harm` | Encouragement of self-harm. |
| `violence` | Threats or graphic violence. |
| `illicit` | Instructions for illegal acts. |

This list is not exhaustive; see the OpenAI docs for the full taxonomy.
For each category you set a threshold from 0 to 1, or `null` to disable it.
The evaluator runs only if at least one threshold is configured. A category
fires when the model's confidence for that category exceeds your threshold.
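To make the shape concrete, a threshold policy can be pictured as a map from category name to threshold-or-null. The layout below is an assumption for illustration; the real policy is edited in the dashboard.

```ts
// Hypothetical shape of a category-threshold policy (field names assumed).
// A number is a confidence threshold in [0, 1]; null disables the category.
const categoryThresholds: Record<string, number | null> = {
  sexual: 0.8,      // fires when model confidence exceeds 0.8
  violence: 0.8,
  harassment: null, // disabled: this category is never checked
};
```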
2. Custom rules
Org-defined rules expressed in plain English. Each rule has a prompt (e.g. "Block any text that promotes a specific publicly traded stock") and an enabled flag. A configurable LLM evaluates each rule against the request text. The evaluator runs only if at least one rule is enabled.
Custom rules are evaluated by a small, fast LLM (currently
`openai/gpt-5.4-nano`) with strict timeouts.
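Sketched as data, a custom rule is just the pair described above. The `CustomRule` shape below is hypothetical, for illustration only.

```ts
// Hypothetical shape of an org-defined custom rule.
interface CustomRule {
  prompt: string;   // plain-English policy, evaluated by the LLM per request
  enabled: boolean; // disabled rules are skipped entirely
}

const rules: CustomRule[] = [
  {
    prompt: "Block any text that promotes a specific publicly traded stock",
    enabled: true,
  },
];
```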
Fail-open vs fail-closed
Evaluators can fail — model timeouts, upstream outages, malformed responses.
Your `fail_mode` setting governs what happens then:

- `open`: evaluator errors are tolerated. The request proceeds to the provider. The failure is logged for audit but doesn't surface to the caller.
- `closed`: any evaluator error blocks the request with `error_fail_closed`. Use this when policy correctness matters more than availability (regulated content, child-safety surfaces, etc.).

The default is `open`.
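The effect of `fail_mode` can be pictured as a wrapper around each evaluator, roughly like this sketch (illustrative only; not Speechbase source):

```ts
// Illustrative only: how fail_mode changes the handling of evaluator errors.
type Verdict = { blocked: boolean; reason?: string };

async function runEvaluator(
  evaluate: () => Promise<Verdict>,
  failMode: "open" | "closed"
): Promise<Verdict> {
  try {
    return await evaluate();
  } catch (err) {
    if (failMode === "closed") {
      // Any evaluator error blocks the request.
      return { blocked: true, reason: "error_fail_closed" };
    }
    // Fail-open: log for audit and let the request proceed.
    console.error("moderation evaluator failed (fail-open)", err);
    return { blocked: false };
  }
}
```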
Block response shape
When moderation blocks a request, you get:
```http
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{
  "error": {
    "code": "content_moderation_blocked",
    "message": "Request blocked by content moderation policy.",
    "reason": {
      "type": "openai_category",
      "detail": "violence",
      "confidence": 0.91
    }
  }
}
```

| Field | Notes |
|---|---|
| `error.code` | Always `content_moderation_blocked` for a moderation block. |
| `error.reason.type` | One of `openai_category`, `custom_rule`, or `error_fail_closed`. |
| `error.reason.detail` | The category name or custom-rule label. |
| `error.reason.confidence` | Present for category blocks; null or absent for custom-rule and fail-closed blocks. |
What you should not do with the block payload
Don't echo the moderation reason back to the user verbatim. The category or custom-rule detail is intended for application logs and dashboards, not for end-user messages. Show users a generic "we can't generate that" message and surface the precise reason internally, as in the sketch below.
And — same as everywhere else in the platform — never log the rejected text itself. See Logging hygiene.
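Putting the two rules together, a client handler might look like the following. This is a hedged sketch: the endpoint URL, function name, and message strings are placeholders of ours, not part of the Speechbase API.

```ts
// Sketch of client-side handling for a moderation block.
// Logs the structured reason internally, never the submitted text,
// and shows the user a generic message.
async function synthesize(text: string): Promise<ArrayBuffer> {
  // Endpoint URL is a placeholder.
  const res = await fetch("https://api.speechbase.example/v1/synthesize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (res.status === 422) {
    const body = await res.json();
    if (body?.error?.code === "content_moderation_blocked") {
      // Internal log: the structured reason only, never `text` itself.
      console.warn("moderation block:", body.error.reason);
      // Generic user-facing message; don't echo the block reason.
      throw new Error("Sorry, we can't generate that.");
    }
  }
  if (!res.ok) {
    throw new Error(`synthesis failed with status ${res.status}`);
  }
  return res.arrayBuffer();
}
```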
Tuning suggestions
A reasonable starting policy for a customer-facing product (sketched as a config object after this list):

- Always-on: `sexual/minors`, `self-harm`, `illicit`.
- Threshold `0.8`: `sexual`, `violence`, `hate`.
- Threshold `null` (disabled): `harassment`. Re-introduce it once you've measured the false-positive rate on real traffic.
- `fail_mode: "closed"`: better to occasionally show "try again" than to let a moderation outage ship a problematic clip.
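Expressed as a single config object, that starting point looks like the sketch below. The field names are hypothetical; the real policy lives in the dashboard.

```ts
// Hypothetical config object for the starting policy above.
const moderationPolicy = {
  categoryThresholds: {
    // Always-on: thresholds of 0 fire at any nonzero confidence.
    "sexual/minors": 0,
    "self-harm": 0,
    illicit: 0,
    // Standard thresholds.
    sexual: 0.8,
    violence: 0.8,
    hate: 0.8,
    // Disabled until the false-positive rate is measured.
    harassment: null,
  },
  failMode: "closed",
} as const;
```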
Adjust from there once you have a few weeks of traffic to look at in Speechbase → Request Logs.