Moderation
How Speechbase screens synthesis requests with category-based moderation and org-defined custom rules, with fail-open and fail-closed semantics.
Speechbase runs every synthesis request through content moderation before
any provider is called. If moderation blocks the request, no upstream call is
made, no audio is produced, no provider charges are incurred, and the API
returns a `422` with error code `content_moderation_blocked` and a structured reason.
This is configured per organisation under Moderation in the dashboard.
Two evaluators, run in parallel
Speechbase runs two independent evaluators on the request text. If either one returns "blocked," the request is denied.
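As a mental model, the gate behaves like the sketch below. This is purely illustrative, not Speechbase source: the `Verdict` and `Evaluator` types and the function names are ours. The point is that the two evaluators run concurrently and a block from either denies the request.

```ts
// Illustrative sketch of the moderation gate; names and types are hypothetical.
type Verdict = { blocked: boolean; reason?: string };
type Evaluator = (text: string) => Promise<Verdict>;

async function moderationGate(
  text: string,
  categoryEval: Evaluator,
  customRulesEval: Evaluator
): Promise<Verdict> {
  // Both evaluators run in parallel; neither waits on the other.
  const [category, custom] = await Promise.all([
    categoryEval(text),
    customRulesEval(text),
  ]);
  // A block from either evaluator denies the whole request.
  if (category.blocked) return category;
  if (custom.blocked) return custom;
  return { blocked: false };
}
```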
1. Category moderation
Backed by OpenAI's omni-moderation API. Classifies text into a fixed
taxonomy:
| Category | What it catches |
|---|---|
| `sexual` | Sexually explicit content. |
| `sexual/minors` | Sexual content involving minors. Always-on. |
| `harassment` | Targeted insults or bullying. |
| `hate` | Hate speech against protected groups. |
| `self-harm` | Encouragement of self-harm. |
| `violence` | Threats or graphic violence. |
| `illicit` | Instructions for illegal acts. |

This list is not exhaustive; see the OpenAI docs for the full taxonomy.
For each category you set a threshold from 0 to 1, or `null` to disable it.
The evaluator runs only if at least one threshold is configured. A category
fires when the model's confidence for that category exceeds your threshold.
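To make the shape concrete, a threshold policy can be pictured as a map from category name to threshold-or-null. The layout below is an assumption for illustration; the real policy is edited in the dashboard.

```ts
// Hypothetical shape of a category-threshold policy (field names assumed).
// A number is a confidence threshold in [0, 1]; null disables the category.
const categoryThresholds: Record<string, number | null> = {
  sexual: 0.8,      // fires when model confidence exceeds 0.8
  violence: 0.8,
  harassment: null, // disabled: this category is never checked
};
```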
2. Custom rules
Org-defined rules expressed in plain English. Each rule has a prompt (e.g. "Block any text that promotes a specific publicly traded stock") and an enabled flag. A configurable LLM evaluates each rule against the request text. The evaluator runs only if at least one rule is enabled.
Custom rules are evaluated by a small, fast LLM (currently
`openai/gpt-5.4-nano`) with strict timeouts.
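Sketched as data, a custom rule is just the pair described above. The `CustomRule` shape below is hypothetical, for illustration only.

```ts
// Hypothetical shape of an org-defined custom rule.
interface CustomRule {
  prompt: string;   // plain-English policy, evaluated by the LLM per request
  enabled: boolean; // disabled rules are skipped entirely
}

const rules: CustomRule[] = [
  {
    prompt: "Block any text that promotes a specific publicly traded stock",
    enabled: true,
  },
];
```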
Fail-open vs fail-closed
Evaluators can fail — model timeouts, upstream outages, malformed responses.
Your `fail_mode` setting governs what happens then:

- `open`: evaluator errors are tolerated. The request proceeds to the provider. The failure is logged for audit but doesn't surface to the caller.
- `closed`: any evaluator error blocks the request with `error_fail_closed`. Use this when policy correctness matters more than availability (regulated content, child-safety surfaces, etc.).

The default is `open`.
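The effect of `fail_mode` can be pictured as a wrapper around each evaluator, roughly like this sketch (illustrative only; not Speechbase source):

```ts
// Illustrative only: how fail_mode changes the handling of evaluator errors.
type Verdict = { blocked: boolean; reason?: string };

async function runEvaluator(
  evaluate: () => Promise<Verdict>,
  failMode: "open" | "closed"
): Promise<Verdict> {
  try {
    return await evaluate();
  } catch (err) {
    if (failMode === "closed") {
      // Any evaluator error blocks the request.
      return { blocked: true, reason: "error_fail_closed" };
    }
    // Fail-open: log for audit and let the request proceed.
    console.error("moderation evaluator failed (fail-open)", err);
    return { blocked: false };
  }
}
```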
Block response shape
When moderation blocks a request, you get:
```http
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{
  "error": {
    "code": "content_moderation_blocked",
    "message": "Request blocked by content moderation policy.",
    "reason": {
      "type": "openai_category",
      "detail": "violence",
      "confidence": 0.91
    }
  }
}
```

| Field | Notes |
|---|---|
| `error.code` | Always `content_moderation_blocked` for a moderation block. |
| `error.reason.type` | One of `openai_category`, `custom_rule`, or `error_fail_closed`. |
| `error.reason.detail` | The category name or custom-rule label. |
| `error.reason.confidence` | Present for category blocks; null or absent for custom-rule and fail-closed blocks. |
What you should not do with the block payload
Don't echo the moderation reason back to the user verbatim. The category or custom-rule detail is intended for application logs and dashboards, not for end-user messages. Show users a generic "we can't generate that" message and surface the precise reason internally, as in the sketch below.
And — same as everywhere else in the platform — never log the rejected text itself. See Logging hygiene.
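Putting the two rules together, a client handler might look like the following. This is a hedged sketch: the endpoint URL, function name, and message strings are placeholders of ours, not part of the Speechbase API.

```ts
// Sketch of client-side handling for a moderation block.
// Logs the structured reason internally, never the submitted text,
// and shows the user a generic message.
async function synthesize(text: string): Promise<ArrayBuffer> {
  // Endpoint URL is a placeholder.
  const res = await fetch("https://api.speechbase.example/v1/synthesize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (res.status === 422) {
    const body = await res.json();
    if (body?.error?.code === "content_moderation_blocked") {
      // Internal log: the structured reason only, never `text` itself.
      console.warn("moderation block:", body.error.reason);
      // Generic user-facing message; don't echo the block reason.
      throw new Error("Sorry, we can't generate that.");
    }
  }
  if (!res.ok) {
    throw new Error(`synthesis failed with status ${res.status}`);
  }
  return res.arrayBuffer();
}
```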
Tuning suggestions
A reasonable starting policy for a customer-facing product (sketched as a config object after this list):

- Always-on: `sexual/minors`, `self-harm`, `illicit`.
- Threshold `0.8`: `sexual`, `violence`, `hate`.
- Threshold `null` (disabled): `harassment`. Re-introduce it once you've measured the false-positive rate on real traffic.
- `fail_mode: "closed"`: better to occasionally show "try again" than to let a moderation outage ship a problematic clip.
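Expressed as a single config object, that starting point looks like the sketch below. The field names are hypothetical; the real policy lives in the dashboard.

```ts
// Hypothetical config object for the starting policy above.
const moderationPolicy = {
  categoryThresholds: {
    // Always-on: thresholds of 0 fire at any nonzero confidence.
    "sexual/minors": 0,
    "self-harm": 0,
    illicit: 0,
    // Standard thresholds.
    sexual: 0.8,
    violence: 0.8,
    hate: 0.8,
    // Disabled until the false-positive rate is measured.
    harassment: null,
  },
  failMode: "closed",
} as const;
```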
Adjust from there once you have a few weeks of traffic to look at in Speechbase → Request Logs.