What each classifier actually does (and the marketing copy you should ignore)
**OpenAI omni-moderation-latest** is, as of June 2026, the most capable free moderation endpoint on the market. Per https://platform.openai.com/docs/guides/moderation, it returns 13 category scores plus an overall flag in a single JSON call, accepts text or image inputs, and works in 40-plus languages with materially better non-English accuracy than its predecessor text-moderation-007. The marketing tagline to ignore is 'replace your entire trust-and-safety stack.' It is a classifier, not a policy engine: it tells you whether a piece of content trips a category, not what to do about it. You still need rate-limiting, appeals, human review, and audit logging on top.
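To make the classifier-versus-policy distinction concrete, here is a minimal sketch of the layer you still have to write yourself. It assumes a result dict shaped like one entry of the endpoint's `results` array (`flagged` plus `category_scores`). The thresholds, the action names, and the `decide` helper are all hypothetical and not part of any OpenAI API.

```python
from typing import Dict

# Hypothetical per-category block thresholds; tune these against your own labeled data.
BLOCK_THRESHOLDS: Dict[str, float] = {"sexual/minors": 0.10, "violence/graphic": 0.80}
REVIEW_THRESHOLD = 0.50  # hypothetical review cutoff applied to every category


def decide(result: Dict) -> str:
    """Map one moderation result to 'block', 'review', or 'allow'."""
    scores = result["category_scores"]
    # Hard blocks first: any listed category at or above its own threshold.
    for category, score in scores.items():
        limit = BLOCK_THRESHOLDS.get(category)
        if limit is not None and score >= limit:
            return "block"
    # Anything the classifier flagged, or that scores high, goes to a human.
    if result.get("flagged") or any(s >= REVIEW_THRESHOLD for s in scores.values()):
        return "review"
    return "allow"


# Hand-written sample, not real API output.
sample = {
    "flagged": False,
    "category_scores": {"harassment": 0.62, "violence/graphic": 0.03, "sexual/minors": 0.0},
}
print(decide(sample))  # prints "review"
```

The classifier supplies only the scores. Which categories block outright, which go to review, and where the cutoffs sit are product decisions that belong in code you own and audit.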
**OpenAI text-moderation-007** is the legacy text-only endpoint, still supported and still free per https://platform.openai.com/docs/models/moderation. It uses the 11-category text taxonomy that grew out of the original 2022 release (see the Markov et al. paper at https://arxiv.org/abs/2208.03274). The honest assessment in 2026 is that omni-moderation-latest has functionally superseded it for all new builds: its non-English accuracy is meaningfully worse and it cannot handle images. Use it only if you are maintaining a production deployment pinned to a specific model version and need behavior that does not shift when a -latest alias is updated.
**Google Jigsaw Perspective API** at https://perspectiveapi.com/ is the longest-running production moderation service on this list. It has been live since 2017 and has more documented field experience than any other classifier here. It returns scores across six production attributes (TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT) plus experimental attributes for finer-grained moderation. The marketing copy to ignore is 'free for all use.' Perspective is free for non-commercial use under a default quota of 1 QPS. Commercial use, or higher QPS, requires a formal request through the form at https://developers.perspectiveapi.com/s/docs-get-started.
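As a rough illustration of what requesting the six production attributes looks like, the sketch below builds a request body and flattens a response. The field names (`comment.text`, `requestedAttributes`, `attributeScores`, `summaryScore.value`) follow my reading of the Perspective documentation, so check them against the current reference before relying on them. The helper names and the sample response are made up for this example, and no network call is made.

```python
import json
from typing import Dict, List

PRODUCTION_ATTRIBUTES: List[str] = [
    "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK", "INSULT", "PROFANITY", "THREAT",
]


def build_request(text: str, attributes: List[str] = PRODUCTION_ATTRIBUTES) -> Dict:
    """Build the JSON body for an analyze call (assumed request shape)."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in attributes},
    }


def summary_scores(response: Dict) -> Dict[str, float]:
    """Flatten attributeScores into {attribute: summary score}."""
    return {
        name: detail["summaryScore"]["value"]
        for name, detail in response.get("attributeScores", {}).items()
    }


# Hand-written sample response, not real API output.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}},
        "THREAT": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}},
    }
}
print(json.dumps(build_request("you are a fool"), indent=2))
print(summary_scores(sample_response))
```

At the default 1 QPS quota, whatever sends this body also needs client-side throttling or a queue in front of it.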
**AWS Comprehend Toxicity Detection** lives inside the broader Comprehend NLP service (https://aws.amazon.com/comprehend/) and, according to https://aws.amazon.com/comprehend/pricing/, bills per character processed. It returns scores across seven categories: HATE_SPEECH, GRAPHIC, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT, INSULT, PROFANITY. The honest gap: as of June 2026, the Comprehend Toxicity API is English-only. Multilingual workloads need to put Comprehend's separate language detection in front and route non-English traffic somewhere else, which adds latency and complexity.
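The detect-then-route pattern described above might look roughly like this. The two method names match boto3's Comprehend client (`detect_dominant_language` and `detect_toxic_content`) as I understand them. The `FakeComprehend` class, the `other_backend` function, and the return shape of `route_toxicity` are hypothetical stand-ins so the sketch runs offline. In production you would pass `boto3.client("comprehend")` instead.

```python
from typing import Any, Callable, Dict


def route_toxicity(text: str, comprehend: Any, fallback: Callable[[str], Dict[str, float]]) -> Dict:
    """Send English text to Comprehend toxicity detection and everything else to `fallback`."""
    detected = comprehend.detect_dominant_language(Text=text)["Languages"]
    # Pick the highest-scoring language; treat an empty list as undetermined.
    top = max(detected, key=lambda lang: lang["Score"])["LanguageCode"] if detected else "und"
    if top != "en":
        return {"language": top, "backend": "fallback", "scores": fallback(text)}
    result = comprehend.detect_toxic_content(
        TextSegments=[{"Text": text}], LanguageCode="en"
    )["ResultList"][0]
    scores = {label["Name"]: label["Score"] for label in result["Labels"]}
    return {"language": "en", "backend": "comprehend", "scores": scores}


class FakeComprehend:
    """Stand-in client with canned responses (hypothetical, for offline runs only)."""

    def detect_dominant_language(self, Text: str) -> Dict:
        code = "en" if Text.isascii() else "es"  # crude fake, not real detection
        return {"Languages": [{"LanguageCode": code, "Score": 0.99}]}

    def detect_toxic_content(self, TextSegments, LanguageCode: str) -> Dict:
        return {"ResultList": [{"Toxicity": 0.2, "Labels": [{"Name": "INSULT", "Score": 0.2}]}]}


def other_backend(text: str) -> Dict[str, float]:
    # Hypothetical placeholder for whichever multilingual classifier takes non-English traffic.
    return {"toxicity": 0.0}


client = FakeComprehend()
print(route_toxicity("hello there", client, other_backend)["backend"])  # comprehend
print(route_toxicity("¿qué tal?", client, other_backend)["backend"])    # fallback
```

Note the cost of this design: every English message makes two Comprehend calls, which is where the added latency comes from.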
**Azure AI Content Safety** at https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety is Microsoft's purpose-built moderation product, separated cleanly from the broader Azure AI portfolio. It returns four core harm categories (Hate, Sexual, Violence, Self-Harm), each on a 0-7 severity scale rather than as a probability. On top of that, Azure ships Prompt Shields (a dedicated jailbreak/indirect-injection detector), Protected Material detection for copyrighted text and code, and a Groundedness detector for RAG outputs. The four-category core looks small next to OpenAI's 13, but the severity scale is operationally easier to threshold against, and no other service on this list offers the ancillary detectors.
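Here is a small sketch of why an integer severity scale is easy to threshold. It assumes the text-analysis response carries a `categoriesAnalysis` array of `{category, severity}` objects, with `SelfHarm` as the category identifier. Both reflect my understanding of the REST response and should be verified against the current API version. The ceilings in `MAX_SEVERITY` and the `violations` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical maximum acceptable severity per category (0 = lowest, 7 = most severe).
MAX_SEVERITY: Dict[str, int] = {"Hate": 2, "Sexual": 2, "Violence": 4, "SelfHarm": 0}


def violations(analysis: List[Dict]) -> List[str]:
    """Return the categories whose severity exceeds the configured ceiling."""
    return [
        item["category"]
        for item in analysis
        # Unknown categories default to a ceiling of 0, so any nonzero severity is reported.
        if item["severity"] > MAX_SEVERITY.get(item["category"], 0)
    ]


# Hand-written sample, not real API output.
sample_analysis = [
    {"category": "Hate", "severity": 0},
    {"category": "Sexual", "severity": 0},
    {"category": "Violence", "severity": 6},
    {"category": "SelfHarm", "severity": 2},
]
print(violations(sample_analysis))  # prints ['Violence', 'SelfHarm']
```

A policy written this way reads as a table of integers, which is easier to review with a trust-and-safety team than a set of per-category probability cutoffs.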
**Hugging Face RoBERTa hate-speech models**, the most cited being facebook/roberta-hate-speech-dynabench-r4-target at https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target, are the open-weight reference for what self-hosted moderation looks like. The model itself is small (RoBERTa-base size, roughly 125M parameters), runs on a single T4 GPU at sub-50ms latency, and ships under a permissive license. What the marketing posts about 'free' moderation skip: a single checkpoint covers one task (binary hate detection in English), and you need to assemble an ensemble of fine-tuned models to match the category breadth of OpenAI or Azure. Self-hosting trades vendor cost for engineering cost, not for zero cost.
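The ensemble point can be sketched as follows. Each classifier is just a callable from text to `{label: score}`. In a real deployment each one would wrap a separately fine-tuned checkpoint (for instance through `transformers.pipeline("text-classification", model=...)`), but the two stubs here are hypothetical keyword matchers so the example runs without downloading weights. The task names, labels, and `run_ensemble` helper are assumptions for illustration.

```python
from typing import Callable, Dict

# One callable per single-task model: text in, {label: score} out.
Classifier = Callable[[str], Dict[str, float]]


def run_ensemble(text: str, classifiers: Dict[str, Classifier]) -> Dict[str, float]:
    """Run every single-task model and merge results into one category map."""
    merged: Dict[str, float] = {}
    for task, classify in classifiers.items():
        for label, score in classify(text).items():
            merged[f"{task}/{label}"] = score  # namespace labels by task
    return merged


def stub_hate(text: str) -> Dict[str, float]:
    # Hypothetical stand-in for a binary hate-speech checkpoint.
    hit = "hateword" in text.lower()
    return {"hate": 0.9 if hit else 0.1}


def stub_threat(text: str) -> Dict[str, float]:
    # Hypothetical stand-in for a separately fine-tuned threat model.
    hit = "hurt you" in text.lower()
    return {"threat": 0.8 if hit else 0.05}


models = {"hate_speech": stub_hate, "violence": stub_threat}
print(run_ensemble("I will hurt you", models))
```

Every entry in `models` is a checkpoint you have to source or train, evaluate, serve, and monitor, which is where the engineering cost comes from.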