By The DDH Team · Digital Dashboard Hub

OpenAI Moderation API vs Google Perspective vs AWS Comprehend Toxicity vs Azure Content Safety vs Hugging Face RoBERTa — Real Trade-offs (2026)

Five trust-and-safety classifiers, five different bets on what moderation should cover. OpenAI ships omni-moderation-latest free for any developer with an API key. Google Jigsaw's Perspective API is free for non-commercial workloads with explicit quota. AWS Comprehend bills toxicity detection per character. Azure Content Safety prices in transactions and adds image plus jailbreak shields. Hugging Face hosts open-weight RoBERTa hate-speech models you can run yourself. Sources cited inline, June 2026.

By DDH Research Team at Digital Dashboard Hub

Picking a moderation classifier used to be a one-vendor decision: you bolted on Perspective API, you shipped, you moved on. That stopped being true in late 2024, when OpenAI released omni-moderation-latest as a free endpoint that also accepts images, and it really stopped being true once Azure Content Safety added a dedicated jailbreak shield and AWS folded toxicity into Comprehend. In 2026, every trust-and-safety engineer is choosing between five live options with different category taxonomies, different pricing models, and different multilingual coverage. Pick wrong and you ship a pipeline that misses self-harm content in Portuguese, or one that costs $40,000 a month on a workload OpenAI would moderate for free. Before you commit to any single classifier, run your projected volume through the AI content moderation cost by provider breakdown so your unit economics survive a viral post.

**OpenAI Moderation** is the free baseline — both text-moderation-007 and the newer multimodal omni-moderation-latest are no-charge endpoints documented at https://platform.openai.com/docs/guides/moderation. **Google Jigsaw Perspective API** is the longest-running production moderation service, free for non-commercial use under quota per https://perspectiveapi.com/. **AWS Comprehend Toxicity Detection** lives inside the broader Comprehend NLP service at https://aws.amazon.com/comprehend/ and charges per character processed. **Azure AI Content Safety** is Microsoft's purpose-built moderation product at https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety, with text, image, jailbreak, and protected-material detection priced per 1,000 transactions. **Hugging Face RoBERTa Hate** is the open-weight reference model you can deploy on your own GPU. All pricing and capability claims in this guide come from vendor documentation as of June 2026.

The rest of this page is a working engineer's decision guide. You will get a six-column feature matrix, a deep-dive on what each classifier actually catches, a section on what the marketing copy gets wrong, a real procurement and integration plan, and answers to the questions your CTO will ask before sign-off. We also compare these classifiers against full-stack LLM safety platforms in LLM toxicity detection tools 2026 and against the model-native safety story in OpenAI safety features 2026.

OpenAI omni-moderation vs text-moderation-007 vs Perspective vs AWS Comprehend vs Azure Content Safety vs HF RoBERTa — feature + pricing overview, June 2026

| Feature | OpenAI omni-moderation-latest | OpenAI text-moderation-007 | Perspective API | AWS Comprehend Toxicity | Azure Content Safety | Hugging Face RoBERTa Hate |
| --- | --- | --- | --- | --- | --- | --- |
| Pricing | Free with any OpenAI API key (no per-request charge per https://platform.openai.com/docs/guides/moderation) | Free with any OpenAI API key | Free for non-commercial use under quota per https://perspectiveapi.com/ | ~$0.0001 per 100 characters for toxicity detection per https://aws.amazon.com/comprehend/pricing/ | ~$0.75 per 1,000 text transactions, ~$1.00 per 1,000 image transactions per https://azure.microsoft.com/en-us/pricing/details/cognitive-services/content-safety/ | Free model weights; you pay GPU inference (typically ~$0.20-$0.80/hr on T4/A10G) |
| Categories (count + named list) | 13 categories: harassment, harassment/threatening, hate, hate/threatening, illicit, illicit/violent, self-harm, self-harm/intent, self-harm/instructions, sexual, sexual/minors, violence, violence/graphic (per https://platform.openai.com/docs/guides/moderation) | 11 text categories (same taxonomy minus illicit subcategories) | 6 production attributes: TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT (plus experimental attributes per https://developers.perspectiveapi.com/s/about-the-api-attributes-and-languages) | 7 categories: HATE_SPEECH, GRAPHIC, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT, INSULT, PROFANITY (per Comprehend Toxicity docs) | 4 harm categories (Hate, Sexual, Violence, Self-Harm) on a 0-7 severity scale, plus separate Prompt Shields, Protected Material, and Groundedness detectors per https://learn.microsoft.com/en-us/azure/ai-services/content-safety/ | Binary or multi-label hate classification depending on checkpoint (e.g. facebook/roberta-hate-speech-dynabench-r4 ships 2 labels: hate / not-hate) |
| Multilingual (count) | 40+ languages with accuracy improvements over text-moderation-007 in non-English per https://openai.com/index/upgrading-the-moderation-api/ | English-strongest; degraded quality in non-English per OpenAI's own omni-moderation announcement | 18 production languages including EN, ES, FR, DE, IT, PT, RU, AR, ZH, JA, KO, HI per https://developers.perspectiveapi.com/s/about-the-api-attributes-and-languages | English only as of June 2026; verify at aws.amazon.com/comprehend/faqs/ | 100+ languages for text; image is language-agnostic per https://learn.microsoft.com/en-us/azure/ai-services/content-safety/language-support | Depends on checkpoint; the dynabench-r4 model is English-only; XLM-R-based hate models on HF Hub cover 100+ languages with weaker per-language quality |
| Multimodal (text + image) | Yes: text and image classification in a single endpoint per https://openai.com/index/upgrading-the-moderation-api/ | Text only | Text only | Text only (image moderation lives in AWS Rekognition, billed separately) | Yes: separate text and image endpoints, plus video (preview) per https://learn.microsoft.com/en-us/azure/ai-services/content-safety/ | Most popular checkpoints are text-only; multimodal hate detectors exist (LLaVA-derived) but are research-grade |
| Latency (typical p50) | ~150-400 ms text, ~400-900 ms image per OpenAI status reports | ~80-200 ms text | ~100-300 ms per attribute, often parallelized client-side | ~150-400 ms per Comprehend region | ~100-300 ms text, ~300-800 ms image per region | 5-50 ms on a warm GPU; cold start 5-30 s on a T4 |
| Quota / QPS | Tier-based; Tier 1 typically 1,000 RPM, Tier 5 up to 10,000+ RPM per https://platform.openai.com/docs/guides/rate-limits | Same tier-based limits as omni-moderation | 1 QPS default, request 10 QPS+ via form per https://developers.perspectiveapi.com/s/docs-get-started | AWS service quotas: default 100 TPS, increase via Service Quotas console per https://docs.aws.amazon.com/general/latest/gr/comprehend.html | Default 1,000 transactions per 10 seconds; raise via Azure support ticket | Bounded only by your GPU fleet |
| Max text length per request | 32,768 tokens per https://platform.openai.com/docs/models/moderation | 32,768 tokens | ~20,480 characters per request per Perspective API reference | ~100,000 UTF-8 bytes per https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html | 10,000 characters per request per https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/text-moderation-quotas-limits | Model context (typically 512 tokens for RoBERTa-base); chunk longer inputs |
| Accuracy claim (cited) | OpenAI reports 'mean average precision improvements' over text-moderation-007 across 40 languages but does not publish a single headline F1 per https://openai.com/index/upgrading-the-moderation-api/ | Per the original 'A Holistic Approach to Undesired Content Detection' paper (Markov et al., 2023, https://arxiv.org/abs/2208.03274), reported AUPRC varies by category; no single headline number | Per Perspective's published model cards at https://developers.perspectiveapi.com/s/about-the-api-model-cards, attribute AUC scores range roughly 0.91-0.97 on internal test sets | AWS does not publish per-category accuracy benchmarks for Comprehend Toxicity; verify at https://aws.amazon.com/comprehend/ | Microsoft publishes severity-level accuracy in the Azure AI Content Safety transparency note at https://learn.microsoft.com/en-us/legal/cognitive-services/content-safety/transparency-note | Per the Dynabench paper (Vidgen et al., 2021, https://arxiv.org/abs/2012.15761), the R4 model achieves ~85% accuracy on the Dynabench adversarial set (methodology, not a head-to-head) |
| OSS / SaaS | SaaS only (closed-weight) | SaaS only (closed-weight) | SaaS only (closed-weight; some open research at https://github.com/conversationai) | SaaS only (AWS-managed model) | SaaS only (Microsoft-managed) | Open-weight; deploy anywhere |
| Data residency / privacy | US-region by default; Zero Data Retention available on enterprise tier per https://openai.com/enterprise-privacy/ | Same as omni-moderation | Google Cloud regions; data not used to train models per https://perspectiveapi.com/privacy | Choice of AWS regions including US, EU, APAC per https://docs.aws.amazon.com/comprehend/ | 30+ Azure regions; customer data not used to train models per https://learn.microsoft.com/en-us/legal/cognitive-services/content-safety/data-privacy | 100% self-controlled; your data never leaves your VPC |
| Streaming / batch support | Synchronous JSON; no streaming; batch via your own concurrency | Same | Synchronous; recommended to parallelize attribute scoring | Synchronous DetectToxicContent + asynchronous batch via Comprehend jobs | Synchronous; batch endpoint for image moderation | Whatever your inference server supports (vLLM, TGI, Triton) |
| Best fit | Teams already on OpenAI who want free multimodal moderation with the broadest category taxonomy | Legacy OpenAI users; new builds should use omni-moderation | Comments/forum moderation where the 6 Perspective attributes map cleanly to your policy | AWS-native pipelines that already use Comprehend for entities/sentiment | Microsoft 365/Azure shops needing jailbreak shields and EU data residency | Privacy-sensitive workloads (healthcare, gov) that cannot send text to third parties |

Sources as of June 2026 — verify before procurement: https://platform.openai.com/docs/guides/moderation, https://platform.openai.com/docs/models/moderation, https://perspectiveapi.com/, https://developers.perspectiveapi.com/s/, https://aws.amazon.com/comprehend/, https://aws.amazon.com/comprehend/pricing/, https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety, https://azure.microsoft.com/en-us/pricing/details/cognitive-services/content-safety/, https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target. SaaS pricing, category taxonomies, and quotas change frequently — confirm in writing before any production rollout.

What each classifier actually does (and the marketing copy you should ignore)

**OpenAI omni-moderation-latest** is, as of June 2026, the most capable free moderation endpoint on the market. Per https://platform.openai.com/docs/guides/moderation, it returns 13 category scores plus an overall flag in a single JSON call, accepts text or image inputs, and works in 40-plus languages with materially better non-English accuracy than its predecessor text-moderation-007. The marketing tagline you should ignore is 'replace your entire trust-and-safety stack.' It is a classifier, not a policy engine — it tells you whether a piece of content trips a category, not what to do about it. You still need rate-limiting, appeals, human review, and audit logging on top.

**OpenAI text-moderation-007** is the legacy text-only endpoint, still supported and still free per https://platform.openai.com/docs/models/moderation. It uses the same 11-category text taxonomy that originally shipped in 2022 (per the Markov et al. paper at https://arxiv.org/abs/2208.03274). The honest assessment in 2026 is that it has been functionally superseded by omni-moderation-latest for all new builds — non-English accuracy is meaningfully worse and it cannot handle images. Use it only if you are maintaining a pinned production deployment that needs deterministic behavior.

**Google Jigsaw Perspective API** at https://perspectiveapi.com/ is the longest-running production moderation service on this list — it has been live since 2017 and has more documented field experience than any other classifier here. It returns scores across six production attributes (TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT) plus experimental attributes for finer-grained moderation. The marketing copy to ignore: Perspective is not 'free for all use.' It is free for non-commercial use under a 1 QPS default quota. Commercial use, or higher QPS, requires a formal request through the form at https://developers.perspectiveapi.com/s/docs-get-started.

**AWS Comprehend Toxicity Detection** lives inside the broader Comprehend NLP service (https://aws.amazon.com/comprehend/) and bills per character processed per https://aws.amazon.com/comprehend/pricing/. It returns scores across seven categories: HATE_SPEECH, GRAPHIC, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT, INSULT, PROFANITY. The honest gap: as of June 2026, the Comprehend Toxicity API is English-only. Multilingual workloads need to layer Comprehend's separate language detection in front and route non-English traffic somewhere else, which adds latency and complexity.

**Azure AI Content Safety** at https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety is Microsoft's purpose-built moderation product, separated cleanly from the broader Azure AI portfolio. It returns four core harm categories — Hate, Sexual, Violence, Self-Harm — each on a 0-7 severity scale rather than a probability. On top of that, Azure ships Prompt Shields (a dedicated jailbreak/indirect-injection detector), Protected Material detection for copyrighted text and code, and a Groundedness detector for RAG outputs. The four-category core looks small versus OpenAI's 13, but the severity scale is operationally easier to threshold against and the ancillary detectors are unique to Azure.
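To make the point about thresholding concrete, here is a minimal sketch of comparing integer severities against per-category ceilings. It assumes the text analyze response carries a `categoriesAnalysis` list of `{"category", "severity"}` objects; the ceilings in `MAX_ALLOWED`, the function name, and the sample payload are hypothetical and hand-written for illustration, not real API output.

```python
# Hypothetical per-category ceilings on the severity scale; tune to your own policy.
MAX_ALLOWED = {"Hate": 2, "SelfHarm": 0, "Sexual": 2, "Violence": 4}


def severity_violations(response_json: dict, max_allowed: dict = MAX_ALLOWED) -> dict:
    """Return {category: severity} for every category above its ceiling.

    Categories missing from max_allowed default to a ceiling of 0,
    so any non-zero severity in an unlisted category is reported.
    """
    found = {}
    for item in response_json.get("categoriesAnalysis", []):
        ceiling = max_allowed.get(item["category"], 0)
        if item["severity"] > ceiling:
            found[item["category"]] = item["severity"]
    return found


# Hand-written sample in the assumed response shape.
SAMPLE_ANALYSIS = {
    "categoriesAnalysis": [
        {"category": "Hate", "severity": 2},
        {"category": "SelfHarm", "severity": 4},
        {"category": "Sexual", "severity": 0},
        {"category": "Violence", "severity": 0},
    ]
}
```

Because the severities are small integers, the policy is a plain comparison table that a policy or legal reviewer can read without having to interpret probabilities.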

**Hugging Face RoBERTa hate-speech models**, the most cited being facebook/roberta-hate-speech-dynabench-r4-target at https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target, are the open-weight reference for what self-hosted moderation looks like. The model itself is small (RoBERTa-base size, roughly 125M parameters), runs on a single T4 GPU at sub-50ms latency, and ships under a permissive license. The thing the marketing posts about 'free' moderation skip: a single checkpoint covers one task (binary hate detection in English) and you need to assemble an ensemble of fine-tuned models to match the category breadth of OpenAI or Azure. Self-hosting trades vendor cost for engineering cost, not for zero cost.


Architecture: where the classifier plugs into a real moderation pipeline

A production trust-and-safety pipeline in 2026 has three layers: ingress filtering, generation filtering, and post-hoc review. Ingress filtering moderates user-generated input before it hits your LLM or your forum — this is where OpenAI moderation, Perspective, and the Hugging Face models excel because they are cheap per-request and synchronous. Generation filtering moderates LLM output before it reaches the user — this is where OpenAI omni-moderation and Azure Content Safety have the edge because their taxonomies were designed with LLM outputs in mind. Post-hoc review batches recently published content for audit — this is where AWS Comprehend's batch jobs and self-hosted RoBERTa models on a GPU fleet shine because cost-per-character dominates.

**OpenAI moderation** plugs in with a single HTTPS POST per request to https://api.openai.com/v1/moderations. The integration is the simplest on this list — one endpoint, one JSON schema, no client SDK required. If you are already calling OpenAI for generation, you reuse the same API key and the same retry logic. The constraint: synchronous-only, no streaming, and tier-based rate limits documented at https://platform.openai.com/docs/guides/rate-limits. For a viral-content burst, you need client-side queueing or you will hit 429s.
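As a rough illustration of how small that integration is, the sketch below builds the POST request with only the standard library and reduces a response to the fields most pipelines log. The response fields (`results`, `flagged`, `category_scores`) follow the documented moderation response shape as we understand it; the helper names and the sample payload are hypothetical, and nothing here actually sends the request.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the moderation endpoint."""
    body = json.dumps({"model": "omni-moderation-latest", "input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize_result(response_json: dict) -> dict:
    """Keep the overall flag plus the single highest-scoring category."""
    result = response_json["results"][0]
    scores = result["category_scores"]
    top_category = max(scores, key=scores.get)
    return {
        "flagged": result["flagged"],
        "top_category": top_category,
        "top_score": scores[top_category],
    }


# Hand-written sample in the assumed response shape, not real API output.
SAMPLE_RESPONSE = {
    "results": [
        {
            "flagged": True,
            "categories": {"harassment": True, "violence": False},
            "category_scores": {"harassment": 0.91, "violence": 0.12},
        }
    ]
}
```

In a real service you would pass the request to `urllib.request.urlopen` (or your HTTP client of choice) behind the same retry and queueing logic you use for generation calls.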

**Perspective API** is also a single HTTPS POST, and one request can score several attributes at once: you list the attributes you want and Perspective returns a score for each. The reference docs at https://developers.perspectiveapi.com/s/ are clear and the response schema is stable. The constraint: the 1 QPS default quota is very low for production. Plan to request 10-100 QPS via the form on the Perspective site before you launch, and budget 2-4 weeks for Google's review of commercial use cases.
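A minimal sketch of the request body and score extraction for Perspective follows. The endpoint URL and the `requestedAttributes` / `attributeScores` / `summaryScore` field names reflect the Comment Analyzer API as commonly documented, but treat them as assumptions and check them against the current reference; the helper names and sample response are hypothetical.

```python
# Assumed endpoint for the Comment Analyzer API; the API key goes in a ?key= query parameter.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

PRODUCTION_ATTRIBUTES = (
    "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK", "INSULT", "PROFANITY", "THREAT",
)


def build_analyze_body(text: str, attributes=PRODUCTION_ATTRIBUTES, languages=("en",)) -> dict:
    """Build the JSON body that asks for several attributes in one request."""
    return {
        "comment": {"text": text},
        "languages": list(languages),
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_scores(response_json: dict) -> dict:
    """Flatten the response to {attribute: summary score}."""
    return {
        name: payload["summaryScore"]["value"]
        for name, payload in response_json.get("attributeScores", {}).items()
    }


# Hand-written sample in the assumed response shape, not real API output.
SAMPLE_PERSPECTIVE_RESPONSE = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.82, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.77, "type": "PROBABILITY"}},
    }
}
```

Requesting only the attributes your policy actually uses keeps the response small and makes the attribute-to-policy mapping explicit.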

**AWS Comprehend** integrates through the standard AWS SDK — DetectToxicContent for synchronous calls, StartToxicityDetectionJob for batches against S3 inputs. The architectural advantage is that if you are already on AWS, the IAM and VPC story comes for free per https://docs.aws.amazon.com/comprehend/. The cost: the English-only constraint means non-English traffic needs a second vendor or self-hosted model, so the single-cloud assumption breaks anyway.
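The routing this implies can be sketched as a small pure function. It assumes a language code and confidence score are already available from an upstream language detector; the backend labels, the confidence floor, and the function name are all hypothetical.

```python
def choose_backend(
    language_code: str,
    confidence: float,
    min_confidence: float = 0.80,
    fallback: str = "multilingual-fallback",
) -> str:
    """Send confidently-English text to Comprehend; everything else to a fallback.

    Low-confidence detections go to the fallback too, since short or
    code-switched messages are often misidentified as English.
    """
    primary = language_code.lower().split("-")[0]  # "en-US" -> "en"
    if primary == "en" and confidence >= min_confidence:
        return "comprehend-toxicity"
    return fallback
```

For example, `choose_backend("pt", 0.99)` returns the fallback label, which is where a second vendor or a self-hosted multilingual model would sit.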

**Azure Content Safety** integrates through Azure AI Foundry SDKs or direct REST. It is the only service on this list that ships Prompt Shields as a first-class endpoint: per https://learn.microsoft.com/en-us/azure/ai-services/content-safety/, you call /text:shieldPrompt with the user message plus any retrieved documents and get back jailbreak and indirect-injection classifications in a single call. For RAG applications on Azure OpenAI, this folds the user-prompt jailbreak check and the retrieved-document injection check into one API call.
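A sketch of the Prompt Shields payload and how a RAG pipeline might read the result is shown here. The field names (`userPrompt`, `documents`, `userPromptAnalysis`, `documentsAnalysis`, `attackDetected`) are our understanding of the REST contract and should be verified against the current API version; the helper names and sample response are hypothetical.

```python
def build_shield_body(user_prompt: str, documents: list) -> dict:
    """Body for /text:shieldPrompt: the user message plus retrieved documents."""
    return {"userPrompt": user_prompt, "documents": list(documents)}


def attack_summary(response_json: dict) -> dict:
    """Report whether the prompt was flagged and which document indexes were flagged."""
    prompt_flag = response_json.get("userPromptAnalysis", {}).get("attackDetected", False)
    flagged_docs = [
        index
        for index, doc in enumerate(response_json.get("documentsAnalysis", []))
        if doc.get("attackDetected", False)
    ]
    return {
        "prompt_attack": prompt_flag,
        "flagged_documents": flagged_docs,
        "any_attack": prompt_flag or bool(flagged_docs),
    }


# Hand-written sample in the assumed response shape, not real API output.
SAMPLE_SHIELD_RESPONSE = {
    "userPromptAnalysis": {"attackDetected": False},
    "documentsAnalysis": [{"attackDetected": False}, {"attackDetected": True}],
}
```

Keeping the flagged document indexes lets the pipeline drop only the poisoned retrieval result instead of rejecting the whole request.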

**Hugging Face RoBERTa** integrates wherever you host it — vLLM, TGI, Triton, Modal, Replicate, or your own Kubernetes cluster. You gain total control over region, encryption, audit logs, and scale-down behavior. The architectural cost is real: a production-grade self-hosted classifier needs autoscaling, observability, model versioning, and an on-call rotation. Most teams that try this end up rebuilding 60 percent of what OpenAI gives them free, then ship the same answer eighteen months later.
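One piece of that self-hosted plumbing is input chunking, since the feature matrix notes a typical 512-token context for RoBERTa-base. The sketch below approximates tokens with whitespace-separated words and takes the maximum score across overlapping chunks; `score_chunk` stands in for whatever model call you host, and the word budget and overlap are illustrative assumptions rather than tuned values.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 300, overlap: int = 50) -> List[str]:
    """Split text into overlapping word windows that stay under a model's context budget."""
    if max_words <= 0 or overlap < 0 or overlap >= max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def score_long_text(
    text: str,
    score_chunk: Callable[[str], float],
    max_words: int = 300,
    overlap: int = 50,
) -> float:
    """Score each chunk and keep the worst (highest) score; empty text scores 0.0."""
    chunks = chunk_words(text, max_words=max_words, overlap=overlap)
    return max((score_chunk(chunk) for chunk in chunks), default=0.0)
```

Max-aggregation is a conservative choice: one hateful paragraph in a long post flags the whole post. A production version would count real tokenizer tokens rather than words.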


Benchmark deep-dive: what the published numbers actually mean

There is no single head-to-head benchmark across all five classifiers in 2026, and anyone who tells you otherwise is selling something. Each vendor publishes numbers against their own internal test sets, often with category-specific thresholds that are not directly comparable. The most rigorous public reference for OpenAI's text moderation is still the original 'A Holistic Approach to Undesired Content Detection' paper by Markov et al. (https://arxiv.org/abs/2208.03274), which reports per-category AUPRC but no single headline F1. The omni-moderation announcement at https://openai.com/index/upgrading-the-moderation-api/ claims significant accuracy improvements but does not publish raw scores for independent verification.

**Perspective API** publishes model cards at https://developers.perspectiveapi.com/s/about-the-api-model-cards with attribute-level AUC scores on Jigsaw's internal evaluation sets. The numbers are credible, and Perspective has served as a baseline in dozens of academic papers, but the test sets skew toward English-language comment-forum content (Wikipedia talk pages, news comments), which is exactly Jigsaw's original training distribution. Performance on out-of-distribution traffic, say Brazilian Portuguese gaming chat, can fall well below what the model card numbers suggest. Verify on your own data.

**AWS Comprehend Toxicity** does not publish per-category accuracy benchmarks as of June 2026 — verify at https://aws.amazon.com/comprehend/. This is a genuine gap: you have to evaluate the model on your own data before committing, and you have to do it through the synchronous API because there is no offline evaluation harness. For a procurement decision, the right move is to ship a 1,000-message labeled evaluation set through Comprehend and measure precision-at-recall yourself.

**Azure Content Safety** publishes a transparency note at https://learn.microsoft.com/en-us/legal/cognitive-services/content-safety/transparency-note that includes severity-level accuracy data, category definitions, and known limitations. The transparency note is the gold standard of vendor disclosure on this list — it tells you which demographic groups the system underperforms on, which content types confuse the classifier, and which languages have lower quality. Reading it before procurement is mandatory, not optional.

**Hugging Face RoBERTa Hate** — specifically the Dynabench R4 checkpoint — has the most academically rigorous benchmark on this list because the Dynabench evaluation methodology (Vidgen et al., 2021, https://arxiv.org/abs/2012.15761) was designed to be adversarial. Reported accuracy around 85 percent on the R4 adversarial set is meaningful in the sense that it is hard to game, but it is binary hate-versus-not-hate, not the 13-category taxonomy OpenAI ships. Comparing the two numbers directly is a category error.

The honest 2026 conclusion on benchmarks: do not buy on vendor accuracy claims. Build a 500-to-2,000 example labeled test set from your own production traffic, run it through every classifier under consideration, and measure precision-at-recall against the policy you actually want to enforce. The classifier that scores 0.94 AUC on Wikipedia comments may score 0.71 on your TikTok-style short-form video transcripts, and that gap dwarfs any vendor's reported headline number.
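The measurement itself is small enough to sketch. Assuming you have binary labels from your own policy reviewers and one score per example from a classifier, the function below finds the strictest threshold that still reaches a target recall and reports precision there; the function name and signature are illustrative, not taken from any vendor SDK.

```python
from typing import Sequence, Tuple


def precision_at_recall(
    labels: Sequence[int],
    scores: Sequence[float],
    target_recall: float,
) -> Tuple[float, float]:
    """Return (precision, threshold) at the highest threshold whose recall >= target.

    labels are 1 for policy-violating examples and 0 otherwise;
    an example counts as flagged when its score >= threshold.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("need at least one positive example")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = false_pos = 0
    for index, (score, label) in enumerate(ranked):
        if label:
            true_pos += 1
        else:
            false_pos += 1
        # Tied scores share one threshold, so evaluate only at the end of a tie group.
        if index + 1 < len(ranked) and ranked[index + 1][0] == score:
            continue
        if true_pos / total_positive >= target_recall:
            return true_pos / (true_pos + false_pos), score
    raise RuntimeError("unreachable: recall reaches 1.0 at the lowest score")
```

Run the same labeled set through every candidate classifier and compare precision at the recall your policy requires; that number is comparable across vendors in a way their published AUCs are not.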


Real use-case decision matrix: which classifier to pick for which workload

If you are already calling the OpenAI API for generation and you need free, broad-taxonomy text moderation across many languages with multimodal coverage, use **OpenAI omni-moderation-latest**. The cost math is unbeatable — zero incremental dollars per request — and the 13-category taxonomy is the most comprehensive on this list. The constraint is rate limits: at Tier 1 you have 1,000 RPM, so high-volume consumer apps need to climb the tier ladder or queue locally. Pricing details at https://platform.openai.com/docs/guides/moderation.

If you run a comment, forum, or chat product where the policy maps cleanly to insults, identity attacks, threats, and profanity — and especially if you serve multiple Indo-European languages — use **Perspective API**. It is purpose-built for exactly this workload, the operational history is the longest of any classifier on this list, and the six-attribute taxonomy is well-understood by trust-and-safety teams. Request commercial quota early at https://developers.perspectiveapi.com/s/docs-get-started — Google's review is slow and you do not want to launch capped at 1 QPS.

If you live entirely inside AWS and your moderation needs are English-only synchronous text checks with strong IAM and VPC integration, use **AWS Comprehend Toxicity**. The bill at $0.0001 per 100 characters per https://aws.amazon.com/comprehend/pricing/ is predictable, the IAM and CloudTrail story is best-in-class, and you avoid adding a non-AWS vendor to your data flow. The stark constraint is language coverage: anything beyond English requires a second classifier.

If you build on Azure OpenAI, your RAG pipelines need jailbreak and indirect-injection protection, you have EU customers requiring regional data residency, or your product touches copyrighted content, use **Azure AI Content Safety**. Prompt Shields, Protected Material detection, and Groundedness are unique to Azure as of June 2026, and the 100-plus-language text coverage at https://learn.microsoft.com/en-us/azure/ai-services/content-safety/language-support is broader than every other SaaS option on this list.

If you operate in healthcare, government, finance, or any other regulated domain where sending user text to a third-party API is a procurement blocker, run a self-hosted **Hugging Face RoBERTa** ensemble. You take on the operational cost — autoscaling, observability, model lifecycle — but you gain total control over data flow, retention, and audit logging. This is also the right answer for languages where SaaS coverage is poor (Tagalog, Swahili, Bengali) because you can fine-tune your own checkpoint on labeled in-domain data.

The hybrid pattern that wins most consumer-scale moderation reviews in 2026: OpenAI omni-moderation as the cheap default for both ingress and generation filtering, with Perspective API layered on user-generated comments for finer-grained insult and identity-attack detection, plus Azure Prompt Shields if you are running a RAG product on Azure OpenAI. Three classifiers, three different jobs, and the total cost stays under $0.001 per moderated event for the vast majority of workloads. See LLM toxicity detection tools 2026 for how full-stack platforms compare against this DIY stack.


Pricing and operational cost: what your bill actually looks like at scale

At low volume, this is not a real comparison: OpenAI and Perspective are free, and a 10,000-event-per-day workload costs essentially zero on either. The interesting cost question kicks in around 1 million moderated events per day. At that volume, **OpenAI omni-moderation** is still $0 incremental; you only pay for the OpenAI API access you already have. **Perspective API** is still free if you qualify for non-commercial use; if you are a commercial workload, you negotiate a custom rate with Jigsaw and the numbers are not public. **AWS Comprehend** at $0.0001 per 100 characters and an average 280-character message works out to 2.8 billing units, or $0.00028 per message, which lands at about $280 per day, or roughly $8,400 per month, per https://aws.amazon.com/comprehend/pricing/ (before any per-request minimum or rounding the current price sheet may apply).

**Azure Content Safety** at $0.75 per 1,000 text transactions per https://azure.microsoft.com/en-us/pricing/details/cognitive-services/content-safety/ lands at $750 per day, or $22,500 per month, for that same 1M-events-per-day workload. The pricing curve flattens nicely above 1M transactions per day with enterprise commitments — Azure will negotiate volume discounts, especially if you are bundling with Azure OpenAI Service. Image moderation at $1.00 per 1,000 transactions adds proportional cost; budget Azure as the most expensive SaaS option in this group at extreme scale.

**Self-hosted RoBERTa** at 1M events per day on a single A10G GPU instance (about $0.60/hour on most major clouds) lands at about $432 per month — but only if you can keep the GPU busy. The real cost is operational: a senior MLE managing a self-hosted moderation fleet costs $250,000-plus per year fully loaded, and you need at least 0.25 FTE of attention to keep the stack healthy. Self-hosting wins on cost only above roughly 10M events per day, or when the regulatory case justifies the operational cost regardless of unit economics.
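The arithmetic behind these three bills is easy to check in a few lines. The sketch below uses the list prices quoted in this section, a 30-day month, and no per-request minimums, rounding, or volume discounts; all function names are illustrative.

```python
def comprehend_monthly_cost(
    events_per_day: float,
    avg_chars: float,
    price_per_100_chars: float = 0.0001,
    days: int = 30,
) -> float:
    """Per-character billing: each message costs (chars / 100) billing units."""
    units_per_event = avg_chars / 100
    return events_per_day * units_per_event * price_per_100_chars * days


def azure_text_monthly_cost(
    events_per_day: float,
    price_per_1000: float = 0.75,
    days: int = 30,
) -> float:
    """Per-transaction billing, assuming one text transaction per event."""
    return events_per_day / 1000 * price_per_1000 * days


def gpu_monthly_cost(hourly_rate: float = 0.60, instances: int = 1, days: int = 30) -> float:
    """Always-on GPU instances billed by the hour, regardless of utilization."""
    return hourly_rate * 24 * days * instances


def with_output_filtering(monthly_cost: float) -> float:
    """Moderating both user input and model output doubles the event count."""
    return monthly_cost * 2
```

With 1M events per day and 280-character messages, these return roughly 8,400, 22,500, and 432 dollars per month respectively (2.8 billing units per message on Comprehend). The GPU figure excludes the staffing cost discussed above, which is usually the larger number.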

The hidden line items every team underestimates: appeals workflow ($30,000-$80,000 to build, ongoing CS cost), human review tooling ($50,000-$150,000 for a Lasso, Hive, or in-house build), policy authoring (1-2 weeks of legal review per category), and regulatory documentation (DSA reporting, age-verification audits, child-safety attestations). The moderation classifier itself is often the cheapest part of a real trust-and-safety stack. Do not optimize the bottom 10 percent of the cost stack at the expense of the top 90.

If you want to model a real bill across providers before committing, the AI content moderation cost by provider calculator walks through unit economics at 10K, 100K, 1M, and 10M events per day. The most common pricing surprise is not the classifier cost — it is the fact that high-volume workloads need both ingress and generation filtering, which doubles your transaction count overnight.

One more pricing nuance: **OpenAI moderation** being free does not mean it is rate-limit-free. At Tier 1, 1,000 RPM is roughly 1.4 million requests per day if perfectly distributed — but real traffic is not perfectly distributed, and a viral event will saturate the limit instantly. Plan to climb to Tier 3+ ($100+ spend in 30 days, 7+ days payment history) before launching consumer-scale moderation on OpenAI, per https://platform.openai.com/docs/guides/rate-limits.
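A client-side token bucket is one common way to stay under a requests-per-minute ceiling and decide locally which calls to queue. The sketch below is a single-threaded illustration with an injectable clock for testing; the class name, burst capacity, and the choice to return False rather than block are assumptions, not anything prescribed by OpenAI.

```python
import time


def daily_ceiling(requests_per_minute: int) -> int:
    """Theoretical max requests per day if traffic were perfectly even."""
    return requests_per_minute * 60 * 24


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_minute` tokens per minute."""

    def __init__(self, rate_per_minute: float, capacity: float, clock=time.monotonic):
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so the first burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; callers queue or shed the request on False."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

`daily_ceiling(1000)` is 1,440,000, which matches the "roughly 1.4 million" figure; the bucket is what keeps a viral spike from turning that theoretical ceiling into a wall of 429 responses. A multi-threaded service would need a lock around `try_acquire`.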


Build vs. buy: when self-hosting actually wins

The default 2026 answer is buy: use OpenAI omni-moderation or Perspective for free, layer Azure Content Safety on top if you need jailbreak shields or copyright detection, and put the saved engineering hours back into your trust-and-safety policy work. The economics overwhelmingly favor SaaS for any team under ~10M events per day, and the operational reliability of these endpoints in 2026 is meaningfully better than what most teams can build internally in the first year.

The case for self-hosting is narrower than it was in 2022 but it is real. Three legitimate reasons to deploy your own RoBERTa or DeBERTa fleet in 2026: (1) regulated data that cannot legally leave your VPC under HIPAA, FedRAMP High, or specific national data sovereignty laws; (2) languages or content domains where SaaS coverage is empirically poor on your test set; (3) latency-critical paths where the round-trip to a SaaS endpoint is too slow (real-time game voice chat, IVR systems with strict sub-100ms budgets).

The fourth reason often given — 'we want to fine-tune on our own policy' — is mostly a trap. Modern SaaS moderation APIs let you set per-category thresholds and combine outputs against your own policy in a hosted environment. Fine-tuning a custom RoBERTa head sounds appealing in the design doc and looks like a six-month project plus an ongoing labeling operation in practice. Unless you are confident the SaaS APIs cannot reach your accuracy target on your test set, do not start down this path.

If you do go the self-hosted route, the practical 2026 stack is: a fine-tuned XLM-RoBERTa-large for multilingual hate and harassment, a separate DeBERTa-v3 fine-tune for sexual content, an open-source jailbreak classifier (e.g., the prompt-injection detectors on https://huggingface.co/protectai), and a small ensemble layer that combines their outputs with your policy thresholds. Total inference is 4-6 GPU calls per moderated event, which on a single A10G handles roughly 300 events per second.
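A quick capacity check shows how far the 300-events-per-second figure goes. The sketch below sizes a fleet for peak rather than average traffic; the peak-to-average ratio is a hypothetical planning input you would replace with a number from your own traffic logs.

```python
import math

SECONDS_PER_DAY = 86_400


def gpus_needed(
    events_per_day: float,
    events_per_second_per_gpu: float = 300.0,
    peak_to_average: float = 10.0,
) -> int:
    """Size the fleet for peak load, with a floor of one GPU."""
    average_rate = events_per_day / SECONDS_PER_DAY
    peak_rate = average_rate * peak_to_average
    return max(1, math.ceil(peak_rate / events_per_second_per_gpu))


def daily_capacity(events_per_second_per_gpu: float = 300.0, gpus: int = 1) -> int:
    """Events per day a fleet could process if it ran flat out."""
    return int(events_per_second_per_gpu * SECONDS_PER_DAY * gpus)
```

At 300 events per second, one GPU running flat out covers about 25.9M events per day, so average load is rarely the constraint. Under the assumed 10x peak factor, 1M events per day still fits on one GPU, while 10M events per day needs four.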

The cost calculator at AI content moderation cost by provider lets you compare a self-hosted ensemble against the SaaS options at your projected volume. The most common mistake is forgetting that the GPU bill is the small line — the labeling cost to build and maintain your evaluation set is the big one. Plan on $30,000-$80,000 per year for labeling, ongoing, if you want to keep your custom classifier honest as content trends change.

The bottom line on build-versus-buy: the model is not the moat. The continuously updated training data, the multilingual coverage, the policy taxonomy, and the operational uptime — that is what you are getting from OpenAI, Perspective, AWS, and Azure. If you have a regulated-data reason, a niche-language reason, or a strict latency reason, build. Otherwise, the right move in 2026 is to use the free SaaS endpoints and spend your engineering budget on what your competitors actually cannot replicate.


Implementation timeline: what the first 30 days look like

**OpenAI moderation** is the fastest integration on this list: most teams ship a production-grade ingress filter in 2 to 5 days. Days 1-2: instrument an existing endpoint with a moderation call, log the categories returned, and run a few hundred real messages through it to sanity-check the output. Days 3-5: define your category thresholds, wire up the block / shadow-block / flag-for-review actions, and add structured logging for audit. Days 5-30 are the policy work, deciding what to actually do when a category trips, and that work is identical regardless of which classifier you pick.
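The threshold-and-action step from days 3-5 can be sketched as a small lookup over category scores. Every threshold below is a hypothetical placeholder to be replaced by your own policy decisions; only the category names come from the taxonomy listed earlier.

```python
# Actions ordered from least to most severe.
ACTIONS = ["allow", "review", "shadow_block", "block"]

# Hypothetical thresholds per category; categories not listed are ignored.
POLICY = {
    "sexual/minors": {"block": 0.20},
    "self-harm/intent": {"review": 0.30},
    "harassment": {"block": 0.90, "shadow_block": 0.70, "review": 0.40},
}


def decide(category_scores: dict, policy: dict = POLICY):
    """Return (most severe action triggered, list of (category, action, score) reasons)."""
    worst = "allow"
    reasons = []
    for category, thresholds in policy.items():
        score = category_scores.get(category, 0.0)
        # Check the most severe action first so each category reports one action.
        for action in reversed(ACTIONS[1:]):
            limit = thresholds.get(action)
            if limit is not None and score >= limit:
                reasons.append((category, action, score))
                if ACTIONS.index(action) > ACTIONS.index(worst):
                    worst = action
                break
    return worst, reasons
```

Returning the reasons alongside the action gives you the structured audit record for free: log the tuple list with the message ID and you can reconstruct why any item was blocked during an appeal.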

**Perspective API** takes 5 to 14 days because the quota request adds asynchronous waiting. Days 1-3: implement against the free non-commercial tier, validate the response schema, build the attribute-to-policy mapping. Day 3: submit the commercial quota form at https://developers.perspectiveapi.com/s/docs-get-started. Days 4-10: while waiting for Google's review, build your client-side queueing, error handling, and fallback logic. Days 10-14: quota approved (typically), bump production QPS, complete rollout.
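
For the schema validation and attribute mapping in days 1-3, a minimal sketch of request building and response parsing might look like the following. The field names (`comment`, `requestedAttributes`, `doNotStore`, `attributeScores`, `summaryScore`) reflect the Perspective request and response shape as we understand it; confirm them against the current API reference. The helper names are hypothetical, and no network call is made.

```python
from typing import Dict, Iterable

# The six attributes discussed in this article.
ATTRIBUTES = (
    "TOXICITY",
    "SEVERE_TOXICITY",
    "IDENTITY_ATTACK",
    "INSULT",
    "PROFANITY",
    "THREAT",
)


def build_request(text: str, attributes: Iterable[str] = ATTRIBUTES) -> dict:
    """Build the JSON body for an analyze call (sending it is left to the caller)."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in attributes},
        "doNotStore": True,  # ask the service not to retain the comment
    }


def summary_scores(response: dict) -> Dict[str, float]:
    """Flatten a response into {attribute: summary score}."""
    return {
        name: body["summaryScore"]["value"]
        for name, body in response.get("attributeScores", {}).items()
    }


# A hand-written response in the expected shape, for testing the parser offline.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.92, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.40, "type": "PROBABILITY"}},
    },
    "languages": ["en"],
}
scores = summary_scores(sample_response)
```

Keeping the builder and parser free of HTTP code makes them easy to unit test while the quota request is still pending.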

**AWS Comprehend** takes 7 to 14 days for an AWS-native team, and most of that is IAM, VPC endpoint, and CloudTrail setup. Days 1-2: configure the Comprehend IAM role and set up VPC endpoints if you need private connectivity, per https://docs.aws.amazon.com/comprehend/latest/dg/auth-and-access-control.html. Days 3-7: build the synchronous DetectToxicContent integration plus a batch path for post-hoc review; confirm that an asynchronous job such as StartToxicityDetectionJob exists in the current Comprehend API reference before designing around it. Days 8-14: policy mapping, monitoring, and operational handoff to your platform team.

**Azure Content Safety** takes 5 to 10 days for an Azure-native team. Days 1-3: provision the Azure Content Safety resource, configure the API key in Key Vault, and wire up the text + image analyze endpoints. Days 4-7: integrate Prompt Shields if you are running a RAG application, configure custom blocklists, and tune severity thresholds per https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/harm-categories. Days 8-10: policy mapping and operational rollout.
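
Severity-threshold tuning in days 4-7 largely comes down to a per-category cutoff applied to the analyze response. The sketch below assumes the text-analysis response carries a `categoriesAnalysis` list of `{category, severity}` items and a `blocklistsMatch` list, which is our reading of the API and should be checked against the current reference. The cutoff values are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-category cutoffs on Azure's severity scale.
BLOCK_AT: Dict[str, int] = {"Hate": 4, "Sexual": 4, "Violence": 4, "SelfHarm": 2}


def blocked_categories(response: dict) -> List[str]:
    """Return categories whose severity meets or exceeds the configured cutoff."""
    hits = []
    for item in response.get("categoriesAnalysis", []):
        cutoff = BLOCK_AT.get(item["category"])
        if cutoff is not None and item["severity"] >= cutoff:
            hits.append(item["category"])
    return hits


def should_block(response: dict) -> bool:
    """Block on any category hit or any custom blocklist match."""
    return bool(blocked_categories(response)) or bool(response.get("blocklistsMatch"))


# A hand-written response in the assumed shape.
sample = {
    "blocklistsMatch": [],
    "categoriesAnalysis": [
        {"category": "Hate", "severity": 2},
        {"category": "SelfHarm", "severity": 2},
        {"category": "Violence", "severity": 0},
    ],
}
```

A stricter cutoff for one category than for the others, as shown for self-harm here, is a policy choice rather than a vendor default.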

**Hugging Face RoBERTa** self-hosted takes 4 to 12 weeks depending on how much production infrastructure you already have. Weeks 1-2: model selection, evaluation harness build, accuracy measurement on your test set. Weeks 3-6: deploy on Modal, Replicate, or your own Kubernetes, build autoscaling, instrument latency and quality dashboards. Weeks 6-10: build the appeals workflow, audit logging, and human-review escalation path. Weeks 10-12: policy mapping, threshold tuning, and operational handoff. Self-hosting is fast to demo and slow to ship.

The shared 30-day work, regardless of classifier choice: write your policy document, define your category-to-action mapping, build your appeals workflow, set up your human review queue, and instrument metrics for both false-positive and false-negative review. The classifier integration is the easy 20 percent of the work. The remaining 80 percent is the trust-and-safety operation around it, and that operation is the actual product.


The opinionated 2026 pick: what we would actually ship

If we were shipping a new consumer product on top of an LLM in 2026, we would ship **OpenAI omni-moderation-latest** as the default moderation layer on day one. The combination of free pricing, the 13-category taxonomy, multimodal support, and 40-plus-language coverage is unmatched at zero incremental cost. The only operational gotcha is rate limits — climb to Tier 3 or higher before launch per https://platform.openai.com/docs/guides/rate-limits, and add client-side queueing for traffic bursts.

If the product is a comments, forum, or community workload, we would add **Perspective API** in parallel for the specific use case where the six Perspective attributes — TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT — map more cleanly to community standards than OpenAI's broader taxonomy. Perspective's operational history at https://perspectiveapi.com/ is the longest in the industry and it tends to be more conservative on insult and identity-attack categories, which is usually what community managers want.

If the product is a RAG application running on Azure OpenAI, we would add **Azure AI Content Safety Prompt Shields** to the input layer. The jailbreak and indirect-injection detection is genuinely best-in-class as of June 2026, and the integration with the Azure OpenAI Service is one less custom-built classifier in our stack. Combined cost is well under $0.001 per moderated event for most workloads.

If the product is in a regulated industry — healthcare, government, finance — where third-party API calls are blocked, we would self-host a **Hugging Face XLM-RoBERTa** ensemble on a single A10G or L4 GPU. We would not pretend this is cheaper than SaaS unless we were above 10M events per day; we would justify the cost on regulatory grounds and budget appropriately for the operational overhead.

We would not ship **AWS Comprehend Toxicity** unless we were locked into an English-only AWS-native workload with a specific procurement reason to avoid OpenAI and Azure. The English-only constraint as of June 2026 is too limiting for most consumer products, and the per-character pricing is the highest of any SaaS option at scale per https://aws.amazon.com/comprehend/pricing/. It is a fine choice for a specific shape of workload; it is not the default we would reach for first.

The one thing we would not do in 2026 is rely on a single classifier as the entire trust-and-safety strategy. The classifier is one component in a pipeline that includes policy, rate-limiting, appeals, human review, audit logging, and regulatory reporting. Pick the classifier that fits the workload, treat it as the cheap commodity layer it is, and put the budget where the actual product value lives — the people, the policy, and the user experience around enforcement. For a complementary view, see OpenAI safety features 2026 and Gemini safety features 2026.

How to pick the right moderation classifier for your team

  1. Step 1: Write the policy before you pick the classifier

    On a single page, write the categories of content you actually want to block, flag, or escalate. Be specific: 'targeted harassment of an identifiable individual' is a policy; 'toxic content' is not. Map each policy line to a measurable outcome — what happens to the user, the post, and the audit log when this fires. Once you have that page, you can map your policy categories to OpenAI's 13, Perspective's 6, AWS's 7, Azure's 4-plus-severity, or your own custom labels. Without the policy, every classifier looks plausible and you will pick on price. With the policy, the right classifier is usually obvious in 10 minutes. The most common failure mode is buying the classifier first and writing the policy around what it returns, which guarantees you end up enforcing the vendor's idea of trust and safety instead of your own.

  2. Step 2: Build a labeled evaluation set from your real traffic

    Pull 500 to 2,000 real user messages from your production logs (anonymized, with legal review for PII). Have two reviewers label each message against your policy from Step 1, resolve disagreements as a third pass, and treat the resulting set as your ground truth. Run that set through every classifier under consideration via their synchronous APIs — most teams can run this evaluation in a single afternoon for under $20 of API spend. Measure precision, recall, and F1 at multiple thresholds, per category, per language. The vendor with the best headline number on their own data may not be the vendor with the best F1 on yours. This is the single highest-leverage hour of work in the entire procurement process, and skipping it is how teams ship moderation that misses 30 percent of the content it should catch.

  3. Step 3: Model your real cost at projected scale (and 10x scale)

    Build a one-page TCO model that includes per-event classifier cost, rate-limit headroom needed, appeals workflow build cost, human review tooling, and policy authoring cost. Run it at your current volume, at 3x, and at 10x to surface where the curve breaks. For OpenAI moderation, model the rate-limit ladder — when you hit 1.4M requests per day at Tier 1, you need to climb tiers or queue. For Perspective, model the QPS quota request lead time. For AWS Comprehend, model the per-character bill at scale and add the second classifier you will need for non-English traffic. For Azure, model the per-1,000-transaction bill and ask for enterprise discounts above 1M transactions per day. For self-hosting, model the fully-loaded MLE cost, not just the GPU bill. Compare against the AI content moderation cost by provider breakdown to sanity-check assumptions.

  4. Step 4: Verify data residency, retention, and compliance posture in writing

    Get the data processing agreement, the data retention commitment, and the regional data residency options in writing before you ship to any user. For OpenAI, verify Zero Data Retention availability if you need it per https://openai.com/enterprise-privacy/. For Perspective, verify the no-training-on-customer-data commitment at https://perspectiveapi.com/privacy. For AWS, verify your region of choice and the Comprehend FedRAMP posture if you are in regulated workloads per https://docs.aws.amazon.com/comprehend/. For Azure, verify the no-training-on-customer-data commitment in the Content Safety transparency note and the EU regional availability per https://learn.microsoft.com/en-us/legal/cognitive-services/content-safety/data-privacy. For self-hosted RoBERTa, document your own retention and access policies because there is no vendor to inherit them from. Get this in the master services agreement, not the marketing page.

  5. Step 5: Ship a shadow deployment before you ship the block button

    Run the chosen classifier in shadow mode for at least 7 to 14 days before any user-visible enforcement. Log every classification, every threshold trip, and every action you would have taken — but take no action. Review the false-positive set with your trust-and-safety team or counsel and tune thresholds against your policy. The first time you flip enforcement to live, you will discover edge cases you missed: sarcasm flagged as harassment, medical content flagged as self-harm, news quotes flagged as hate. The shadow period is how you find those edges without harming real users. Most teams skip this step because the demo worked on a handful of test messages, then take a Twitter beating in week two when the false-positive rate hits real users at scale. Ship shadow first, ship live second.
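
The shadow deployment in Step 5 can be sketched as a wrapper that always logs the action it would have taken and only enforces when a flag is turned on. The `classify` and `decide` callables and the log format are hypothetical stand-ins for whichever classifier client and policy function you use.

```python
import json
import time
from typing import Callable, Dict, List


def shadow_moderate(
    text: str,
    classify: Callable[[str], Dict[str, float]],
    decide: Callable[[Dict[str, float]], str],
    log: List[str],
    enforce: bool = False,
) -> bool:
    """Return True if the message should be delivered to other users."""
    scores = classify(text)
    action = decide(scores)
    # Record what would have happened, whether or not enforcement is on.
    log.append(
        json.dumps(
            {
                "ts": time.time(),
                "scores": scores,
                "would_take": action,
                "enforced": enforce,
            },
            sort_keys=True,
        )
    )
    if enforce and action == "block":
        return False
    return True
```

Because enforcement is a single boolean, going live after the shadow period is a configuration change rather than a code change, and the log lines from both periods stay comparable.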

Frequently Asked Questions

Is OpenAI omni-moderation-latest really free, or are there hidden costs?

It is genuinely free per https://platform.openai.com/docs/guides/moderation — OpenAI does not charge per-request for moderation endpoints, including the multimodal omni-moderation-latest model. The hidden cost is rate limits: at API Tier 1, you get roughly 1,000 RPM, which is plenty for a development workload but saturates immediately at consumer scale. Climbing the tier ladder requires payment history and total API spend per https://platform.openai.com/docs/guides/rate-limits. For most teams, the practical answer is that omni-moderation is free in dollars and constrained in throughput — plan to be at Tier 3 or higher before any consumer launch and add client-side queueing for traffic bursts. There is no contract minimum, no SLA upgrade fee, and no data egress cost; the only real ongoing cost is the engineering attention to handle 429 responses gracefully.
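
Handling 429 responses gracefully usually means retrying with exponential backoff and jitter. The sketch below is a generic version, not tied to any SDK: `RateLimited` is a hypothetical exception your own client wrapper would raise on HTTP 429, and the delay values are placeholders.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raised by the caller's client wrapper on HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries; let the caller queue or fail open/closed
            cap = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": sleep a random amount up to the current cap.
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry logic can be tested without real delays. What to do when retries are exhausted (hold the message, or let it through and moderate later) is a policy decision this sketch leaves to the caller.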

How does OpenAI moderation compare to Perspective API for forum and comment moderation?

Perspective wins on operational maturity for comment-style content: it has been live since 2017, has the longest field history, and its six-attribute taxonomy (TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT per https://developers.perspectiveapi.com/s/about-the-api-attributes-and-languages) maps cleanly to most community standards. OpenAI omni-moderation wins on category breadth (13 categories, including self-harm and sexual-minors detection that Perspective doesn't surface as a discrete attribute), multimodal support, and language coverage. The 2026 pattern most large community products run is both: Perspective on comments where insult and identity-attack precision matter most, OpenAI on LLM-generated content where the broader taxonomy matters more. They are complementary, not substitutes, and the combined API cost is essentially zero as long as your Perspective usage stays within its free non-commercial quota, since OpenAI moderation carries no per-request charge.

Why is AWS Comprehend Toxicity English-only when AWS supports so many languages elsewhere?

As of June 2026 — verify at https://aws.amazon.com/comprehend/faqs/ — AWS Comprehend's toxicity detection API only supports English, while the broader Comprehend service supports entity detection and sentiment in many more languages. This is a deliberate AWS choice, likely reflecting both training data availability and the operational quality bar AWS holds itself to for safety-critical classifiers. The practical implication for buyers is that any non-English workload on AWS needs a second classifier — typically OpenAI omni-moderation or a self-hosted XLM-RoBERTa fine-tune — and that breaks the 'single-cloud single-vendor' assumption that often drives the Comprehend selection in the first place. If multilingual coverage matters to your roadmap, Comprehend Toxicity is probably not the right baseline.

What does Azure Prompt Shields do that OpenAI moderation doesn't?

Azure's Prompt Shields is a dedicated jailbreak and indirect-prompt-injection detector that ships as a first-class endpoint inside Azure Content Safety per https://learn.microsoft.com/en-us/azure/ai-services/content-safety/. You send the user prompt plus any retrieved RAG documents and it returns classifications for attempted jailbreaks and embedded prompt injections in those documents. OpenAI's moderation endpoint does not classify prompt injections — it classifies harmful content categories, which is a different problem. The two are complementary in a RAG architecture: Azure Prompt Shields catches the 'ignore previous instructions' attempt before the model sees it, OpenAI moderation catches harmful content in the user's actual request or the model's actual response. If you are building a RAG product on any cloud, Prompt Shields is one of the few capabilities you cannot easily replicate by gluing together free APIs.
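
The request-and-response flow described above can be sketched as a payload builder and a result summarizer. The field names used here (`userPrompt`, `documents`, `userPromptAnalysis`, `documentsAnalysis`, `attackDetected`) are our understanding of the Prompt Shields schema and should be treated as assumptions to verify against the current Azure reference. The helper names are hypothetical, and no request is sent.

```python
from typing import Dict, List


def build_shield_request(user_prompt: str, documents: List[str]) -> Dict[str, object]:
    """Body carrying the user prompt plus any retrieved RAG documents."""
    return {"userPrompt": user_prompt, "documents": list(documents)}


def summarize(response: dict) -> Dict[str, object]:
    """Reduce a response to: was the prompt an attack, and which documents were."""
    prompt_attack = bool(
        response.get("userPromptAnalysis", {}).get("attackDetected", False)
    )
    poisoned = [
        index
        for index, doc in enumerate(response.get("documentsAnalysis", []))
        if doc.get("attackDetected")
    ]
    return {
        "prompt_attack": prompt_attack,
        "poisoned_documents": poisoned,
        "safe": not prompt_attack and not poisoned,
    }


# A hand-written response in the assumed shape: second document is flagged.
sample = {
    "userPromptAnalysis": {"attackDetected": False},
    "documentsAnalysis": [{"attackDetected": False}, {"attackDetected": True}],
}
result = summarize(sample)
```

Returning document indexes rather than a single boolean lets a RAG pipeline drop only the flagged documents and still answer from the rest.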

Should I self-host a Hugging Face model instead of using a SaaS classifier?

Probably not, unless you have a regulatory reason, a non-English coverage gap, or a strict sub-100ms latency requirement. The cost math rarely favors self-hosting below 10M moderated events per day because a senior MLE costs $250,000-plus per year fully loaded and you need ongoing attention to keep a self-hosted moderation fleet healthy. The accuracy math also rarely favors self-hosting unless you fine-tune on real in-domain data, which is a six-month-plus project plus an ongoing labeling operation. Where self-hosting genuinely wins: HIPAA, FedRAMP High, EU government workloads, Bengali or Swahili or other languages where SaaS quality is empirically poor, and real-time game voice chat where the network round-trip to a SaaS endpoint blows your latency budget. For everything else, use the free OpenAI omni-moderation endpoint and spend the saved engineering time on policy work.
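
The break-even reasoning can be made concrete with a few lines of arithmetic. This is a rough sketch under stated assumptions: it treats self-hosting as a fixed annual cost plus an optional per-event cost, and the SaaS per-event price passed in is a placeholder you would replace with a real quote.

```python
def break_even_events_per_day(
    annual_fixed_cost: float,
    saas_price_per_event: float,
    self_host_cost_per_event: float = 0.0,
) -> float:
    """Daily volume above which self-hosting becomes cheaper than SaaS.

    Returns infinity when SaaS is free or no more expensive per event,
    because the fixed cost can then never be recovered on price alone.
    """
    saving_per_event = saas_price_per_event - self_host_cost_per_event
    if saving_per_event <= 0:
        return float("inf")
    return annual_fixed_cost / (365 * saving_per_event)


# Fixed cost uses the $250,000 MLE figure above; the $0.0001 price is hypothetical.
paid_saas = break_even_events_per_day(250_000, 0.0001)
free_saas = break_even_events_per_day(250_000, 0.0)
```

With those inputs the break-even lands at several million events per day, the same order of magnitude as the 10M rule of thumb. Against a free endpoint the result is infinite, which is the arithmetic behind justifying self-hosting on regulatory or latency grounds rather than on price.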

How accurate are these classifiers really, and which has the best published benchmarks?

No single classifier has a clean head-to-head benchmark against all the others in 2026 — vendor numbers are not directly comparable because the test sets, taxonomies, and threshold choices differ. The most rigorous public methodology is the Dynabench adversarial evaluation behind the Hugging Face RoBERTa hate-speech model (Vidgen et al., 2021, https://arxiv.org/abs/2012.15761). Perspective publishes credible per-attribute AUC numbers on internal evaluation sets at https://developers.perspectiveapi.com/s/about-the-api-model-cards. Azure publishes the most honest transparency note at https://learn.microsoft.com/en-us/legal/cognitive-services/content-safety/transparency-note, including known demographic limitations. OpenAI references improvements in their omni-moderation announcement (https://openai.com/index/upgrading-the-moderation-api/) but doesn't publish raw numbers. AWS doesn't publish per-category accuracy at all. The only number that matters is the F1 you measure on your own labeled traffic — build a 1,000-example evaluation set and run every candidate through it.
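
Measuring F1 on your own labeled set takes very little code. The sketch below computes precision, recall, and F1 for one category at a given threshold and picks the best of a few candidate thresholds. The labels and scores are made-up illustrative values, not results from any of the classifiers discussed.

```python
from typing import List, Sequence, Tuple


def prf(labels: Sequence[int], scores: Sequence[float], threshold: float) -> Tuple[float, float, float]:
    """Precision, recall, and F1 when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(labels: Sequence[int], scores: Sequence[float], candidates: List[float]) -> float:
    """Return the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: prf(labels, scores, t)[2])


# Illustrative data: 1 = violates policy according to human reviewers.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.2, 0.7, 0.1]
chosen = best_threshold(labels, scores, [0.5, 0.8])
```

Run this per category and per language, since a classifier that looks strong in aggregate can be weak in exactly the slice your policy cares about.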

Can I use these moderation APIs for moderating LLM outputs as well as user inputs?

Yes — and you should moderate both. Ingress filtering catches user prompts that shouldn't be processed (CSAM, attempts to extract harmful content, jailbreak attempts when paired with Azure Prompt Shields). Generation filtering catches model outputs that shouldn't be displayed (hallucinated harmful content, model failures that produce sexual or violent content despite the prompt being benign). All five classifiers in this comparison work on both — you just call the same endpoint with the LLM output text instead of the user input text. The cost implication: moderating both inputs and outputs doubles your transaction count, which matters at scale for the paid options (AWS Comprehend, Azure Content Safety) but doesn't change the bill for OpenAI moderation or Perspective non-commercial. Most production RAG systems in 2026 moderate inputs, retrieved documents, and outputs separately — three classifications per user turn.
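
The effect of moderating several stages per turn on throughput planning is easy to estimate. The sketch below assumes three classifications per user turn as described above. The peak factor is a hypothetical multiplier for burstiness, and the 1,000 RPM figure is the approximate Tier 1 limit quoted elsewhere in this article rather than a guaranteed number.

```python
MINUTES_PER_DAY = 24 * 60


def daily_calls(turns_per_day: int, stages: int = 3) -> int:
    """Classifier calls per day when each turn is moderated at `stages` points."""
    return turns_per_day * stages


def average_rpm(calls_per_day: int) -> float:
    """Average requests per minute if traffic were perfectly flat."""
    return calls_per_day / MINUTES_PER_DAY


def fits(calls_per_day: int, rpm_limit: float, peak_factor: float = 3.0) -> bool:
    """True if peak traffic (average times peak_factor) stays under the limit."""
    return average_rpm(calls_per_day) * peak_factor <= rpm_limit


calls = daily_calls(200_000)           # 200,000 turns, three stages each
within_tier = fits(calls, rpm_limit=1_000)
```

A flat 1,000 RPM sustains 1.44 million calls per day, but real traffic is not flat, so 200,000 three-stage turns per day already overshoots that limit at a 3x peak. That gap is why client-side queueing matters well before the daily average looks alarming.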

What happens when the classifier is wrong, and how do I handle appeals?

Every moderation classifier produces false positives and false negatives — that is not a bug, it is a property of the task. False positives mean you wrongly blocked a legitimate user; false negatives mean you let harmful content through. Both have product cost. The right architecture in 2026 has three layers: an automated block at very-high confidence, a flag-for-review queue at medium confidence, and pass-through at low confidence. Every blocked user gets an appeals path — usually a 'request review' button that routes the case to your trust-and-safety team within 24-48 hours per industry norms. None of the classifiers ship this for you; you build it on top. Plan $30,000-$80,000 to build the appeals workflow and ongoing CS staffing to handle reviews. Skipping this is the most common reason moderation programs blow up publicly — users tolerate occasional mis-flags only if there is a transparent way to push back.

Which classifier is best for the EU AI Act and DSA compliance requirements?

There is no classifier that is 'EU AI Act compliant' as a product attribute; compliance is about how you deploy and document the system, not the model itself. That said, Azure AI Content Safety has the strongest public transparency documentation at https://learn.microsoft.com/en-us/legal/cognitive-services/content-safety/transparency-note, which is the closest match to the documentation requirements the AI Act imposes on providers of high-risk systems. For DSA obligations such as notice-and-action (Article 16), statements of reasons (Article 17), and transparency reporting (Article 15), the classifier choice matters less than the operational logging: you need to record every moderation decision, the user, the content category, the action taken, and the appeal outcome, in a queryable form. All five classifiers in this comparison support that pattern with adequate logging; verify your audit log retention meets DSA timelines (typically 6+ months) and that your data residency aligns with the country of the user being moderated.
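
A queryable decision record can be as simple as one structured log line per moderation decision. The sketch below uses the fields named above (decision, user, content category, action, appeal outcome). The class name, the extra `classifier` and `decided_at` fields, and the JSON-lines format are illustrative assumptions, not a regulatory template.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ModerationDecision:
    decision_id: str
    user_id: str
    content_category: str
    action: str                      # e.g. "block", "flag_for_review", "allow"
    classifier: str                  # which classifier produced the signal
    decided_at: str                  # ISO 8601 timestamp, UTC
    appeal_outcome: Optional[str] = None  # filled in once an appeal resolves


def to_log_line(decision: ModerationDecision) -> str:
    """Serialize one decision as a single JSON line with stable key order."""
    return json.dumps(asdict(decision), sort_keys=True)


record = ModerationDecision(
    decision_id="d-001",
    user_id="u-42",
    content_category="harassment",
    action="block",
    classifier="example-classifier",
    decided_at="2026-06-01T12:00:00Z",
)
line = to_log_line(record)
```

Writing the appeal outcome back onto the same record, rather than into a separate system, keeps transparency reporting a matter of querying one table.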
