Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Safety Features Compared: GPT-5 vs Claude Opus 4.7 vs Gemini 2.5 Pro — Refusal Calibration, Jailbreaks, Hallucinations, and Compliance Posture (2026)

Three frontier model families, three different theories of how to ship a model that does not embarrass you. OpenAI bets on RLHF plus a layered usage-policy moderation API. Anthropic bets on Constitutional AI plus its Responsible Scaling Policy. Google bets on tunable safety thresholds, ShieldGemma classifiers, and SynthID watermarking baked into the platform. Sources cited inline, June 2026.

By DDH Research Team at Digital Dashboard HubUpdated

Picking a frontier model in 2026 used to be a benchmark question. It is now a safety, compliance, and refusal-calibration question — because the underlying capability gap between GPT-5, Claude Opus 4.7, and Gemini 2.5 Pro has narrowed to the point where the model that gets deployed in regulated environments is the one whose safety stack survives the security review. Refuse too much and your product team revolts. Refuse too little and your legal team revolts. Hallucinate at the wrong rate and your customers churn. Before you sign a six-figure inference contract, walk the decision through the OpenAI vs Anthropic data policies comparison so the data-residency math survives an actual procurement review.

**OpenAI GPT-5** ships with a safety stack documented at https://openai.com/safety/, anchored on RLHF plus the layered usage-policies enforcement at https://openai.com/policies/usage-policies/. **Anthropic Claude Opus 4.7** publishes a model-specific system card and ties deployment to the Responsible Scaling Policy at https://www.anthropic.com/responsible-scaling-policy. **Google Gemini 2.5 Pro** exposes per-category tunable safety thresholds documented at https://ai.google.dev/gemini-api/docs/safety-settings, plus Vertex AI's separate safety attribute layer at https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes and SynthID image watermarking at https://deepmind.google/technologies/synthid/. All claims in this guide are sourced from vendor pages as of June 2026 — verify before procurement.

The rest of this guide breaks down what each safety stack actually does, the published benchmark data, the cloud-overlay options (Azure, Bedrock, Vertex), and which model to pick for which risk profile. You will get a decision matrix, a five-step procurement plan, and answers to the questions your security and ML teams will ask. We also compare the underlying constitutional approach in Anthropic Constitutional AI explained and the cost math against safety overhead in GPT-5 vs Claude Opus 4.7.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

GPT-5, Claude Opus 4.7, Gemini 2.5 Pro — safety stack + cloud overlay comparison (June 2026)

Feature
GPT-5 (OpenAI)
Claude Opus 4.7 (Anthropic)
Gemini 2.5 Pro (Google)
GPT-5 + Azure overlay
Claude via AWS Bedrock
Gemini via Vertex AI
Safety training methodologyRLHF + rule-based reward models + deliberative alignment per https://openai.com/safety/Constitutional AI + RLAIF + RLHF per https://www.anthropic.com/news/claude-4-7-system-cardRLHF + safety fine-tuning + ShieldGemma classifier overlay per https://ai.google.dev/gemini-api/docs/safety-settingsSame GPT-5 training + Azure Content Safety classifiers on input/outputSame Claude training + Bedrock Guardrails policy filtersSame Gemini training + Vertex AI safety attributes + Model Armor
Refusal calibrationTighter on policy categories; deliberative-alignment reduces over-refusals vs GPT-4Conservative by default; tends to add caveats over refusing outrightUser-tunable per category (BLOCK_NONE to BLOCK_LOW_AND_ABOVE) per docsAdds Azure Content Filter (Off/Low/Medium/High) on top of model refusalsBedrock Guardrails layer adds configurable policy categoriesInherits Gemini tunable thresholds + Vertex's adversarial-prompt filter
JailbreakBench score (approximate range)Strong attack-success-rate resistance vs prior generation; verify at https://jailbreakbench.github.io/Among the most resistant frontier models in 2025-26 public leaderboards; verify at https://jailbreakbench.github.io/Resistance varies by safety threshold setting; default thresholds rank competitivelyAdds external classifier layer — measured separately, generally higher resistance than raw APIInherits Claude resistance + Bedrock filter; effectively layered defenseInherits Gemini + Vertex adversarial filter; layered defense
Hallucination rate (Vectara HHEM, approximate range)Among the lowest in the leaderboard at https://huggingface.co/spaces/vectara/leaderboardAmong the lowest; Anthropic publishes hallucination evals in system cardCompetitive; verify current rank at https://huggingface.co/spaces/vectara/leaderboardNo measurable change from base GPT-5 — overlay is policy, not factualityNo measurable change from base ClaudeNo measurable change from base Gemini
Categories filtered (default)Sexual content (incl. minors), violence, self-harm, hate, weapons, illicit advice per usage policiesConstitutional categories: harm, deception, privacy, weapons, CSAM, malicious code4 categories with tunable thresholds: harassment, hate, sexual, dangerous contentAdds Azure's 4 harm categories with severity levelsAdds Bedrock denied topics, content filters, sensitive info, contextual groundingAdds Vertex safety attributes (10+ categories) plus Model Armor prompt-injection filter
Customizable safety thresholdsLimited via API; mostly system prompt + Moderation API at https://platform.openai.com/docs/guides/moderationLimited — Anthropic does not expose per-category dials; behavior tuned via system promptYes — explicit per-category enum thresholds in the API per https://ai.google.dev/gemini-api/docs/safety-settingsYes — Azure Content Safety severity per category configurableYes — Bedrock Guardrails fully configurable per policyYes — Vertex thresholds + Model Armor templates
Watermarking / provenanceC2PA metadata on DALL·E / Sora outputs; no text watermark publicly availableNo public text watermark; reliance on policy + auditingSynthID watermarking on Gemini-generated images, audio, video, and text per https://deepmind.google/technologies/synthid/Inherits OpenAI provenance; Azure adds optional audit loggingInherits Anthropic; Bedrock adds CloudTrail logsInherits SynthID; Vertex adds VPC-SC + audit logs
Opt-out of training (API by default)API and ChatGPT Enterprise/Team do NOT train on customer data per https://openai.com/enterprise-privacy/Default — Anthropic does not train on API customer data per https://www.anthropic.com/legal/commercial-termsVertex AI and paid Gemini API do not train on customer data per https://cloud.google.com/vertex-ai/generative-ai/docs/data-governanceSame as OpenAI default — no Microsoft training on customer dataNo model training on customer data per https://aws.amazon.com/bedrock/security-compliance/Same as Google default for Vertex
Default data retention (API)30 days for abuse monitoring per https://openai.com/enterprise-privacy/; ZDR available enterprise30 days standard; configurable for enterprise per https://www.anthropic.com/legal/commercial-termsGemini API caches 24 hours; Vertex configurable per project per data-governance docsConfigurable down to zero via Azure abuse monitoring opt-out (enterprise)Configurable per Bedrock log policy; logs off by defaultConfigurable per Vertex; zero-day retention available on request
Zero Data Retention (ZDR) availableYes — enterprise tier on approved use cases per https://openai.com/enterprise-privacy/Yes — enterprise tier per https://www.anthropic.com/trustYes — Vertex AI configurable zero-day retentionYes — Azure OpenAI customer-managed-key + abuse monitoring opt-outYes — default behavior for Bedrock invocationsYes — Vertex configurable
Model-specific system card publishedYes — GPT-5 system card linked from https://openai.com/safety/Yes — Claude Opus 4.7 system card per https://www.anthropic.com/news/claude-4-7-system-cardYes — Gemini 2.5 technical/safety report per https://deepmind.google/technologies/gemini/OpenAI card applies; Azure adds Responsible AI documentationAnthropic card applies; Bedrock model card provided in consoleGoogle card applies; Vertex AI model card in registry
SOC 2 / ISO 27001 / HIPAASOC 2 Type II, CSA STAR, HIPAA BAA on enterprise per https://trust.openai.com/SOC 2 Type II, ISO 27001, HIPAA BAA per https://trust.anthropic.com/SOC 2, ISO 27001/27017/27018, HIPAA (Vertex) per https://cloud.google.com/security/complianceInherits Azure's full SOC 2 + ISO + HIPAA + FedRAMP HighInherits AWS's SOC 2 + ISO + HIPAA + FedRAMP HighInherits Google Cloud's SOC 2 + ISO + HIPAA + FedRAMP High
Pre-deployment red-team / RSP tierPreparedness Framework evaluation pre-launch per https://openai.com/safety/preparedness/AI Safety Level (ASL) assessment under Responsible Scaling PolicyFrontier Safety Framework evaluation per Google DeepMind policyInherits OpenAI evals plus Microsoft Responsible AI reviewInherits Anthropic ASL plus AWS responsible AI assessmentInherits Google FSF plus Vertex Responsible AI Toolkit
Best fitTeams wanting strong default refusals + the broadest tooling ecosystemRegulated buyers prioritizing low hallucination + careful refusal postureTeams that need per-category tunable safety + watermarking out of the boxMicrosoft-stack enterprises with FedRAMP High requirementsAWS-stack enterprises wanting Claude under existing Bedrock procurementGCP-stack enterprises with sovereignty + VPC-SC requirements

Sources as of June 2026 — verify at vendor pages before procurement: https://openai.com/safety/, https://openai.com/policies/usage-policies/, https://www.anthropic.com/news/claude-4-7-system-card, https://www.anthropic.com/responsible-scaling-policy, https://ai.google.dev/gemini-api/docs/safety-settings, https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes, https://deepmind.google/technologies/synthid/. Frontier model safety posture changes frequently — confirm system card versions and policy wording in writing before any procurement decision.

What each safety stack actually does (and the marketing copy to ignore)

**OpenAI GPT-5** is the model that hardened the refusal pipeline the most between generations. The published stack at https://openai.com/safety/ describes a layered approach: RLHF on human preference data, rule-based reward models trained against the OpenAI usage policies at https://openai.com/policies/usage-policies/, and a deliberative-alignment phase where the model is taught to reason explicitly about whether a request violates policy before answering. The result is a model that refuses fewer benign requests than GPT-4 while holding the line on actually-harmful ones. The Moderation API at https://platform.openai.com/docs/guides/moderation runs separately and is free — and most teams underuse it.

**Anthropic Claude Opus 4.7** is the Constitutional AI flagship. Per the system card published at https://www.anthropic.com/news/claude-4-7-system-card and the methodology at https://www.anthropic.com/responsible-scaling-policy, the training pipeline uses a written constitution (a set of principles drawn from sources like the UN Declaration of Human Rights and Anthropic's acceptable-use policy) to generate AI feedback (RLAIF) on top of human RLHF. The practical effect is a model that tends to add caveats and reasoning rather than refuse outright — Claude is famously the model most likely to explain why it cannot help and offer a constrained alternative, rather than returning a flat "I can't do that."

**Google Gemini 2.5 Pro** is the most explicitly tunable safety stack of the three. The Gemini API at https://ai.google.dev/gemini-api/docs/safety-settings exposes four safety categories — harassment, hate speech, sexually explicit content, and dangerous content — each with five threshold levels from BLOCK_NONE to BLOCK_LOW_AND_ABOVE. Vertex AI layers a second filter pass per https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes, and Google's SynthID watermarking at https://deepmind.google/technologies/synthid/ marks generated images, audio, video, and (in supported regions) text with imperceptible signals that can be detected by Google's classifier.

Where the marketing copy diverges from reality: all three vendors describe their models as "aligned" and "safe" in ways that suggest the underlying training closed the problem. It did not. Jailbreaks work, hallucinations happen, and the published benchmarks at sites like https://jailbreakbench.github.io/ and https://huggingface.co/spaces/vectara/leaderboard show the gap between vendor claims and red-team reality. The right mental model is layered defense — model training plus moderation API plus your own input/output filters — not "the model is safe, ship it."

Where the marketing copy is fair: all three vendors publish meaningful system cards and red-team evaluations. OpenAI publishes the Preparedness Framework results at https://openai.com/safety/preparedness/. Anthropic publishes ASL-level evaluations under the Responsible Scaling Policy. Google publishes Frontier Safety Framework assessments. None of these are marketing fluff — they are real documents that your security team should read before signing a contract. Skip the blog posts. Read the system cards.

The opinionated read: GPT-5 has the most polished refusal calibration out of the box, Claude Opus 4.7 has the lowest hallucination rate on the Vectara leaderboard most months, and Gemini 2.5 Pro is the only one of the three that lets a developer explicitly dial safety thresholds per category at the API level. Which matters most depends on your use case — and that is what the rest of this guide is about.


Architecture: how the safety layer plugs into your application

**OpenAI GPT-5** integration is the simplest of the three. You call the chat completions endpoint, the model applies its trained safety policy, and you get an answer or a refusal. For an extra layer, you call the free Moderation API on user input before sending it to the model, and optionally on model output before returning it to the user. The Moderation API at https://platform.openai.com/docs/guides/moderation classifies text against 13 categories and is materially better than rolling your own classifier. Most teams skip it because it adds a round trip — that is a mistake on a serious production deployment.

**Anthropic Claude Opus 4.7** integration adds no API-level safety dials. You send a request, the model applies its constitutional training, and you get a response. Tuning is done through the system prompt — Claude is unusually responsive to instructions about tone, format, and what to refuse — and through retrieval-time controls (don't feed it documents you don't want it to discuss). The Bedrock deployment per https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html adds AWS Bedrock Guardrails as an external policy layer, which is the closest equivalent to Gemini's tunable thresholds.

**Google Gemini 2.5 Pro** is the most architecturally distinct. The Gemini API request body accepts a safety_settings array where each entry specifies a category and a threshold per https://ai.google.dev/gemini-api/docs/safety-settings. Setting BLOCK_NONE turns the filter off for that category (useful for, say, a medical app that needs to discuss explicit anatomy) — but the underlying model still has its base training to fall back on. On Vertex AI, you get an additional safety_attributes response field showing confidence scores per category, and Model Armor (https://cloud.google.com/security-command-center/docs/model-armor-overview) can be configured to filter prompt injection attempts before they hit the model.

Cloud overlay matters more than most procurement teams realize. **GPT-5 on Azure OpenAI** at https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter adds Azure Content Safety with four severity levels (Safe, Low, Medium, High) configurable per category, plus a separate jailbreak-risk detector and prompt-shield feature. **Claude on AWS Bedrock** adds Bedrock Guardrails with denied topics, content filters, PII redaction, and contextual grounding checks per https://aws.amazon.com/bedrock/guardrails/. **Gemini on Vertex AI** adds the safety attributes layer plus VPC Service Controls plus customer-managed encryption keys for regulated buyers.

The integration trade-off is real: native APIs are cheaper and faster, cloud-overlaid deployments are more configurable and audit-friendly but add latency and per-token cost. A regulated buyer in financial services or healthcare almost always wants the cloud overlay. A startup shipping a consumer chatbot almost always wants the native API. The middle case — a mid-market SaaS shipping AI features to enterprise customers — usually ends up running the native API for speed but adding their own input/output classifier for audit trail.

The one architectural pattern that nobody documents well but everyone needs in production: log the safety-related metadata. OpenAI's response includes a finish_reason that flags safety-driven stops. Anthropic's response includes a stop_reason. Gemini's response includes the per-category safety_ratings. Capture these in your observability stack alongside latency and token counts — your incident response on a real safety event will be impossible without them, and your post-mortem will need them to retrain prompts.


Benchmark deep-dive: refusals, jailbreaks, hallucinations

The honest version of frontier-model benchmarking in 2026 is that the published numbers move every month, and the right move is to read the leaderboards directly rather than trust any single vendor's blog post. The three benchmarks worth tracking are JailbreakBench at https://jailbreakbench.github.io/ (adversarial attack success rate), Vectara's HHEM leaderboard at https://huggingface.co/spaces/vectara/leaderboard (hallucination rate on summarization), and HELM Safety at https://crfm.stanford.edu/helm/ (broad multi-dimensional safety scoring).

**Refusal calibration** is best measured by XSTest (https://github.com/paul-rottger/exaggerated-safety) and the OR-Bench suite — these measure how often a model refuses benign requests that merely mention sensitive topics. GPT-5 ships with materially lower over-refusal rates than GPT-4 thanks to the deliberative-alignment work documented in OpenAI's safety updates. Claude Opus 4.7 historically over-refuses in some categories (anything touching medical advice or legal questions) but the 4.7 update narrowed the gap. Gemini 2.5 Pro's over-refusal rate is highly sensitive to the threshold setting — a default deployment refuses more than a tuned one.

**Jailbreak resistance** is best measured by JailbreakBench's attack success rate. The Anthropic team has historically scored well on adversarial robustness — the Constitutional AI training makes Claude unusually hard to talk into producing harmful content via prompt engineering. GPT-5 is competitive; the deliberative-alignment training closed much of the gap. Gemini 2.5 Pro at default thresholds is in the same neighborhood; at BLOCK_NONE thresholds it predictably degrades. The cloud-overlay deployments — Azure Content Safety, Bedrock Guardrails, Vertex Model Armor — generally improve attack success rates because they add an external classifier the model itself does not see.

**Hallucination rate** is best measured by Vectara's HHEM leaderboard at https://huggingface.co/spaces/vectara/leaderboard. As of mid-2026, the three frontier models — GPT-5, Claude Opus 4.7, and Gemini 2.5 Pro — all sit in the top tier of the leaderboard with hallucination rates substantially lower than open-source alternatives. The month-to-month rank changes; the practical difference for a production RAG application is small. What matters more is grounding (feed the model the actual document) and prompt design (tell it explicitly to cite or refuse) than the small inter-model differences in raw hallucination rate.

**Multi-turn jailbreaks** are the underreported problem. Single-turn attack success rates have dropped for all three models. Multi-turn attacks — where the adversary builds rapport over five or ten turns before pivoting to the harmful request — remain materially more effective than single-turn attacks. None of the three vendors have published comprehensive multi-turn benchmark data as of June 2026. If your application keeps long conversations, this is the failure mode you should red-team yourself before launch.

The right way to use benchmarks: read them to compare deltas between models, not absolute numbers. The absolute jailbreak rate on a published benchmark is a snapshot of a specific attack methodology against a specific model version. By the time you read it, both have moved. The relative rank between GPT-5, Claude, and Gemini is more durable than the absolute scores. And for your specific use case, build your own internal eval set with prompts from your actual user base — that is the only number that predicts your production failure rate.


Refusal calibration and prompt design: how to ship something useful

Refusal calibration is where the three models genuinely differ for everyday developers. **GPT-5** sits at a thoughtful middle — it will discuss legal scenarios, explain how malware works conceptually, and walk through medical symptoms in clinical depth. It draws hard lines on CSAM, explicit weapons synthesis, and self-harm encouragement, with the policy detail documented at https://openai.com/policies/usage-policies/. For most B2B SaaS applications, GPT-5's defaults are close to what you would have tuned by hand.

**Claude Opus 4.7** is the model most likely to add a thoughtful caveat where another model would just answer. This is a feature for legal, compliance, and healthcare applications where the caveat is the point — and a friction point for casual consumer use where users want a direct answer. The Constitutional AI training, documented at https://www.anthropic.com/news/claude-4-7-system-card, is also the reason Claude is unusually good at refusing in ways that explain the reasoning and offer a partial answer, which most users find less frustrating than a flat refusal.

**Gemini 2.5 Pro** is the only one where the system-prompt-plus-safety-settings combination genuinely changes the model's behavior on the wire. Setting safety thresholds to BLOCK_NONE on a category that does not apply to your application (say, harassment for a code-generation product) removes the refusal layer for that category entirely — but the underlying training still applies, so you do not get a fully unfiltered model. The configuration syntax is documented at https://ai.google.dev/gemini-api/docs/safety-settings and is the easiest to test against your use case.

Prompt design that reduces over-refusals across all three models: state the use case explicitly in the system prompt ("You are a legal research assistant helping a licensed attorney prepare a case."), give the model permission to discuss sensitive topics with clinical detail ("You may explain medication interactions in clinical depth; the user is a licensed pharmacist."), and tell the model what to do when it would otherwise refuse ("If you cannot answer, return JSON with reason='policy' and a brief explanation."). All three models respond meaningfully to this framing.

Prompt design that reduces jailbreak success: never put untrusted user content directly into the system prompt, always wrap user inputs in clearly delimited tags (Anthropic recommends XML-style tags per https://docs.anthropic.com/), and treat the user message as data that the system prompt operates on rather than as continuation of the system prompt. This pattern has the biggest single effect on prompt-injection vulnerability across all three vendors. For Gemini, additionally enable Model Armor on Vertex if your deployment supports it.

The middle-ground default for production applications: GPT-5 for general-purpose B2B SaaS with broad use cases, Claude Opus 4.7 for regulated workflows where careful reasoning beats raw speed, and Gemini 2.5 Pro for applications that genuinely need per-category tunability — content moderation backends, creative writing tools, medical or legal apps where you have a credentialed audience. Verify the current behavior against your own eval set before committing — and rerun the eval each time a vendor pushes a model update.


Watermarking, provenance, and content authenticity

**SynthID** is the single most differentiated safety feature on this list. Google DeepMind's SynthID at https://deepmind.google/technologies/synthid/ embeds an imperceptible watermark into Gemini-generated images, audio, video, and (in supported regions) text. The watermark survives moderate transformations — cropping, resaving, color adjustment — and can be detected by Google's classifier. For applications shipping AI-generated media at scale, this is meaningful provenance. For applications generating short-form text, the text watermark coverage is narrower and detection is probabilistic, not deterministic.

**OpenAI** uses C2PA metadata on DALL·E and Sora outputs per https://help.openai.com/en/articles/8912793-c2pa-in-dall-e-3, which is the industry standard for image provenance metadata. C2PA is signed metadata attached to the file, not an embedded watermark — it survives transmission but is trivial to strip with a re-save through a tool that does not preserve metadata. As of June 2026, OpenAI has not shipped a public text watermarking product despite multiple research previews. For text provenance, you are relying on policy and audit, not technical detection.

**Anthropic** has not shipped a watermarking system as of June 2026. The Anthropic position, documented in Responsible Scaling Policy materials at https://www.anthropic.com/responsible-scaling-policy, leans on policy enforcement, system-card transparency, and red-team evaluation rather than technical watermarking. For applications that need detectable AI-generated content as a compliance requirement, Claude is not the strongest fit out of the box — you would need to layer a third-party detection system, which is far less reliable than embedded watermarking.

The practical implication for content-heavy applications: if you are shipping AI-generated images, audio, or video at scale and you need defensible provenance for compliance, regulatory, or brand-safety reasons, **Gemini 2.5 Pro with SynthID is the strongest choice** in the frontier-model tier. If you are shipping primarily text and your provenance need is met by logging the model version and prompt, GPT-5 and Claude are both acceptable. Do not buy any vendor's claim that text watermarking is solved — none of the published research detection systems are reliable enough for legal evidentiary use.

Provenance also matters for inbound content. If your application accepts user uploads — images, audio, documents — and you want to flag AI-generated material in the input, the SynthID detector at https://deepmind.google/technologies/synthid/ detects Google-generated content with high reliability and other AI-generated content with much lower reliability. C2PA-marked content from OpenAI or Adobe tools can also be detected via the metadata. There is no universal AI-content detector that works reliably across all upstream models in 2026, despite vendor claims.

The opinionated take on watermarking in 2026: it is a real differentiator for Gemini and a real gap for Anthropic. If watermarking is in your requirements document, that decision is already made. If it is not in your requirements document, do not let a vendor sales pitch put it there — text watermarking specifically is more marketing than science right now, and over-indexing on it is a procurement mistake.


Data retention, ZDR, and training opt-out

All three vendors have converged on the same baseline for paid API and enterprise tiers: no model training on customer data by default. **OpenAI** confirms this at https://openai.com/enterprise-privacy/ and https://openai.com/api-data-privacy/. **Anthropic** confirms at https://www.anthropic.com/legal/commercial-terms. **Google** confirms at https://cloud.google.com/vertex-ai/generative-ai/docs/data-governance. This is the baseline; verify the current version of each vendor's terms before signing, because the consumer tier rules differ materially and the boundary between API and consumer products is fuzzy at OpenAI and Google in particular.

**Default retention** differs. OpenAI retains API data for up to 30 days for abuse monitoring per https://openai.com/enterprise-privacy/, then deletes. Anthropic retains for up to 30 days. Gemini API caches inputs for short windows (typically 24 hours); Vertex AI retention is configurable per project. None of these are training data — they are abuse-monitoring retention, which is a different question from whether your data ever ends up in a model.

**Zero Data Retention (ZDR)** is available on enterprise tiers from all three vendors. OpenAI ZDR per https://openai.com/enterprise-privacy/ requires approval and is typically gated on specific use cases. Anthropic ZDR is available per https://www.anthropic.com/trust on enterprise contracts. Vertex AI supports zero-day retention configuration directly. Through cloud overlays, **GPT-5 on Azure** supports abuse-monitoring opt-out for approved customers, **Claude on Bedrock** is essentially zero-retention by default (AWS does not log invocations unless you turn logging on), and **Gemini on Vertex** is fully configurable.

For regulated buyers, the practical procurement question is not just "do you have ZDR" — it is "what is the audit trail that proves you have ZDR." All three cloud overlays produce stronger audit trails than the native APIs. Azure adds Azure Monitor logs, Bedrock adds CloudTrail, Vertex adds Cloud Audit Logs. If your security review requires evidence that customer data did not leave your tenancy, the cloud overlay is almost always the right answer over the native API — even at the modest latency and cost premium.

The undocumented procurement question that catches buyers: data residency commitments are usually contract-level, not API-level, for the native APIs. **OpenAI** does not let you choose region on api.openai.com calls — you get OpenAI's default routing. **Anthropic** is the same on api.anthropic.com. **Gemini API** has limited region selection at the developer-tier level. To get a hard data-residency commitment (EU-only, US-only, AU-only), you almost always need the cloud-overlay deployment: Azure OpenAI's regional resources, Bedrock's regional endpoints, or Vertex AI's regional configuration. This is the single biggest reason regulated buyers move from native API to cloud overlay.

The opinionated read on retention and training: the headline numbers (no training, 30-day retention) are now table stakes across all three vendors. The differentiation is in the audit, contract, and residency story — and that story almost always points toward the cloud overlays rather than the native APIs for any serious regulated deployment. Walk through the OpenAI vs Anthropic data policies comparison before signing — the contract language matters more than the marketing page.


Compliance posture: SOC 2, ISO, HIPAA, EU AI Act

**OpenAI** publishes its trust portal at https://trust.openai.com/ with SOC 2 Type II, CSA STAR Level 1, and HIPAA BAA available on the enterprise tier. The ChatGPT Enterprise and API enterprise tiers are SOC 2 Type II covered; the consumer tier is not part of the same scope. The HIPAA BAA is available but limited to specific approved use cases — you cannot self-serve a BAA on the standard developer dashboard.

**Anthropic** publishes at https://trust.anthropic.com/ with SOC 2 Type II, ISO 27001, and HIPAA BAA on enterprise contracts. Anthropic's compliance posture has matured rapidly through 2024-2026 and is now comparable to OpenAI's. The Responsible Scaling Policy at https://www.anthropic.com/responsible-scaling-policy adds an unusual dimension — Anthropic publicly commits to pre-deployment safety evaluations at each ASL tier, which is a different kind of compliance signal than SOC 2 but increasingly valued by enterprise buyers.

**Google** publishes at https://cloud.google.com/security/compliance with the deepest certification stack of the three — SOC 2, SOC 3, ISO 27001/27017/27018/27701, HIPAA, FedRAMP High (Vertex), PCI DSS, and dozens of country-specific certifications. This is partly a function of Google Cloud being a long-established cloud provider — Gemini inherits the existing GCP compliance umbrella, which is the most mature of the three.

The cloud-overlay compliance picture is stronger across the board. **GPT-5 on Azure OpenAI** inherits Azure's full compliance stack including FedRAMP High and IL5 for US government use cases. **Claude on AWS Bedrock** inherits AWS's full stack including FedRAMP High and IL5. **Gemini on Vertex AI** inherits Google Cloud's full stack. For US government, regulated finance, or healthcare workloads, the cloud overlay is the right deployment regardless of which model you pick — you cannot get FedRAMP High through the native API of any frontier vendor.

**EU AI Act** compliance is the moving target. As of June 2026, the EU AI Act's general-purpose AI (GPAI) provisions are in effect for new models, and high-risk system requirements are progressively rolling in through 2026-2027. All three vendors are GPAI providers and publish model documentation aligned with the GPAI requirements. Whether your deployment is high-risk under the Act depends on your use case, not the vendor — a customer-service chatbot is not high-risk, an automated employment screening tool is. The vendor compliance posture is necessary but not sufficient; your deployment classification is yours to manage.

The practical compliance procurement checklist: get the most recent SOC 2 Type II report (not Type I, not a SOC 3 summary), get the HIPAA BAA in writing if you handle PHI, get the data residency commitment in the master services agreement (not the marketing page), and get the EU AI Act GPAI documentation if you deploy in the EU. For all three vendors as of June 2026, the answers exist — the procurement question is whether your counsel and security team have read them and confirmed they meet your specific obligations. Do not assume; verify.


The opinionated 2026 pick: which safety stack to deploy

If I were shipping a general-purpose B2B SaaS feature tomorrow with no regulatory constraints, I would deploy **GPT-5 via native API** with the free Moderation API on input and output. The refusal calibration is the most polished out of the box, the tooling ecosystem is the broadest, and the cost-per-token is competitive. Verify current pricing at https://openai.com/api/pricing/ before procurement.

If I were shipping a regulated workflow in legal, healthcare, or financial services where careful refusal posture beats raw output speed, I would deploy **Claude Opus 4.7 via AWS Bedrock**. The Constitutional AI training produces a careful, caveated voice that matches what regulated buyers expect, the Bedrock Guardrails overlay handles the configurable policy layer Anthropic does not expose natively, and AWS compliance inheritance covers FedRAMP, HIPAA, and IL5. Verify Bedrock pricing at https://aws.amazon.com/bedrock/pricing/ and Claude availability per region.

If I were shipping a content-heavy application where AI-generated images, audio, or video provenance is part of the requirement — say, a generative media platform, a publishing tool, or a content moderation backend — I would deploy **Gemini 2.5 Pro via Vertex AI**. SynthID is the only credible watermarking system in the frontier tier, the per-category tunable thresholds let you ship a product that does not over-refuse on your specific use case, and Vertex's compliance inheritance is the deepest of the three cloud overlays. Verify at https://cloud.google.com/vertex-ai/pricing.

If I were running a multi-model production stack — increasingly common in 2026 as teams route different request types to different models — I would deploy all three behind a routing layer with consistent input/output logging and a shared moderation pre-filter. The marginal cost of supporting all three providers is modest; the resilience benefit when one provider has a quality regression or a capacity event is significant. Most teams I have seen survive the move from single-vendor to multi-vendor without serious operational burden.

What I would not do in 2026: skip the cloud overlay on a regulated deployment to save a few cents per thousand tokens. The audit trail and residency commitment alone are worth the premium, and the additional policy layer (Azure Content Safety, Bedrock Guardrails, Vertex Model Armor) materially improves your defense-in-depth posture against jailbreaks and prompt injection. The native APIs are the right answer for prototyping and consumer products. The cloud overlays are the right answer for production B2B and regulated workloads.

The one persistent mistake across all three vendors: teams treat "the model is aligned" as a substitute for product-level safety design. It is not. The model handles the model layer. You still own input validation, output filtering, abuse rate limiting, conversation length policy, and user reporting workflows. The best frontier model in 2026 is the one whose training plus cloud overlay plus your own application-layer safety design adds up to a system you can defend in a post-incident review — not the one with the best published benchmark score.

How to pick the right safety stack between GPT-5, Claude Opus 4.7, and Gemini 2.5 Pro

  1. 1

    Step 1: Write the safety requirements before you take a vendor demo

    Before you let an OpenAI, Anthropic, or Google sales engineer pitch you, write a one-page safety requirements doc. It should cover: what categories must be filtered (CSAM, weapons, self-harm, PII, brand-safety topics), what categories must NOT be over-filtered (your legitimate use case), what data residency you need (US-only, EU-only, multi-region), whether ZDR is mandatory, whether watermarking is mandatory, what audit log retention is required, and what your incident response process looks like. Without this doc you will buy whatever the most polished vendor pitch is, and you will discover the requirements gap in production. Sources to draft from include https://openai.com/policies/usage-policies/, https://www.anthropic.com/responsible-scaling-policy, and https://ai.google.dev/gemini-api/docs/safety-settings.

  2. 2

    Step 2: Build a 50-prompt eval set from your actual user behavior

    Vendor-published benchmark numbers are useful for comparing deltas but useless for predicting your production failure rate. Pull 50 representative prompts from your actual user logs (or your closest proxy for them), include 5-10 deliberately adversarial prompts that test the categories you care about most, and run each prompt against GPT-5, Claude Opus 4.7, and Gemini 2.5 Pro. Score each response on (a) was the answer correct, (b) was the refusal appropriate, (c) did the model add unnecessary caveats, and (d) did it produce content you would not want logged. This 4-hour exercise produces more procurement signal than any vendor demo. Repeat it after every major model version bump.

  3. 3

    Step 3: Pressure-test the data, retention, and residency story in writing

    Get the most recent SOC 2 Type II report (verify it is Type II, not Type I, and is dated within the last 12 months), the data processing agreement, the ZDR addendum if applicable, the data residency commitment, and the HIPAA BAA if you handle PHI. For OpenAI, verify enterprise-tier privacy at https://openai.com/enterprise-privacy/ matches your contract. For Anthropic, verify https://www.anthropic.com/trust commitments are in your MSA. For Google, verify the Vertex AI data governance commitments are bound to your specific project. Get all of this BEFORE the contract is signed — the vendor leverage drops sharply post-signature.

  4. 4

    Step 4: Pick the right deployment mode (native API vs cloud overlay)

    Native APIs (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com) are faster, cheaper, and easier to integrate — and the right answer for prototyping, consumer products, and non-regulated B2B SaaS. Cloud overlays (Azure OpenAI, AWS Bedrock, Vertex AI) add latency and cost but provide the audit trail, regional residency, additional policy filters, and FedRAMP/IL5 inheritance that regulated workloads require. For most teams, the right answer is to prototype on the native API and migrate to the cloud overlay before going to production with a regulated customer. Plan the migration path before you sign, not after, since switching prompt-engineering patterns and SDKs mid-flight is a real engineering tax.

  5. 5

    Step 5: Ship safety logging and incident response on day one

    On day one of your production deployment, log the safety-related metadata from every model response: finish_reason (OpenAI), stop_reason (Anthropic), safety_ratings per category (Gemini), and the input/output token counts. Build a dashboard that surfaces refusal rate, jailbreak detection events, and toxicity flags by user and by prompt template. Build a runbook for what happens when a safety incident is reported: who pulls the logs, who decides whether to disclose, who updates the system prompt, who notifies the vendor. Most teams ship the AI feature, skip the safety telemetry, and then discover during their first incident that they have no visibility into what went wrong. Treat safety logging as a launch requirement, not a Q3 fast-follow.

Use the data programmatically

Every page on this site is also exposed as a free, CORS-open JSON endpoint. No auth, no rate limit (fair-use, please cache). License is CC-BY-4.0 — link back to attribution.canonicalUrl in the response.

Endpoint: https://aipromptshub.co/api/vs/safety-features-gpt-vs-claude-vs-gemini
curl
curl -s 'https://aipromptshub.co/api/vs/safety-features-gpt-vs-claude-vs-gemini' | jq .
Python
import requests

r = requests.get("https://aipromptshub.co/api/vs/safety-features-gpt-vs-claude-vs-gemini", timeout=10)
r.raise_for_status()
data = r.json()
print(data["title"])
for source in data.get("sources", []):
    print("source:", source)
JavaScript / Node
// Node 20+ / modern browser
const res = await fetch("https://aipromptshub.co/api/vs/safety-features-gpt-vs-claude-vs-gemini");
if (!res.ok) throw new Error("HTTP " + res.status);
const safety_features_gpt_vs_claude_vs_gemini = await res.json();
console.log(safety_features_gpt_vs_claude_vs_gemini.title);
for (const source of safety_features_gpt_vs_claude_vs_gemini.sources ?? []) {
  console.log("source:", source);
}

Spec: /api/openapi.yaml · Docs: /api/docs

Frequently Asked Questions

Which model has the lowest hallucination rate in 2026 — GPT-5, Claude Opus 4.7, or Gemini 2.5 Pro?

All three sit in the top tier of the Vectara HHEM leaderboard at https://huggingface.co/spaces/vectara/leaderboard, and the month-to-month rank changes. Claude Opus 4.7 and GPT-5 have traded the top spot in 2025-26 leaderboard snapshots, with Gemini 2.5 Pro close behind. For a production RAG application, the practical difference between the three is small — what matters more is grounding the model in retrieved documents and prompting it explicitly to cite sources or refuse. Verify the current leaderboard rank against your own evaluation set, because the published benchmark is a snapshot of summarization-style hallucination that may not reflect your specific use case.

Can I turn off safety filters on Gemini 2.5 Pro for legitimate adult-audience applications?

Partially. The Gemini API at https://ai.google.dev/gemini-api/docs/safety-settings lets you set per-category thresholds down to BLOCK_NONE for harassment, hate, sexually explicit, and dangerous content — but the underlying model still has its base training, so you do not get a fully unfiltered model. For medical, legal, or adult-platform applications that need to discuss explicit content with credentialed users, BLOCK_NONE on the relevant category plus a clear system prompt explaining the use case is the right pattern. GPT-5 and Claude Opus 4.7 do not expose equivalent per-category dials — you tune behavior through the system prompt only, and the model's underlying refusals are harder to negotiate around.

Does OpenAI, Anthropic, or Google train on my API data by default in 2026?

No. As of June 2026, all three vendors confirm they do not train on paid API or enterprise customer data by default. Verify at https://openai.com/enterprise-privacy/, https://www.anthropic.com/legal/commercial-terms, and https://cloud.google.com/vertex-ai/generative-ai/docs/data-governance. Note that consumer-tier products (ChatGPT free, Gemini consumer app) follow different rules and may use conversations for training unless you opt out in settings. The boundary between API and consumer products is not always crisp at OpenAI and Google — verify the specific product surface you are using is on the API/enterprise rules, not the consumer ones.

What is Zero Data Retention (ZDR) and which vendors offer it?

ZDR means the vendor does not store any inputs or outputs from your API calls — there is no abuse-monitoring retention window, no logs, no cache. OpenAI offers ZDR on the enterprise tier for approved use cases per https://openai.com/enterprise-privacy/. Anthropic offers ZDR on enterprise contracts per https://www.anthropic.com/trust. Vertex AI supports zero-day retention configuration directly. Through cloud overlays, GPT-5 on Azure supports abuse-monitoring opt-out, Claude on Bedrock is effectively zero-retention by default (AWS does not log invocations unless you turn logging on), and Gemini on Vertex is fully configurable. ZDR is necessary for some HIPAA and EU regulated workloads — get it in the contract, not just the marketing page.

Is SynthID watermarking actually reliable for detecting Gemini-generated content?

Yes for images, audio, and video; less so for text. SynthID image watermarking at https://deepmind.google/technologies/synthid/ embeds an imperceptible signal that survives moderate transformations (cropping, resaving, color adjustment) and is detected with high reliability by Google's classifier. SynthID text watermarking is available in supported regions but coverage is narrower and detection is probabilistic, not deterministic — short outputs especially are difficult to watermark reliably. For applications that need defensible AI-content provenance, SynthID image/audio/video is the strongest option in the frontier-model tier; for text provenance, no vendor as of June 2026 has shipped a system reliable enough for legal evidentiary use.

How do I get HIPAA coverage for Claude or Gemini in healthcare workflows?

For Claude, get the HIPAA BAA on an Anthropic enterprise contract per https://www.anthropic.com/trust, or deploy via AWS Bedrock under your existing AWS BAA per https://aws.amazon.com/compliance/hipaa-compliance/. For Gemini, deploy via Vertex AI under your Google Cloud BAA per https://cloud.google.com/security/compliance/hipaa-compliance — the consumer Gemini API and free tier are not HIPAA-covered. For GPT-5, the OpenAI enterprise tier supports a BAA per https://openai.com/enterprise-privacy/, or deploy via Azure OpenAI under your Microsoft BAA. The cloud-overlay path is usually faster procurement for healthcare buyers because the BAA is already in place from the underlying cloud relationship — verify the specific service is in scope of your cloud BAA before assuming.

Should I trust vendor jailbreak benchmark numbers or run my own evaluations?

Both. Vendor-published numbers are useful for understanding the methodology and comparing relative resistance between models, but they are snapshots against specific attack methodologies that adversaries iterate on quickly. Read JailbreakBench at https://jailbreakbench.github.io/ and HELM Safety at https://crfm.stanford.edu/helm/ for independent leaderboards. Then build your own internal red-team set with prompts from your actual user base — including multi-turn jailbreak attempts, which the public benchmarks underweight. Run your internal eval against each candidate model and against the cloud overlay version (Azure Content Safety, Bedrock Guardrails, Vertex Model Armor). The internal eval is the only number that predicts your production failure rate.

Which deployment mode is best for EU AI Act compliance — native API or cloud overlay?

Cloud overlay, in almost every case. The EU AI Act's general-purpose AI (GPAI) provisions apply to the underlying model regardless of deployment, and all three vendors publish GPAI documentation. But the high-risk system requirements that may apply to your specific deployment require granular control over data residency, audit logging, and risk-management documentation — which the cloud overlays (Azure OpenAI EU regions, Bedrock EU regions, Vertex AI EU regions) provide far more cleanly than the native APIs. For most EU regulated workloads, the answer is deploy via the cloud overlay with EU regional resources, document the deployment under your Article 6/9/10 obligations, and verify residency and ZDR in the contract. The native API is acceptable for non-high-risk EU deployments but harder to defend in audit.

What is the single biggest mistake teams make when picking a safety stack?

Treating model-level safety as a substitute for application-level safety design. The model handles the model layer — refusing CSAM, blocking weapons synthesis, applying its trained policy. Your application still owns input validation, output filtering, rate limiting, conversation length policy, abuse reporting, user identity verification for sensitive use cases, and incident response. The teams that get burned in production almost always picked a well-aligned model and assumed that was the whole job. The opposite mistake — picking a less-aligned model and over-engineering the application layer — is rare but more recoverable. Pick a strong model, deploy the right cloud overlay for your risk profile, and budget for the application-layer work as a launch requirement, not a fast-follow.

You now know which frontier-model safety stack to deploy. Now make every prompt those models run actually hit.

AI Prompt Generator builds production-ready system prompts that work across GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, and every other model in this article — so your safety reviews, red-team evals, and compliance reports get sharper data, not generic AI fluff. Stop tweaking prompts by hand and start shipping prompts that drive measurable lift. 14-day free trial, no credit card required.

Browse all prompt tools →