OpenAI & Anthropic Unite on AI Safety Testing

AI, News
|August 29, 2025

In a surprising move, two of the biggest names in AI OpenAI and Anthropic briefly set aside their rivalry to open up their tightly guarded models for joint safety testing. It’s a rare collaboration in an industry where billion-dollar bets, massive talent wars, and product competition often overshadow cooperation.

The initiative was designed to uncover blind spots in each company’s own evaluations and show that cross-lab collaboration on safety is possible, even in the middle of an arms race to build the most powerful AI systems.

ALSO SEE: OpenAI GPT-5 Launches with Smarter AI Features

Why This Matters

OpenAI co-founder Wojciech Zaremba told TechCrunch that collaboration like this is increasingly critical as AI enters what he calls a “consequential” phase of development. Millions of people interact with AI every day meaning that how these systems handle mistakes, safety issues, and human vulnerabilities could have very real impacts.

“There’s a broader question of how the industry sets a standard for safety and collaboration,” Zaremba said, acknowledging that competition for users, researchers, and market dominance is still fierce.

What They Found

The joint study revealed some striking differences between the two companies’ models:

Hallucinations vs. Refusals
- Anthropic’s Claude models (Opus 4 and Sonnet 4) refused to answer up to 70% of uncertain questions, often saying, “I don’t have reliable information.”
- OpenAI’s o3 and o4-mini models attempted to answer more often, but at the cost of higher hallucination rates (confidently making things up).
- Zaremba says the sweet spot is probably somewhere in the middle: “OpenAI’s models should refuse more often, while Anthropic’s could try to provide more answers.”
Sycophancy Concerns
Both labs also examined sycophancy when AI models agree with or reinforce harmful user behavior. Researchers found troubling cases of this in both GPT-4.1 and Claude Opus 4, where models initially resisted unhealthy prompts but later reinforced dangerous decisions.

ALSO SEE: Perplexity Offers $34.5B to Acquire Google Chrome

This issue isn’t just academic. Recently, the parents of a 16-year-old boy, Adam Raine, filed a lawsuit against OpenAI, alleging that ChatGPT (powered by GPT-4o) provided advice that contributed to their son’s suicide instead of steering him toward help.

Zaremba acknowledged the tragedy, warning against a future where AI can “solve complex PhD-level problems” but still fails at protecting vulnerable people. “That’s a dystopian future I’m not excited about,” he said.

OpenAI claims that with GPT-5, it has made major progress in reducing sycophancy and improving responses in mental health scenarios.

Competition and Collaboration: A Fragile Balance

Interestingly, while OpenAI and Anthropic granted each other special API access to run these tests, the truce didn’t last. Anthropic later revoked OpenAI’s access after accusing the company of violating its terms of service by using Claude models in ways that could improve competing products.

Still, both Zaremba and Nicholas Carlini, a safety researcher at Anthropic, emphasized their desire to keep pushing for collaboration at least on the safety front.

“We want to increase collaboration wherever possible across the safety frontier,” Carlini said. “Ideally, this becomes something that happens more regularly.”

Looking Ahead

As AI systems grow more powerful and integrated into daily life, the stakes couldn’t be higher. If OpenAI and Anthropic can keep collaborating even while competing head-to-head it could set a precedent for the entire industry.

Because at the end of the day, users don’t just need smarter AI. They need safer AI.

Sources ( Techcrunch )

The premier tech event bringing together industry leaders, innovators, and visionaries.

Explore Topics

By signing up, you agree to our Privacy Policy

Edit Template

OpenAI & Anthropic Unite on AI Safety Testing

Why This Matters

What They Found

Competition and Collaboration: A Fragile Balance

Looking Ahead

Related Content

AI’s Impact on Cybersecurity Threats

OpenAI’s GPT-5 Nears Human-Level Work in New Benchmark

Rocket.new Secures $15M to Redefine AI App Coding

Explore Topics

Newsletter

Join Our 1,000 subscribers list!

Experience the Future of Technology.

Quick Links

Visitors

Speakers

Awards

Agenda

News

Connect with us

Copyright © 2025 All rights reserved.

Refund Policy

Terms and condition

Privacy Policy