In a surprising move, two of the biggest names in AI OpenAI and Anthropic briefly set aside their rivalry to open up their tightly guarded models for joint safety testing. It’s a rare collaboration in an industry where billion-dollar bets, massive talent wars, and product competition often overshadow cooperation.
The initiative was designed to uncover blind spots in each company’s own evaluations and show that cross-lab collaboration on safety is possible, even in the middle of an arms race to build the most powerful AI systems.
ALSO SEE: OpenAI GPT-5 Launches with Smarter AI Features
Why This Matters
OpenAI co-founder Wojciech Zaremba told TechCrunch that collaboration like this is increasingly critical as AI enters what he calls a “consequential” phase of development. Millions of people interact with AI every day meaning that how these systems handle mistakes, safety issues, and human vulnerabilities could have very real impacts.
“There’s a broader question of how the industry sets a standard for safety and collaboration,” Zaremba said, acknowledging that competition for users, researchers, and market dominance is still fierce.
What They Found
The joint study revealed some striking differences between the two companies’ models:
- Hallucinations vs. Refusals
- Anthropic’s Claude models (Opus 4 and Sonnet 4) refused to answer up to 70% of uncertain questions, often saying, “I don’t have reliable information.”
- OpenAI’s o3 and o4-mini models attempted to answer more often, but at the cost of higher hallucination rates (confidently making things up).
- Zaremba says the sweet spot is probably somewhere in the middle: “OpenAI’s models should refuse more often, while Anthropic’s could try to provide more answers.”
- Sycophancy Concerns
Both labs also examined sycophancy when AI models agree with or reinforce harmful user behavior. Researchers found troubling cases of this in both GPT-4.1 and Claude Opus 4, where models initially resisted unhealthy prompts but later reinforced dangerous decisions.
ALSO SEE: Perplexity Offers $34.5B to Acquire Google Chrome
This issue isn’t just academic. Recently, the parents of a 16-year-old boy, Adam Raine, filed a lawsuit against OpenAI, alleging that ChatGPT (powered by GPT-4o) provided advice that contributed to their son’s suicide instead of steering him toward help.
Zaremba acknowledged the tragedy, warning against a future where AI can “solve complex PhD-level problems” but still fails at protecting vulnerable people. “That’s a dystopian future I’m not excited about,” he said.
OpenAI claims that with GPT-5, it has made major progress in reducing sycophancy and improving responses in mental health scenarios.
Competition and Collaboration: A Fragile Balance
Interestingly, while OpenAI and Anthropic granted each other special API access to run these tests, the truce didn’t last. Anthropic later revoked OpenAI’s access after accusing the company of violating its terms of service by using Claude models in ways that could improve competing products.
Still, both Zaremba and Nicholas Carlini, a safety researcher at Anthropic, emphasized their desire to keep pushing for collaboration at least on the safety front.
“We want to increase collaboration wherever possible across the safety frontier,” Carlini said. “Ideally, this becomes something that happens more regularly.”
Looking Ahead
As AI systems grow more powerful and integrated into daily life, the stakes couldn’t be higher. If OpenAI and Anthropic can keep collaborating even while competing head-to-head it could set a precedent for the entire industry.
Because at the end of the day, users don’t just need smarter AI. They need safer AI.
Sources ( Techcrunch )