OpenAI has unveiled a new benchmark, GDPval, designed to measure how its AI models stack up against human professionals across economically critical industries. The test is part of OpenAI’s broader mission to track progress toward artificial general intelligence (AGI).
The findings? GPT-5 and Anthropic’s Claude Opus 4.1 are now producing work on par with industry experts in a growing number of fields.
What GDPval Measures
GDPval zeroes in on nine industries that contribute the most to the U.S. economy, from healthcare and finance to manufacturing and government. It evaluates 44 occupations by asking seasoned professionals to compare AI-generated reports with human-produced ones. In those comparisons:
- GPT-5-high — a more powerful variant of GPT-5 — matched or outperformed expert work 40.6% of the time.
- Claude Opus 4.1 came in at 49%, though OpenAI suggests its higher score may be boosted by polished visuals rather than deeper performance.
For context, GPT-4o, released just 15 months ago, scored only 13.7% on the same benchmark — underscoring the pace of progress.
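To make the headline percentages concrete, here is a minimal sketch of how a "matched or outperformed" rate can be computed from grader verdicts. It is an illustration only, not OpenAI's scoring code: the verdict labels, the sample data, and the function name win_or_tie_rate are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical grader verdicts for one model. Each entry records whether a
# professional judged the AI report better than ("win"), as good as ("tie"),
# or worse than ("loss") the human expert's report. The data is made up.
verdicts = ["win", "loss", "tie", "loss", "win", "loss", "loss", "tie", "loss", "loss"]


def win_or_tie_rate(verdicts):
    """Return the fraction of comparisons where the AI matched or beat the expert."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    counts = Counter(verdicts)
    return (counts["win"] + counts["tie"]) / len(verdicts)


rate = win_or_tie_rate(verdicts)
print(f"Matched or outperformed the expert in {rate:.1%} of comparisons")
```

Under this reading, a score of 40.6% means graders rated the model's report as good as or better than the expert's in roughly four of every ten comparisons.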
Why It Matters
The first version of the test, GDPval-v0, focuses on research reports rather than the full spectrum of workplace tasks, and OpenAI acknowledges the need for more comprehensive tests that reflect real-world workflows. Still, the results suggest that professionals, from bankers to nurses, could increasingly offload routine tasks to AI, freeing up time for higher-value work.
As OpenAI’s chief economist Dr. Aaron Chatterji put it:
“Because the model is getting good at some of these things, people in those jobs can now use the model… to offload some of their work and do potentially higher-value things.”
The Bigger Picture
Traditional AI benchmarks like AIME (math) and GPQA (PhD-level science) are nearing saturation, prompting researchers to seek better measures of real-world usefulness. GDPval could become a key standard in that shift, especially as AI models inch closer to matching human productivity in high-stakes industries.
For OpenAI, the benchmark is both a progress report and a pitch: AI isn’t replacing professionals just yet, but it’s rapidly becoming a powerful co-pilot.
Source: TechCrunch