OpenAI’s GPT-5 Nears Human-Level Work in New Benchmark

Blog | September 26, 2025

OpenAI has unveiled a new benchmark, GDPval, designed to measure how AI models, including its own, stack up against human professionals across economically critical industries. The test is part of OpenAI's broader mission to track progress toward artificial general intelligence (AGI).

The findings? GPT-5 and Anthropic's Claude Opus 4.1 are now producing work that rivals that of industry experts on a growing share of tasks.

What GDPval Measures

GDPval zeroes in on the nine industries that contribute the most to U.S. gross domestic product (GDP), from healthcare and finance to manufacturing and government. It evaluates 44 occupations by asking seasoned professionals to compare AI-generated reports with human-produced ones.

GPT-5-high, a version of GPT-5 that uses more computational power, matched or outperformed expert work 40.6% of the time.
Claude Opus 4.1 came in at 49%, though OpenAI suggests its higher score may owe more to polished visuals than to stronger underlying performance.

For context, GPT-4o, released roughly 15 months earlier, scored only 13.7% on the same benchmark, which underscores the pace of progress.

Why It Matters

The current version, GDPval-v0, focuses only on research reports rather than the full spectrum of workplace tasks, and OpenAI acknowledges the need for more comprehensive tests that reflect real-world workflows. Still, the results suggest that professionals, from bankers to nurses, could increasingly offload routine tasks to AI and free up time for higher-value work.

As OpenAI’s chief economist Dr. Aaron Chatterji put it:
“Because the model is getting good at some of these things, people in those jobs can now use the model… to offload some of their work and do potentially higher-value things.”

The Bigger Picture

Traditional AI benchmarks like AIME (math) and GPQA (PhD-level science) are nearing saturation, with top models scoring close to the ceiling, prompting researchers to seek better measures of real-world usefulness. GDPval could become a key standard in that shift, especially as AI models inch closer to matching human productivity in high-stakes industries.

For OpenAI, the benchmark is both a progress report and a pitch: AI isn't replacing professionals just yet, but it's rapidly becoming a powerful co-pilot.

Source: TechCrunch