OpenAI’s New o3 and o4-mini Models Show Increased Hallucination Rates


OpenAI’s recently launched o3 and o4-mini AI models — part of its new generation of reasoning systems — are generating more hallucinations than previous models, according to the company’s internal benchmarks. Despite their strong performance in areas like coding and math, both models show a worrying trend: an increase in inaccurate or fabricated responses.

These models are designed to enhance logical reasoning and problem-solving. Yet internal tests show that o3 hallucinated on 33% of questions in OpenAI's PersonQA benchmark, roughly twice the rate of older models such as o1 and o3-mini. The smaller o4-mini fared even worse, hallucinating in 48% of its responses.

What’s more concerning is that OpenAI currently doesn’t fully understand why hallucinations are on the rise in these advanced models. The technical report accompanying the launch suggests that as reasoning capabilities scale, the number of both correct and incorrect claims also increases — a double-edged sword for AI performance.

External research by the nonprofit lab Transluce supports these findings. In one example, o3 falsely claimed to have executed code on a MacBook Pro, something the model cannot actually do. Former OpenAI employee Neil Chowdhury has suggested that the reinforcement learning techniques used to train these models may be contributing to the rise in hallucinations.

While some users, like those at Workera, find the o3 model useful in coding workflows, they’ve also flagged recurring hallucinations like broken or non-existent website links. This inconsistency raises concerns for industries like law or finance, where factual precision is critical.

To address accuracy issues, OpenAI has explored integrating web search: GPT-4o with search achieves 90% accuracy on SimpleQA, another of the company's benchmarks. Still, whether this approach can reduce hallucinations in reasoning models remains unclear.

With reasoning models increasingly central to AI development, OpenAI faces an urgent challenge. As the demand for reliable and accurate AI continues to grow, balancing intelligence with truthfulness has never been more critical.
