Looking up information on Google today means confronting AI Overviews, a search feature powered by the Gemini model. Launched in 2024, AI Overviews has faced significant criticism over its accuracy. It has improved, but the bar remains low: a recent analysis by The New York Times found that it answers correctly roughly 90 percent of the time, which still leaves about one answer in ten wrong. At the scale of Google search, that implies hundreds of thousands of misleading results every minute.
The analysis was conducted in partnership with Oumi, a startup specializing in AI model development. Oumi used AI tools to assess AI Overviews against SimpleQA, a test designed to evaluate the factuality of generative AI systems. Introduced by OpenAI in 2024, SimpleQA consists of more than 4,000 questions with verifiable answers and serves as a benchmark for AI performance.
Oumi began its testing a year ago, when Gemini 2.5 was the most advanced model available. At that time, accuracy stood at 85 percent. After the update to Gemini 3, AI Overviews improved, reaching 91 percent when the questions were rerun.
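The percentages above are easier to judge as error rates. The short Python sketch below is only back-of-the-envelope arithmetic on the figures quoted in this article; the function names are hypothetical, and nothing in it reflects how Oumi or The New York Times actually ran the evaluation. It shows that moving from 85 to 91 percent accuracy cuts the share of wrong answers by two fifths, and that producing 100,000 wrong answers per minute (the low end of "hundreds of thousands") at 90 percent accuracy would require about one million answers per minute.

```python
from fractions import Fraction


def error_rate(accuracy_pct: int) -> Fraction:
    """Share of answers that are wrong, given accuracy in whole percent."""
    return Fraction(100 - accuracy_pct, 100)


def relative_error_reduction(old_accuracy_pct: int, new_accuracy_pct: int) -> Fraction:
    """Fraction of the old errors eliminated by the accuracy improvement."""
    old_err = error_rate(old_accuracy_pct)
    new_err = error_rate(new_accuracy_pct)
    return (old_err - new_err) / old_err


def answers_needed_for_errors(errors: int, accuracy_pct: int) -> Fraction:
    """Number of answers that must be served to yield `errors` wrong ones."""
    return errors / error_rate(accuracy_pct)


# Figures taken from the article: 85 percent before, 91 percent after.
print(relative_error_reduction(85, 91))        # 2/5, i.e. 40 percent fewer errors

# Assumption: 100,000 is used as the low end of "hundreds of thousands".
print(answers_needed_for_errors(100_000, 90))  # 1000000
```

Exact fractions are used instead of floating-point numbers so the results print as clean ratios rather than rounded decimals.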
Despite this progress, the report highlighted cases where AI Overviews stumbled. When asked for the date Bob Marley’s former residence was converted into a museum, AI Overviews cited three sources. Two of them did not mention the date at all, while the third, Wikipedia, gave two conflicting years, and AI Overviews picked the wrong one. When asked when Yo-Yo Ma was inducted into the Classical Music Hall of Fame, AI Overviews cited the correct website but erroneously claimed that the Hall of Fame did not exist.
As AI Overviews continues to evolve, accuracy remains critical, and errors at this rate raise concerns about the reliability of AI-generated answers in a world increasingly reliant on instant search results.


