In the weeks following Donald Trump’s reelection, a conference in Oxford on polling and forecasting in the 2024 elections drew a crowd made up predominantly of pollsters and academics. Among the presentations were talks from Aaru and Electric Twin, companies specializing in what is termed synthetic sampling: the practice of using large language models (LLMs) to simulate survey responses, with AI agents standing in for human survey participants.
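In practice, synthetic sampling typically works by assigning an LLM a demographic persona and asking it to answer the survey question in character. Here is a minimal sketch of that idea; the prompt format, function names, and the stubbed model call are illustrative assumptions, not the actual interface of Aaru, Electric Twin, or any real LLM API:

```python
# Sketch of persona-based synthetic sampling. All names and the prompt
# format are illustrative assumptions, not any vendor's actual API.

def build_persona_prompt(persona: dict, question: str, options: list[str]) -> str:
    """Frame the survey question as if the LLM were this respondent."""
    traits = ", ".join(f"{k}: {v}" for k, v in persona.items())
    return (
        f"You are a survey respondent with these traits: {traits}.\n"
        f"Question: {question}\n"
        f"Answer with exactly one of: {', '.join(options)}."
    )

def stub_model(prompt: str) -> str:
    # Placeholder for a real LLM call (an API request in practice).
    # Returns a fixed option so the sketch runs without network access.
    return "Approve"

def simulate_respondent(persona: dict, question: str, options: list[str]):
    """Build the prompt, query the (stubbed) model, validate the answer."""
    answer = stub_model(build_persona_prompt(persona, question, options))
    return answer if answer in options else None  # discard malformed output

persona = {"age": 46, "region": "Midwest", "party": "Independent"}
print(simulate_respondent(persona,
                          "Do you approve of the new policy?",
                          ["Approve", "Disapprove", "Unsure"]))
```

A real system would repeat this over thousands of personas drawn to match census or panel demographics, then tabulate the answers like a conventional poll.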
Aaru’s founders, Cameron Fink and Ned Koh, have made ambitious claims about their technology, suggesting a future in which it can simulate global events and trends. Those bold predictions have drawn skepticism from many quarters, including the statistician Nate Silver, who joked that he wanted to “short” the company, doubting the viability of its approach.
Although attention on these synthetic sampling companies has quieted, significant developments have followed, including Aaru reaching a $1 billion valuation. Synthetic sampling is not the leading edge of AI advancement—particularly compared with models that can exploit software vulnerabilities—but its integration into public polling is increasingly evident. Reports have cited AI-generated responses as purported evidence of community trust in healthcare providers, and polling companies are beginning to blend real and synthetic data in their samples.
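Blending real and synthetic data can be pictured as down-weighting each simulated respondent relative to a human one. The sketch below shows one simple way to do that; the 20 percent weight and the data are made-up illustrations, since vendors’ actual blending methods are not public:

```python
# Illustrative blend of real and synthetic respondents into one topline.
# The synthetic_weight value and the sample data are assumptions.
from collections import Counter

def blended_topline(real_answers, synthetic_answers, synthetic_weight=0.2):
    """Tally answers, counting each synthetic answer as a fraction of a real one."""
    tally = Counter()
    for answer in real_answers:
        tally[answer] += 1.0
    for answer in synthetic_answers:
        tally[answer] += synthetic_weight
    total = sum(tally.values())
    return {option: 100.0 * count / total for option, count in tally.items()}

real = ["Yes"] * 60 + ["No"] * 40          # 100 human respondents
synthetic = ["Yes"] * 50 + ["No"] * 50     # 100 simulated respondents
print(blended_topline(real, synthetic))
```

With the weights above, 100 synthetic answers count as only 20 effective respondents, so the blended topline stays close to the human sample while the synthetic data smooths small cells.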
Critiques have arisen, however, over whether synthetic sampling can validly replace traditional polling. Using LLMs to generate faux survey respondents invites inaccuracies and biases, and experts warn that such models fail to capture the nuanced understanding of public opinion that traditional polls provide. Polling is, at bottom, a means of collecting authentic data; synthetic sampling mimics survey responses without gathering any new input from real people.
The conversation around the reliability and implications of synthetic sampling is nuanced. Some evidence shows that synthetic techniques can reproduce comparable topline survey results, yet considerable skepticism remains about their broader accuracy. Experts argue that while these models can replicate common trends, they struggle with deeper insights and with the heterogeneous opinions that characterize public sentiment.
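The “comparable toplines” claim can be made concrete. A common sanity check is the mean absolute error, in percentage points, between a real poll’s topline and its synthetic counterpart, computed across answer options. A small sketch, using made-up numbers rather than any real poll’s results:

```python
# Compare a real and a synthetic topline on the same question.
# The percentages below are invented for illustration only.

def topline_mae(real: dict, synthetic: dict) -> float:
    """Mean absolute difference, in percentage points, across answer options."""
    options = real.keys()
    return sum(abs(real[o] - synthetic.get(o, 0.0)) for o in options) / len(options)

real = {"Approve": 44.0, "Disapprove": 48.0, "Unsure": 8.0}
synthetic = {"Approve": 46.0, "Disapprove": 47.0, "Unsure": 7.0}
print(topline_mae(real, synthetic))
```

A small MAE on toplines is exactly the kind of evidence cited for synthetic sampling, but it says nothing about subgroup accuracy, which is where skeptics expect simulated panels to diverge from real ones.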
Industry professionals have highlighted that relying too heavily on LLMs could diminish the foundational role of direct engagement with voters. Many believe that investing in collecting representative samples, even from traditionally hard-to-reach populations, is crucial for understanding shifts in public opinion.
As synthetic sampling gains traction in the corporate world, with clients including large organizations like EY and McDonald’s, its application in the political arena remains contentious. Concerns are mounting about the potential for AI agents to infiltrate online surveys and skew results, posing challenges to the integrity of polling.
As synthetic sampling grows in popularity, it raises essential questions about where a model of public opinion ends and the measurement of it begins. Many argue that while synthetic sampling can be a useful tool, it should not supersede genuine data collection, which ensures that the voices of real people are represented in political discourse.


