Reddit has filed a copyright lawsuit against Perplexity, accusing the artificial intelligence company of illegally scraping its data to train the model behind its search engine. The lawsuit, filed in a New York federal court on Wednesday, is part of a growing trend of legal disputes involving AI companies over the unauthorized use of copyrighted material.
Besides Perplexity, Reddit has targeted three smaller entities: the Lithuanian data scraper Oxylabs; AWMProxy, which the lawsuit describes as a “former Russian botnet”; and the Texas-based start-up SerpApi. Reddit alleges that these companies violated its copyrights through data-scraping activities in which they concealed their identities and locations and disguised their web crawlers as regular users.
Ben Lee, Reddit’s chief legal officer, emphasized the fierce competition among AI firms for quality content, which he argues has fostered a large-scale “data laundering” economy. He asserts that Perplexity was “a willing customer” of at least one of its co-defendants, alleging that the San Francisco-based firm required access to scraped data to enhance its “answer engine.”
In response to the lawsuit, Perplexity stated that it had not yet received documentation related to the case. The company reaffirmed its commitment to defending users’ rights to access public knowledge freely and responsibly. Oxylabs and SerpApi both said they planned to contest the lawsuit and that they, too, had not been served with the legal documents. Oxylabs’ chief governance and strategy officer, Denas Grybauskas, noted that Reddit had made no attempt to raise any concerns with the company directly before filing the lawsuit.
According to sources, Reddit had previously approached Perplexity about its alleged data misuse and suggested a potential paid partnership, but Perplexity’s founder, Aravind Srinivas, reportedly showed no interest. Reddit also reportedly contacted Google to investigate whether Perplexity had been scraping its proprietary data through the search engine and to seek ways of preventing such activity.
This lawsuit adds to the many copyright claims brought against AI companies since the rise of generative AI systems, which rely on massive datasets, including online content, for training. Copyright holders have voiced concerns about their content being used without consent or appropriate compensation. Reddit went public in March 2024 and has secured multimillion-dollar partnerships with major players such as Google and OpenAI, which allow those companies to train their large language models on Reddit’s content. By contrast, the social media platform accuses the defendants of bypassing its data protection measures to access its proprietary material.
This is not Reddit’s first legal dispute over its content; in June, it filed a lawsuit against Anthropic, asserting that the AI start-up had scraped its platform more than 100,000 times since July 2024. Anthropic has said it disagrees with Reddit’s claims and has vowed to defend itself vigorously.
As the landscape of AI development continues to evolve, the legal battles surrounding data acquisition and copyrighted content are likely to intensify, echoing broader concerns about the rights and protections of original content creators in the digital age.

