Reddit Takes Legal Action Against Perplexity AI and Data Scrapers Over Unauthorized Data Use

Updated on 23 Oct 2025, 07:43 AM
Reviewed by Shriram Shekhar, ScanX News Team
Overview

Reddit has filed a lawsuit against Perplexity AI and three data scraping companies (Oxylabs UAB, AWMProxy, and SerpApi) for allegedly collecting and reselling Reddit's data without authorization. The social media platform accuses the scraping companies of gathering data through Google search results for resale, with Perplexity AI allegedly purchasing this data. Reddit, which already has licensing agreements with OpenAI and Google, views this as part of an 'industrial-scale data laundering economy' driven by AI companies' need for quality human content. This legal action follows a similar lawsuit against Anthropic and highlights the ongoing debate about data ownership and usage rights in AI development.


Reddit, the popular social media platform, has initiated legal proceedings against Perplexity AI and three data scraping companies, alleging unauthorized collection and resale of its data. This move highlights the growing tensions between content platforms and AI companies in the evolving digital landscape.

The Lawsuit

Reddit has filed a lawsuit against four companies:

Company       | Role                  | Allegations
Perplexity AI | AI company            | Purchasing scraped data
Oxylabs UAB   | Data scraping company | Unauthorized collection and resale of Reddit data
AWMProxy      | Data scraping company | Unauthorized collection and resale of Reddit data
SerpApi       | Data scraping company | Unauthorized collection and resale of Reddit data

According to the complaint, the three scraping companies have been collecting Reddit data through Google search results with the intention of reselling it. Perplexity AI is alleged to have purchased data from at least one of these companies.

Reddit's Stance

Reddit's chief legal officer has described the situation as an "industrial-scale data laundering economy." The company argues that this practice is driven by AI companies competing for quality human content, highlighting Reddit's value as one of the largest collections of human conversation online.

Existing Agreements and Previous Actions

It's worth noting that Reddit already has licensing agreements in place with some major tech companies:

Company | Agreement Type
OpenAI  | Existing licensing agreement
Google  | Existing licensing agreement

This lawsuit is not Reddit's first legal action in this domain. The company previously sued AI firm Anthropic over similar data scraping allegations, demonstrating a consistent approach to protecting its content.

Implications

This legal action underscores the ongoing debate about data ownership, usage rights, and the ethical considerations surrounding AI training data. As AI technologies continue to advance, the resolution of this case could have significant implications for the future of content licensing and data scraping practices in the tech industry.

Reddit's proactive stance in pursuing legal action against companies using its content without formal agreements reflects the growing importance of data as a valuable asset in the digital age. The outcome of this lawsuit may set important precedents for how online platforms protect their data and how AI companies source their training material.


Reddit Seeks Dynamic Pricing in AI Content Deals with Google and OpenAI

Updated on 17 Sept 2025, 10:06 PM
Reviewed by Shraddha Joshi, ScanX News Team
Overview

Reddit is initiating early discussions with Google and OpenAI to revise content-sharing agreements, aiming for dynamic pricing structures that increase compensation as its data becomes more valuable to AI platforms. Current agreements generate $203 million over 2-3 years, but Reddit believes this undervalues its data. The platform is pursuing deeper integration with Google's AI products and aims to convert Google traffic into active Reddit users. Reddit's content is highly cited in AI platforms, providing valuable training data for large language models powering tools like ChatGPT and Google's AI assistants.


Reddit, the popular online discussion platform, is making strategic moves in the artificial intelligence (AI) landscape by initiating early discussions with tech giants Google and OpenAI to reshape their content-sharing agreements. The company aims to move beyond traditional licensing models and establish dynamic pricing structures that would increase compensation as Reddit's data becomes more valuable to AI platforms.

Current Agreements and Future Aspirations

Reddit's existing agreements with Google and OpenAI are part of deals that generate $203 million in contract value over a two- to three-year period. However, the platform believes that these transactional licensing terms do not adequately reflect the true value of its data to AI platforms.

Seeking Deeper Integration and User Conversion

As part of its negotiations, Reddit is pursuing deeper integration with Google's AI products. The platform's strategy includes converting Google traffic into active Reddit users who would contribute content for future AI training. This approach aims to create a symbiotic relationship between Reddit's user-generated content and AI development.

Reddit's Unique Value Proposition

Reddit's content remains among the most cited sources across AI platforms, with its discussion format providing valuable training data for large language models. These models power popular AI tools such as:

  • ChatGPT (OpenAI)
  • Google's AI Overviews
  • Gemini assistant (Google)

The platform's diverse and extensive user-generated content offers a rich source of natural language data, making it particularly valuable for training and improving AI models.

Dynamic Pricing and Future Value

By pushing for dynamic pricing structures, Reddit aims to ensure that its compensation grows in tandem with the increasing value of its data to AI platforms. This forward-thinking approach reflects Reddit's understanding of the evolving AI landscape and its desire to capitalize on the long-term value of its user-generated content.

Conclusion

Discussions are still in the early stages, but the outcome of these negotiations could set a precedent for how online platforms monetize their data in the age of AI. Reddit's move highlights the growing recognition of user-generated content as a valuable asset in the development of advanced AI technologies.

The tech industry will be watching closely as these discussions unfold, potentially reshaping the relationship between content platforms and AI developers in the coming years.
