Reddit blocks AI web scrapers, Google remains an exception

Published 30 Jul 2024

Reddit is cracking down on artificial intelligence (AI) web scrapers that access content from its website, sparking tension with search engines in light of its recent AI licensing deal with Google.

Reddit, the popular forum-based social media platform, has started preventing AI web scrapers from accessing its content for training purposes without permission. It did so by implementing new policies and by updating its “robots.txt” file, the Robots Exclusion Protocol file that tells bots which parts of a site they may crawl, to restrict access for third-party bots.
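To illustrate how a robots.txt rule set can admit one crawler while turning away the rest, here is a minimal sketch using Python's standard `urllib.robotparser` module. The file contents, the `LicensedBot` and `OtherBot` names, and the `example.com` URL are all hypothetical and are not taken from Reddit's actual file. Note that robots.txt is advisory: it only works when a crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration only, NOT Reddit's real file:
# one named crawler may fetch everything, every other bot is disallowed.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: LicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str,
               robots_txt: str = HYPOTHETICAL_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/some_thread"  # placeholder URL
    for bot in ("LicensedBot", "OtherBot"):
        print(bot, "allowed:", is_allowed(bot, url))
```

A well-behaved crawler would run a check like this before each request and skip any URL for which the answer is false.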

As AI has grown more popular, a number of publications and well-known social media sites have become primary sources of first-hand knowledge that AI systems then present as their own. Reddit emphasizes that its policy changes are meant to protect its users and ensure fair use of its data.

“It’s a signal to those who don’t have an agreement with us that they shouldn’t be accessing Reddit data,” says Ben Lee, Reddit’s chief legal officer. 

A $60 Million Deal

With Reddit’s $60 million deal with Google in the background, people were quick to notice that Google remains the search engine that surfaces current results from the website. Other mainstream engines fall back on outdated Reddit posts, which may disappoint users looking for relevant answers.

Google’s partnership with Reddit gives it a significant advantage in the market. Some have noted that the move signals that companies must pay before using Reddit’s data for AI training, which puts smaller businesses at risk. Competing search engines, including Mojeek, responded negatively.

“They’re killing everything for search but Google,” Colin Hayhurst, CEO of Mojeek, said. He added that the Google partnership makes it harder to offer alternative ways of searching the web.

Hayhurst has been trying to contact Reddit since June, after noticing that Mojeek was blocked from crawling the site. So far, Reddit has not responded to any of Mojeek’s emails.

Reddit Speaks Out

Reddit rejected these assumptions, stating that its recent decisions were not related to its partnership with Google.

Reddit spokesperson Tim Rathschmidt elaborated in an email: “We have been in discussions with multiple search engines. We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI.”

Rathschmidt added that Reddit remains open to, and continues to work with, companies big and small.

Reddit has spoken out against web scrapers before, and its API pricing changes, which some developers could not afford, led many third-party apps to cut ties with the platform. Last year, Reddit CEO Steve Huffman said the API changes were meant to cover costs, and he pointed to data licensing as a potential business for the company.

Reddit’s recent policy shift highlights a growing tension between social media platforms and the AI industry. By restricting unauthorized access to its data and forging exclusive agreements, Reddit aims to safeguard its content and ensure fair compensation for its use.