
Cloudflare launches an AI-based tool to combat bots

Cloudflare, a leading public cloud services provider, has introduced a new free tool that prevents bots from scraping websites hosted on its platform for data used to train artificial intelligence models.

Some companies, such as Google, OpenAI, and Apple, let website owners block the bots they use for data gathering and AI training by making changes to the robots.txt file. This text file tells bots which pages on the site they may crawl. However, Cloudflare notes that not all AI scrapers adhere to these rules.
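
As a rough illustration of how robots.txt rules work, the sketch below uses Python's standard urllib.robotparser module to check a sample robots.txt the way a well-behaved crawler would. The file contents, the example.com URL, and the is_allowed helper are assumptions made for this example; GPTBot, Google-Extended, and Applebot-Extended are the tokens those companies publish for opting out of AI training.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block three published
# AI-training crawler tokens and allow every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("GPTBot", "Google-Extended", "Mozilla/5.0"):
        print(agent, is_allowed(agent, page))
```

The check is purely advisory: it runs on the crawler's side, so a scraper that never consults the file, or that identifies itself with a browser-like user agent, is not stopped by it.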

"Customers do not want AI bots visiting their sites, especially those acting unfairly," Cloudflare states in its official blog. "We are concerned that some companies intending to circumvent access rules to content will continuously adapt to avoid detection."

To address this issue, Cloudflare has analyzed traffic from AI bots and crawlers to tune automatic detection models. These models assess whether an AI bot might attempt to evade detection by mimicking human behavior using a web browser.

"When malicious actors attempt widespread website scanning, they typically use tools and frameworks that we can identify," explains Cloudflare. "Based on these signals, our models can flag traffic from evasive AI bots as unwanted."

Cloudflare has also created a form through which web hosts can report suspicious AI bots and scanners. The company pledges to manually blacklist such bots as they are detected.

The issue of AI bots has become particularly relevant as the boom in generative AI has intensified the demand for data to train models. Many websites, fearing their content will be used without permission or compensation, have opted to block AI scrapers and crawlers. According to one study, about 26% of the top 1000 websites have blocked OpenAI's bot. Another study found that over 600 news publishers have blocked scanners.

However, blocking alone isn't a foolproof defense. Some AI providers appear to ignore standard bot exclusion rules to gain a competitive edge. For instance, the AI search engine Perplexity has been accused of masquerading as legitimate users to copy content from websites, while OpenAI and Anthropic have reportedly ignored robots.txt rules at times.

In a letter to publishers, content licensing startup TollBit noted numerous AI agents ignoring standard robots.txt rules.

Tools like those offered by Cloudflare can help if they are accurate in detecting hidden AI bots. However, they do not solve the broader issue: publishers may sacrifice referral traffic from AI tools such as Google's AI Overviews, which exclude sites that block specific AI crawlers.

Thus, while Cloudflare's new tools may be a crucial step in combating unfair AI bots, a comprehensive solution to the problem requires a broader approach and cooperation among web hosts, content providers, and AI developers.

Author: Anna
 
