Cloudflare has publicly accused Perplexity AI of using stealthy, undeclared web crawlers to bypass standard bot restrictions set by websites. In a detailed blog post, Cloudflare alleged that Perplexity is not only ignoring robots.txt directives but also using alternate IP ranges and cloaked user agents to mask the activity of its web-scraping infrastructure.
At the center of the accusation is the Robots Exclusion Protocol, the robots.txt standard that websites use to opt out of unwanted indexing or scraping. A crawler circumvents it when it accesses a site without identifying itself properly, or when it actively avoids detection by sending misleading user-agent strings or by operating from infrastructure not associated with the company's known bot network. According to Cloudflare, this is precisely what Perplexity has been doing.
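For context, these opt-out rules live in a plain-text robots.txt file at the root of a site. A minimal, hypothetical example that disallows Perplexity's declared crawler while permitting everything else might look like this:

```
# Hypothetical robots.txt served at https://example.com/robots.txt
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
```

A crawler acting in good faith fetches this file first, matches its own user-agent string against the listed groups, and skips any disallowed paths. The standard is purely advisory, which is why compliance ultimately depends on the crawler's honesty.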
Cloudflare noted that the IPs involved in the activity did not match Perplexity's declared crawler information. It said that Perplexity's public crawler, named PerplexityBot, does respect opt-out rules. The traffic in question, however, came from entirely different infrastructure, presented generic or empty user agents, and continued to request data even from websites that had explicitly disallowed crawlers. Cloudflare claims that when it blocked these bots, the traffic shifted to other networks and tried again, which it says points to deliberate evasion.
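To make the mismatch concrete, here is a minimal Python sketch (not Cloudflare's actual detection logic) of the kind of check being described: a request claiming to be a known bot is trusted only if its source IP also falls within the ranges the crawler's operator has declared. The ranges, labels, and thresholds below are placeholders for illustration.

```python
import ipaddress

# Hypothetical published ranges for a declared crawler. Real bot
# operators publish their IP ranges so sites can verify claimed
# identities; this placeholder uses the TEST-NET-1 block.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

def classify_request(user_agent: str, source_ip: str) -> str:
    """Classify a request by comparing its claimed identity
    (user-agent string) against the declared IP ranges."""
    ip = ipaddress.ip_address(source_ip)
    from_declared_range = any(ip in net for net in DECLARED_RANGES)
    claims_bot = "PerplexityBot" in user_agent

    if claims_bot and from_declared_range:
        return "verified crawler"      # identity checks out; apply bot rules
    if claims_bot and not from_declared_range:
        return "likely impersonation"  # UA claims the bot, IP does not match
    if not user_agent.strip():
        return "suspicious"            # empty UA from unknown infrastructure
    return "unverified traffic"

# A generic browser-like UA arriving from an unlisted network is merely
# "unverified"; the allegation is that such traffic kept crawling pages
# that had explicitly disallowed bots.
print(classify_request("Mozilla/5.0", "203.0.113.7"))
```

The key point the sketch illustrates is that a user-agent string is self-reported: only the combination of a declared identity and a matching, published source network lets a site operator treat a crawler as who it says it is.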
Perplexity responded to the claims by saying it only accesses public webpages, and it attributed the crawling activity to a third-party provider. The company did not directly deny using that data in its products. However, Cloudflare argued that this response sidesteps the core issue: the traffic was still hitting sites without following clearly posted restrictions, and it was traced back to Perplexity's backend operations.
The broader concern from Cloudflare is that some AI companies are increasingly ignoring web standards while building commercial products on top of scraped content. The post emphasized that millions of websites using Cloudflare's services have set up rules to block specific crawlers or all automated bots, and that anyone acting in good faith must respect those rules. Cloudflare also said it is strengthening its bot-mitigation tools and has begun blocking such evasion techniques more aggressively.
The tension comes amid growing scrutiny over how AI companies acquire their training data. As competition in AI intensifies, more companies are being caught pushing the boundaries of ethical data collection. This case involving Perplexity adds to a broader debate about transparency, permission, and how AI tools should be trained.
