
Anthropic's 3-bot identity problem: blocking ClaudeBot doesn't block Claude

Anthropic now operates ClaudeBot, Claude-User, and Claude-SearchBot as three separate robots.txt identities. Sites that only blocked ClaudeBot still leak content via the other two. Here's the full pattern - and the equivalent OpenAI two-bot trap.

Paul Lukic 1 min read

If you set up your robots.txt to block ClaudeBot and assumed Anthropic could no longer reach your site, you blocked one third of the surface. The other two thirds are still pulling on every prompt and every search query.

The three Anthropic bots

| User-agent | Purpose | Triggered when |
| --- | --- | --- |
| ClaudeBot | Training crawler | Anthropic background-crawls the web for future training data |
| Claude-User | User-triggered fetch | A user pastes your URL into Claude or asks about a page |
| Claude-SearchBot | Search index | Claude’s real-time search feature crawls for live answers |

All three are documented in Anthropic’s official crawler docs. All three honor robots.txt directives. All three need to be blocked separately if you want full opt-out.

The three exist because they serve fundamentally different needs and have different traffic shapes. ClaudeBot runs in the background; Claude-User fires only when a user explicitly references your page; Claude-SearchBot powers Claude’s search-grounded answers in near real time. From an opt-out perspective they’re distinct, even though the brand is the same.

The full block

# Anthropic - all three identities
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

A single User-agent: anthropic-ai block (which some older guides recommend) is not sufficient. Anthropic does not honor an organization-level umbrella user-agent; only the specific identities work.
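
To check a robots.txt against all three identities, a minimal sketch using Python's standard-library `urllib.robotparser` might look like the following. The helper name `blocked_agents` and the `example.com` URL are assumptions for illustration, and the stdlib parser's matching rules are not guaranteed to mirror how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The three identities listed in the table above.
ANTHROPIC_AGENTS = ["ClaudeBot", "Claude-User", "Claude-SearchBot"]

PARTIAL = """\
User-agent: ClaudeBot
Disallow: /
"""

FULL = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Map each agent to True if this robots.txt disallows `url` for it.

    `url` is a placeholder; pass a real page from your own site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    print("partial:", blocked_agents(PARTIAL, ANTHROPIC_AGENTS))
    print("full:   ", blocked_agents(FULL, ANTHROPIC_AGENTS))
```

Running this against the ClaudeBot-only file reports Claude-User and Claude-SearchBot as still allowed, which is exactly the gap described above.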

The equivalent OpenAI trap

Same pattern, slightly different cast:

| User-agent | Purpose |
| --- | --- |
| GPTBot | Training crawler |
| ChatGPT-User | User-triggered fetch |

If you blocked GPTBot and stopped there, ChatGPT can still fetch your page in response to a user query (ChatGPT-User, with Browse / web search enabled). The same trap, smaller surface — only two bots, but easy to miss.

Full block:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

The other vendors (one bot each, mostly)

For comparison, here’s what most other AI vendors look like:

  • Google: Googlebot (search + AI Overviews) and Google-Extended (Gemini training opt-out). See the Google-Extended myth: the two tokens don’t do what most people think.
  • Perplexity: PerplexityBot. Single identity. (Note: Cloudflare data shows ~13% of AI bot traffic ignores robots.txt entirely; PerplexityBot has been reported as sometimes ignoring directives in the field, even though the docs say it honors them.)
  • Apple: Applebot-Extended for Apple Intelligence training opt-out.
  • Common Crawl (feeds training of many open-weight models): CCBot.
  • ByteDance / TikTok: Bytespider. Often ignores robots.txt.
  • Meta: FacebookBot.

So Anthropic and OpenAI are the outliers with multi-bot identities. Most other vendors are single-identity.

Where Cloudflare comes in

If you have a publisher-grade business reason to block AI training (paywalled content, premium reporting, structured data your competitors would mass-scrape), robots.txt is not enough.

According to Cloudflare’s Q1 2026 report on AI crawler traffic, more than 13% of AI bots ignore robots.txt directives. The long tail of fly-by-night training crawlers is especially bad. To enforce the policy in the real world, you need server-level blocks:

  • Cloudflare’s free “Block AI bots” rule (toggles on a curated list at the edge).
  • Per-vendor IP/user-agent blocks at your origin or CDN.
  • Paid services like DarkVisitors or TollBit that maintain the blocklist + log violations.

Robots.txt is the polite request. Cloudflare / origin rules are the enforcement.
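
As an illustration of an origin-level user-agent block, here is a minimal sketch written as WSGI middleware. The names `block_ai_bots` and `BLOCKED_UA_TOKENS` are hypothetical, and the token list covers only the Anthropic and OpenAI identities named in this post. It only catches bots that identify themselves honestly; a crawler that spoofs its user-agent needs IP-based or edge rules instead.

```python
# Lowercased substrings to look for in the User-Agent header.
# Assumption: extend this with whichever vendor identities you decide to block.
BLOCKED_UA_TOKENS = (
    "claudebot",
    "claude-user",
    "claude-searchbot",
    "gptbot",
    "chatgpt-user",
)


def block_ai_bots(app, tokens=BLOCKED_UA_TOKENS):
    """Wrap a WSGI app and answer 403 to requests from listed user-agents."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_ai_bots(hello_app)
```

The same substring check translates directly to a CDN or reverse-proxy rule; doing it in application code is simply the easiest place to show the logic.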

What this changes for SEO content

If you’re writing about AI crawler policy in 2026, the responsible default is “list all three Anthropic bots, list both OpenAI bots, and mention that robots.txt alone is insufficient for enforcement.” Most articles published as recently as last quarter only mention ClaudeBot and GPTBot.

The detection cost is asymmetric: a partial block looks fine in any robots.txt validator. The actual surface — what bots are still reaching your pages — only shows up in server log analysis or in tools that explicitly cross-check each vendor’s documented identities. Metaspry’s AI crawler row lists all eleven bots and surfaces partial-vendor blocks as a warning.
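
For the server-log side of that check, a rough sketch could count hits per AI identity. It assumes the common "combined" access-log format, where the user-agent is the last double-quoted field. The function names and the sample log lines are made up for illustration, not real traffic.

```python
from collections import Counter

# Identities to look for; extend with other vendors as needed.
AI_AGENTS = ["ClaudeBot", "Claude-User", "Claude-SearchBot", "GPTBot", "ChatGPT-User"]


def user_agent_of(line):
    """Return the last double-quoted field of a log line, or '' if absent."""
    parts = line.rstrip().rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_ai_hits(log_lines, agents=AI_AGENTS):
    """Count requests whose user-agent mentions one of the listed identities."""
    hits = Counter()
    for line in log_lines:
        user_agent = user_agent_of(line).lower()
        for agent in agents:
            if agent.lower() in user_agent:
                hits[agent] += 1
    return hits


# Illustrative sample lines (documentation IP ranges, invented user-agent strings).
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2026:00:00:00 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Claude-User/1.0)"',
    '192.0.2.2 - - [01/Jan/2026:00:00:05 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2026:00:00:09 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 Firefox/120.0"',
]

if __name__ == "__main__":
    for agent, count in count_ai_hits(SAMPLE_LOG).items():
        print(agent, count)
```

Any identity that shows up here after you believed the vendor was blocked points to a partial block or to a bot that is ignoring the file.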

Quick audit checklist

  1. Open your robots.txt.
  2. For each AI vendor whose access matters to you, list every identity. (Anthropic = 3, OpenAI = 2, Google = 2, rest = 1 each.)
  3. Decide your stance per vendor: full-allow, training-block-search-allow, or full-block.
  4. Apply your stance uniformly across vendors. (Blocking Anthropic but not OpenAI is fine, but it should be a deliberate choice.)
  5. If your stance is full-block and the content has commercial value, add server-level enforcement on top of robots.txt.
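
Steps 2 to 4 of the checklist can be sketched as a small generator that turns a per-vendor stance into robots.txt groups. The role labels follow the tables earlier in this post; the dictionary layout and the function names are assumptions for illustration, and only Anthropic and OpenAI are filled in.

```python
# Each vendor's identities and the role each one plays (per the tables above).
VENDOR_AGENTS = {
    "anthropic": {
        "ClaudeBot": "training",
        "Claude-User": "user",
        "Claude-SearchBot": "search",
    },
    "openai": {
        "GPTBot": "training",
        "ChatGPT-User": "user",
    },
}

STANCES = ("full-allow", "training-block-search-allow", "full-block")


def agents_to_block(vendor, stance):
    """Return the identities to disallow for one vendor under one stance."""
    if stance not in STANCES:
        raise ValueError(f"unknown stance: {stance}")
    roles = VENDOR_AGENTS[vendor]
    if stance == "full-allow":
        return []
    if stance == "training-block-search-allow":
        return [agent for agent, role in roles.items() if role == "training"]
    return list(roles)  # full-block: every identity the vendor documents


def render_robots(stances):
    """Render robots.txt groups for a {vendor: stance} mapping."""
    groups = []
    for vendor, stance in stances.items():
        for agent in agents_to_block(vendor, stance):
            groups.append(f"User-agent: {agent}\nDisallow: /\n")
    return "\n".join(groups)


if __name__ == "__main__":
    print(render_robots({
        "anthropic": "full-block",
        "openai": "training-block-search-allow",
    }))
```

Keeping the stance per vendor in one place makes a mixed policy, such as the one in the usage example, an explicit decision rather than an accident.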
