Anthropic's 3-bot identity problem: blocking ClaudeBot doesn't block Claude
Anthropic now operates ClaudeBot, Claude-User, and Claude-SearchBot as three separate robots.txt identities. Sites that only blocked ClaudeBot still leak content via the other two. Here's the full pattern - and the equivalent OpenAI two-bot trap.
If you set up your robots.txt to block ClaudeBot and assumed Anthropic could no longer reach your site, you blocked only one of three identities. The other two can still fetch your pages whenever a user references them in a prompt or Claude runs a search.
The three Anthropic bots
| User-agent | Purpose | Triggered when |
|---|---|---|
| ClaudeBot | Training crawler | Anthropic background-crawls the web for future training data |
| Claude-User | User-triggered fetch | A user pastes your URL into Claude or asks about a page |
| Claude-SearchBot | Search index | Claude’s real-time search feature crawls for live answers |
All three are documented in Anthropic’s official crawler docs. All three honor robots.txt directives. All three need to be blocked separately if you want full opt-out.
The three exist because they serve fundamentally different needs and have different traffic shapes. ClaudeBot runs in the background; Claude-User fires only when a user explicitly references your page; Claude-SearchBot powers Claude’s search-grounded answers in near real time. From an opt-out perspective they’re distinct, even though the brand is the same.
The full block
# Anthropic - all three identities
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-User
Disallow: /
User-agent: Claude-SearchBot
Disallow: /
A single User-agent: anthropic-ai block (which some older guides recommend) is not sufficient. Anthropic does not honor an organization-level umbrella user-agent; only the specific identities work.
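To see the gap concretely, you can parse a robots.txt with Python's standard-library `urllib.robotparser` and ask which of the three identities it still lets through. This is a minimal sketch: `allowed_agents` is a hypothetical helper name, and the standard-library parser's user-agent matching is only an approximation of how each crawler actually reads the file.

```python
from urllib.robotparser import RobotFileParser

# The three identities named in the table above.
ANTHROPIC_AGENTS = ["ClaudeBot", "Claude-User", "Claude-SearchBot"]


def allowed_agents(robots_txt: str, agents: list[str], url: str = "/") -> list[str]:
    """Return the agents that this robots.txt still allows to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, url)]


# A partial block: only the training crawler is disallowed.
partial = """\
User-agent: ClaudeBot
Disallow: /
"""

# The full block from this section.
full = """\
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-User
Disallow: /
User-agent: Claude-SearchBot
Disallow: /
"""

print(allowed_agents(partial, ANTHROPIC_AGENTS))  # ['Claude-User', 'Claude-SearchBot']
print(allowed_agents(full, ANTHROPIC_AGENTS))     # []
```

The partial file passes any syntax check, yet two of the three identities remain allowed.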
The equivalent OpenAI trap
Same pattern, slightly different cast:
| User-agent | Purpose |
|---|---|
| GPTBot | Training crawler |
| ChatGPT-User | User-triggered fetch |
If you blocked GPTBot and stopped there, ChatGPT can still fetch your page in response to a user query (as ChatGPT-User, when Browse / web search is enabled). It is the same trap with a smaller surface: only two bots, but easy to miss.
Full block:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
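If you maintain these blocks by hand, it is easy to drop an identity when copying between sites. One option is to generate them from a single mapping. The sketch below assumes the identities listed in this post; `VENDOR_AGENTS` and `render_full_block` are hypothetical names, not part of any tool.

```python
# Vendor -> user-agent identities, taken from the tables in this post.
VENDOR_AGENTS = {
    "Anthropic": ["ClaudeBot", "Claude-User", "Claude-SearchBot"],
    "OpenAI": ["GPTBot", "ChatGPT-User"],
}


def render_full_block(vendor_agents: dict[str, list[str]]) -> str:
    """Render one 'Disallow: /' group per identity, grouped by vendor."""
    lines: list[str] = []
    for vendor, agents in vendor_agents.items():
        lines.append(f"# {vendor} - all {len(agents)} identities")
        for agent in agents:
            lines.append(f"User-agent: {agent}")
            lines.append("Disallow: /")
        lines.append("")  # blank line between vendors for readability
    return "\n".join(lines).rstrip() + "\n"


print(render_full_block(VENDOR_AGENTS))
```

Adding a vendor or a newly documented identity then means editing one dictionary entry rather than every robots.txt by hand.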
The other vendors (one bot each, mostly)
For comparison, here’s what most other AI vendors look like:
- Google: Googlebot (search + AI Overviews) and Google-Extended (Gemini training opt-out). See the Google-Extended myth: they’re not what most people think.
- Perplexity: PerplexityBot. Single identity. (Note: Cloudflare data shows ~13% of AI bot traffic ignores robots.txt entirely; PerplexityBot has been reported as sometimes ignoring directives in the field, even though the docs say it honors them.)
- Apple: Applebot-Extended for Apple Intelligence training opt-out.
- Common Crawl (feeds training of many open-weight models): CCBot.
- ByteDance / TikTok: Bytespider. Often ignored.
- Meta: FacebookBot.
So Anthropic and OpenAI are the outliers with multi-bot identities. Most other vendors are single-identity, with Google’s two tokens the main exception.
Where Cloudflare comes in
If you have a publisher-grade business reason to block AI training (paywalled content, premium reporting, structured data your competitors would mass-scrape), robots.txt is not enough.
Cloudflare’s Q1 2026 report on AI crawler traffic found that 13%+ of AI bots ignore robots.txt directives. The long tail of fly-by-night training crawlers is especially bad. To enforce the policy in the real world, you need server-level blocks:
- Cloudflare’s free “Block AI bots” rule (toggles on a curated list at the edge).
- Per-vendor IP/user-agent blocks at your origin or CDN.
- Paid services like DarkVisitors or TollBit that maintain the blocklist + log violations.
Robots.txt is the polite request. Cloudflare / origin rules are the enforcement.
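As a rough illustration of an origin-level user-agent block, here is a minimal WSGI middleware sketch in Python. It is an assumption-laden example, not Cloudflare's rule or any vendor's recommended setup: `BlockAIBots`, `is_blocked`, and `BLOCKED_AGENT_TOKENS` are hypothetical names, and the token list only covers the identities named in this post. User-agent strings can be spoofed, so this stops bots that identify themselves honestly; IP-based checks are out of scope here.

```python
from typing import Callable, Iterable

# User-agent substrings to refuse, from the identities named in this post.
BLOCKED_AGENT_TOKENS = (
    "claudebot", "claude-user", "claude-searchbot",
    "gptbot", "chatgpt-user",
)


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match against the blocklist."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


class BlockAIBots:
    """WSGI middleware that answers 403 to blocklisted user agents."""

    def __init__(self, app: Callable):
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else passes through to the wrapped application.
        return self.app(environ, start_response)
```

Any WSGI application can be wrapped as `app = BlockAIBots(app)`. Unlike robots.txt, the request is refused whether or not the crawler chose to read your policy.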
What this changes for SEO content
If you’re writing about AI crawler policy in 2026, the responsible default is “list all three Anthropic bots, list both OpenAI bots, and mention that robots.txt alone is insufficient for enforcement.” Most articles published as recently as last quarter only mention ClaudeBot and GPTBot.
The detection cost is asymmetric: a partial block looks fine in any robots.txt validator. The actual surface (which bots are still reaching your pages) only shows up in server log analysis or in tools that explicitly cross-check each vendor’s documented identities. Metaspry’s AI crawler row lists all eleven bots and surfaces partial-vendor blocks as a warning.
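For the server-log side, a small Python sketch can count requests per AI identity. It assumes the combined log format, where the user agent is the last quoted field on each line. The sample lines, IP addresses, and user-agent strings below are made up for illustration and are not the vendors' exact strings; `count_ai_hits` is a hypothetical helper.

```python
import re
from collections import Counter

# Identities named in this post; extend with whatever matters to you.
AI_AGENT_TOKENS = [
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "GPTBot", "ChatGPT-User", "PerplexityBot", "CCBot", "Bytespider",
]

# In the combined log format the user agent is the last quoted field.
_LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def count_ai_hits(log_lines):
    """Count requests per AI identity, matched case-insensitively in the UA."""
    hits = Counter()
    for line in log_lines:
        match = _LAST_QUOTED.search(line)
        if not match:
            continue  # line is not in the expected format
        ua = match.group(1).lower()
        for token in AI_AGENT_TOKENS:
            if token.lower() in ua:
                hits[token] += 1
                break  # count each request once
    return hits


# Illustrative log lines (hypothetical IPs and user-agent strings).
sample = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2026:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Claude-User/1.0)"',
    '203.0.113.7 - - [01/Jan/2026:00:00:03 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"',
]

for agent, count in count_ai_hits(sample).items():
    print(agent, count)
```

If an identity you believe you blocked still shows up with 200 responses, either the block is partial or the bot is not honoring it.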
Quick audit checklist
- Open your robots.txt.
- For each AI vendor whose access matters to you, list every identity. (Anthropic = 3, OpenAI = 2, Google = 2, rest = 1 each.)
- Decide your stance per vendor: full-allow, training-block-search-allow, or full-block.
- Apply uniformly across vendors. (Blocking Anthropic but not OpenAI is allowed but should be intentional.)
- If your stance is full-block and the content has commercial value, add server-level enforcement on top of robots.txt.
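The checklist can be sketched as a small Python check that compares your chosen stance per vendor against what your robots.txt actually does. Everything here is illustrative: `VENDORS`, `STANCE_BLOCKS`, and `stance_mismatches` are hypothetical names, the role labels come from the tables in this post, and treating "training-block-search-allow" as "block only the training crawler" is an assumption you may want to change. Matching uses the standard-library `urllib.robotparser`, which approximates rather than reproduces each crawler's behavior.

```python
from urllib.robotparser import RobotFileParser

# (user-agent, role) pairs per vendor, from the tables in this post.
VENDORS = {
    "Anthropic": [("ClaudeBot", "training"), ("Claude-User", "user"),
                  ("Claude-SearchBot", "search")],
    "OpenAI": [("GPTBot", "training"), ("ChatGPT-User", "user")],
}

# Roles each stance expects to be blocked (assumption: the middle stance
# blocks only training crawlers and leaves user and search fetches allowed).
STANCE_BLOCKS = {
    "full-allow": set(),
    "training-block-search-allow": {"training"},
    "full-block": {"training", "user", "search"},
}


def stance_mismatches(robots_txt: str, stances: dict[str, str], url: str = "/"):
    """Return (vendor, agent, problem) tuples where robots.txt and stance disagree."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for vendor, stance in stances.items():
        for agent, role in VENDORS[vendor]:
            should_block = role in STANCE_BLOCKS[stance]
            is_blocked = not parser.can_fetch(agent, url)
            if should_block and not is_blocked:
                problems.append((vendor, agent, "should be blocked but is allowed"))
            elif is_blocked and not should_block:
                problems.append((vendor, agent, "should be allowed but is blocked"))
    return problems


robots = """\
User-agent: ClaudeBot
Disallow: /
User-agent: GPTBot
Disallow: /
"""

stances = {"Anthropic": "full-block", "OpenAI": "training-block-search-allow"}
for problem in stance_mismatches(robots, stances):
    print(problem)
```

With this example file, the OpenAI stance is satisfied, while Anthropic reports Claude-User and Claude-SearchBot as allowed despite the intended full block.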
Further reading
- Metaspry docs: AI crawler signals — full reference with paste-ready blocks
- The Google-Extended myth — the equivalent Google misconception
- llms.txt in 2026 — what else AI crawlers may (or may not) read