llms.txt in 2026: a cargo cult most tools are afraid to call out
Google does not use llms.txt. The 300k-domain study found zero citation lift. Here is who actually reads it, what it should contain, and why every SEO tool that shows MISSING llms.txt as a red error is participating in misinformation.
The proposal at llmstxt.org describes a tidy markdown index of your site’s pages, intended for LLM consumption. Adoption looks like it’s everywhere — Yoast and Rank Math ship generators, every SEO tool now flags it as a missing best practice, and Twitter has produced thousands of breathless threads about LLM SEO.
Here’s the actual state in May 2026.
What the data says
Three independent data points, all from primary sources:
- Gary Illyes (Google) confirmed Google does not use llms.txt. Said directly in the late-2025 Search Off the Record podcast; reiterated on social.
- SE Ranking studied 300,000 domains. Those that adopted llms.txt showed no measurable lift in AI citations compared with domains that didn’t.
- OtterlyAI ran a 90-day fetch experiment. Across all AI crawlers hitting their monitored domains, ~0.1% of requests touched /llms.txt.
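If you want to run the same kind of check against your own server, here is a minimal sketch in Python. It assumes combined-format access logs, and the list of user-agent substrings is a hypothetical starting point, not a complete registry of AI crawlers. It is not OtterlyAI's methodology, just one way to compute a comparable ratio.

```python
import re

# Hypothetical list of AI crawler user-agent substrings; extend it for your own traffic.
AI_BOT_TOKENS = ("ClaudeBot", "Claude-User", "Claude-SearchBot", "GPTBot", "ChatGPT-User")

# Captures the request path and the last quoted field (the user agent)
# of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_share(lines):
    """Return (ai_requests, llms_txt_requests, share) for an iterable of log lines."""
    ai_requests = 0
    llms_hits = 0
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        if not any(token in agent for token in AI_BOT_TOKENS):
            continue  # not one of the crawlers we are counting
        ai_requests += 1
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path == "/llms.txt":
            llms_hits += 1
    share = llms_hits / ai_requests if ai_requests else 0.0
    return ai_requests, llms_hits, share
```

Feed it the lines of a log file (for example `llms_txt_share(open("access.log"))`, where the filename is a placeholder) and compare the share with the ~0.1% figure above.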
That’s the empirical baseline. Now the nuance.
Who actually reads it
- Anthropic has confirmed in docs that ClaudeBot honors llms.txt where present.
- Perplexity has stated similar.
- OpenAI has not documented their stance.
- Google explicitly does not use it.
- Bing has not commented.
So if you exist primarily for Claude-based answer engines or Perplexity surfaces, ship one. If you exist primarily for Google AI Overviews — which by every visibility metric is still the largest AI-search surface — shipping llms.txt does nothing for that traffic.
Why every SEO tool flags it as missing
Because “you’re missing X” is a much better marketing surface than “this signal does not affect your traffic.” Tools that drive subscription revenue need to keep finding things to flag. llms.txt has the perfect properties for that:
- It’s new (so most sites don’t have it).
- The spec is short (so it’s easy to validate).
- It sounds important.
- The downside of adding it is near-zero, so the “fix” is cheap.
This is how a cargo cult forms. Not because anyone’s lying — just because the incentives align.
What goes in a good llms.txt
If you do ship one — and there’s no reason not to, given the cost is one markdown file — here’s what actually helps the consumers that read it:
# Your Site Name
> One-sentence description of what your site is and who it's for.
## Docs
- [Page title](https://example.com/page-1): One-line description.
- [Page title](https://example.com/page-2): One-line description.
## Blog highlights
- [Post title](https://example.com/blog/post-1)
## Optional
- [GitHub](https://github.com/you)
- [RSS feed](https://example.com/rss.xml)
The format follows the markdown structure proposed at llmstxt.org. The most useful thing you can put there is the one-line description per page: that’s what gives a crawler enough context to decide whether to fetch the full content.
Don’t auto-generate a 50,000-line file from your sitemap. Curate. The point of llms.txt over sitemap.xml is human-edited prioritization.
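To keep a hand-curated file honest, a small linter can help. The sketch below is an assumption-laden illustration, not a validator for the llmstxt.org proposal: it only checks the pieces shown in the example above (an H1, a blockquote summary, and link entries with descriptions), and the `max_links` cap is an arbitrary, hypothetical number.

```python
import re

# One list entry: "- [Title](https://url)" with an optional ": description" suffix.
LINK_ENTRY = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>https?://[^)\s]+)\)(?::\s*(?P<desc>.+))?$"
)


def lint_llms_txt(text, max_links=200):
    """Return a list of warnings for an llms.txt body.

    max_links is an arbitrary, hypothetical cap used to nudge toward curation.
    """
    warnings = []
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        warnings.append("first line should be an H1 with the site name")
    if not any(line.startswith("> ") for line in lines):
        warnings.append("no blockquote summary found")
    links = 0
    for line in lines:
        if not line.startswith("- "):
            continue  # headings and summary lines are not link entries
        entry = LINK_ENTRY.match(line)
        if entry is None:
            warnings.append(f"malformed link entry: {line}")
            continue
        links += 1
        if not entry.group("desc"):
            warnings.append(f"no description for {entry.group('url')}")
    if links > max_links:
        warnings.append(f"{links} links; consider curating")
    return warnings
```

Run against the example above, this would flag the blog and Optional entries for lacking descriptions, which is a reasonable prompt to either describe them or drop them.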
The Anthropic 3-bot tangent
While we’re on AI crawlers — if you’ve blocked ClaudeBot in your robots.txt to keep your content out of Claude training, you’ve only blocked one third of Anthropic. The other two:
- Claude-User: fetches the page when a user asks Claude about it.
- Claude-SearchBot: real-time search index for Claude.
Neither is covered by a ClaudeBot rule, so both can still fetch your pages. To fully opt out:
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-User
Disallow: /
User-agent: Claude-SearchBot
Disallow: /
Same pattern exists for OpenAI (GPTBot is training, ChatGPT-User is fetch-on-prompt).
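A quick way to spot a partial block is to test every identity per vendor against your robots.txt. The sketch below uses Python's standard `urllib.robotparser`; the vendor grouping simply mirrors the bot names listed in this post, and the `https://example.com/` URL is a placeholder. Python's parser may not match every crawler's own matching rules exactly, so treat the result as a first-pass check.

```python
from urllib.robotparser import RobotFileParser

# Bot identities named in this post, grouped by vendor.
VENDOR_BOTS = {
    "Anthropic": ["ClaudeBot", "Claude-User", "Claude-SearchBot"],
    "OpenAI": ["GPTBot", "ChatGPT-User"],
}


def vendor_block_status(robots_txt, url="https://example.com/"):
    """Classify each vendor as 'allowed', 'blocked', or 'partial' for the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {}
    for vendor, bots in VENDOR_BOTS.items():
        blocked = [bot for bot in bots if not parser.can_fetch(bot, url)]
        if not blocked:
            report[vendor] = "allowed"
        elif len(blocked) == len(bots):
            report[vendor] = "blocked"
        else:
            report[vendor] = "partial"  # some identities blocked, others not
    return report
```

A robots.txt that only disallows ClaudeBot comes back as "partial" for Anthropic, which is exactly the trap described above.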
The Google-Extended myth (worth repeating)
Almost every SEO post that mentions Google-Extended gets this wrong. Setting User-agent: Google-Extended + Disallow: / in robots.txt opts your content out of Gemini training data. It does not opt you out of AI Overviews — those are powered by the regular Googlebot index.
The actual lever for AI Overviews is <meta name="robots" content="nosnippet"> — which also removes your regular SERP snippet. There’s no signal that opts out of AI Overviews while keeping snippets.
Most sites that set Google-Extended thinking they were opting out of AI Overviews are still being cited. The opt-out doesn’t exist the way they want it to.
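To see which lever a page is actually pulling, you can check its HTML for the nosnippet directive. This is a minimal sketch using Python's standard `html.parser`; it only inspects `<meta name="robots">` tags, and ignoring HTTP headers and bot-specific meta names is a simplifying assumption.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def has_nosnippet(html):
    """Return True if the page's robots meta tag includes nosnippet."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nosnippet" in parser.directives
```

If `has_nosnippet` returns False and your only opt-out is a Google-Extended rule in robots.txt, the page is still eligible for AI Overviews by the reasoning above.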
What to actually do this week
- Run an audit on your home page (Metaspry, GSC URL inspection, your tool of choice).
- Check the AI crawler row for partial-vendor blocks (the Anthropic 3-bot trap).
- Decide your stance: full-allow, allow-search-block-training, or full-block. Update robots.txt for all bot identities per vendor.
- Ship a curated /llms.txt if and only if Claude / Perplexity traffic matters to you. Skip it otherwise.
- Stop using “missing llms.txt” as a red error in your audit reports. It’s misleading by default.
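For the stance decision in the list above, one way to avoid missing an identity is to generate the robots.txt rules from a single table. The sketch below is illustrative: the role labels follow the descriptions in this post, and grouping fetch-on-prompt bots with search under allow-search-block-training is an assumption you may want to change.

```python
# Roles as described in this post. Treating "fetch" (fetch-on-prompt) like
# "search" under allow-search-block-training is an assumption of this sketch.
BOT_ROLES = {
    "ClaudeBot": "training",
    "Claude-User": "fetch",
    "Claude-SearchBot": "search",
    "GPTBot": "training",
    "ChatGPT-User": "fetch",
}

BLOCKED_ROLES_BY_STANCE = {
    "full-allow": set(),
    "allow-search-block-training": {"training"},
    "full-block": {"training", "fetch", "search"},
}


def robots_rules(stance):
    """Return robots.txt rules that block every bot whose role the stance disallows."""
    if stance not in BLOCKED_ROLES_BY_STANCE:
        raise ValueError(f"unknown stance: {stance}")
    blocked_roles = BLOCKED_ROLES_BY_STANCE[stance]
    blocks = [
        f"User-agent: {bot}\nDisallow: /"
        for bot, role in BOT_ROLES.items()
        if role in blocked_roles
    ]
    return "\n\n".join(blocks)
```

Calling `robots_rules("full-block")` emits a block for all five identities, so adding a new bot to the table updates every stance at once.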
Further reading
- Metaspry docs: AI crawler signals — explainer with paste-ready robots.txt blocks
- Metaspry docs: site files — how the extension parses llms.txt
- llmstxt.org — the original proposal
- SE Ranking, “llms.txt shows no clear effect on AI citations” (300k-domain study, late 2025)
- OtterlyAI, “The llms.txt experiment” — 90-day fetch data