Indexability conflicts (noindex + canonical + sitemap + X-Robots-Tag)
Four crawler-facing signals, contradictory directives, and the silent traffic drops they cause. Metaspry's conflict detector explained.
Most “page disappeared from Google” cases come from contradictions across four signals that nobody inspects together: <meta robots>, X-Robots-Tag, rel="canonical", and sitemap inclusion. The audit’s Indexability check is built around that pile-up.
The four signals
| Signal | Set in | What it tells Google |
|---|---|---|
| <meta name="robots"> | HTML <head> | Per-page index/follow rules |
| X-Robots-Tag HTTP header | Server response | Same directives as meta robots, sent as a response header; the only option for non-HTML resources (PDFs, images). If it conflicts with the HTML, the most restrictive directive applies |
| <link rel="canonical"> | HTML <head> | “Treat this other URL as the version of record” |
| Sitemap inclusion | sitemap.xml | “This URL is important; please crawl it” |
When they agree, indexing works. When they disagree, Google resolves the contradiction on its own terms, and the result is often not the one you intended.
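To inspect the four signals together, it helps to gather them into one record per URL. The following is a minimal sketch using only the Python standard library; `HeadSignals` and `collect_signals` are hypothetical names for illustration, not part of Metaspry, and the HTML, headers, and sitemap set are assumed to have been fetched already.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the meta robots value and rel=canonical href from HTML."""

    def __init__(self):
        super().__init__()
        self.meta_robots = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None for valueless attributes.
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.meta_robots = a.get("content", "").lower()
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None


def collect_signals(url, html, headers, sitemap_urls):
    """Return the four indexability signals for one URL as a dict."""
    parser = HeadSignals()
    parser.feed(html)
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "meta_robots": parser.meta_robots,
        "x_robots_tag": lowered.get("x-robots-tag", "").lower() or None,
        "canonical": parser.canonical,
        "in_sitemap": url in sitemap_urls,
    }
```

Keeping all four values in one dict makes the later conflict checks simple comparisons rather than separate lookups.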
The four most-frequent conflicts
1. noindex + sitemap inclusion
The page is in your sitemap (so Google should crawl it) but carries <meta name="robots" content="noindex"> (so Google can’t index it). Result: wasted crawl budget, slow deindexing, and a contradictory signal that can make Google trust the rest of the sitemap less.
Fix: Pick one. If the page should be hidden, remove it from the sitemap and serve noindex. If it should be indexed, drop the noindex.
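A check for this conflict can be sketched as a set intersection between sitemap URLs and URLs known to serve noindex. This is an illustrative sketch, assuming a standard sitemaps.org `urlset` document and a list of noindex URLs you have collected separately; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def noindex_in_sitemap(sitemap_xml, noindex_urls):
    """Return URLs that are both listed in the sitemap and marked noindex."""
    return sorted(sitemap_urls(sitemap_xml) & set(noindex_urls))
```

Sitemap index files (a `sitemapindex` root) are not handled here; each child sitemap would need to be fetched and passed in separately.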
2. Canonical pointing to a noindex page
Page A declares Page B as its canonical. Page B has noindex. Net result: nothing gets indexed. If Google follows the canonical, it consolidates onto B, then honors B’s noindex.
Fix: Either remove noindex from the canonical target, or change the canonical to point to a real indexable version.
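Given crawl results for a set of pages, this conflict is a one-hop lookup: follow each canonical and check whether the target is noindexed. The sketch below assumes a hypothetical `pages` dict keyed by URL, with a `canonical` string (or None) and a `noindex` boolean per page.

```python
def canonical_to_noindex(pages):
    """Return (page_url, canonical_target) pairs where the target is noindex.

    pages: dict mapping URL -> {"canonical": str or None, "noindex": bool}
    """
    conflicts = []
    for url, page in pages.items():
        target = page.get("canonical")
        # Self-referencing or missing canonicals are not this conflict.
        if not target or target == url:
            continue
        target_page = pages.get(target)
        # Targets outside the crawled set cannot be judged here.
        if target_page is not None and target_page.get("noindex"):
            conflicts.append((url, target))
    return conflicts
```

For example, with Page A pointing at a noindexed Page B, the function returns a single pair `(A, B)`.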
3. HTML says index, X-Robots-Tag says noindex
Your <head> has <meta name="robots" content="index, follow">. Your server returns X-Robots-Tag: noindex in the response headers. The most restrictive directive wins, and developers often inspect only the HTML.
Fix: Audit your server config and your edge layer. The header doesn’t appear in the browser dev tools’ Elements panel; only the Network tab shows it.
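The "most restrictive wins" rule can be sketched by merging the directives from both sources and checking for noindex in the union. This is a simplified illustration: it treats `none` as equivalent to `noindex, nofollow`, and it does not handle user-agent-scoped headers such as `X-Robots-Tag: googlebot: noindex`.

```python
def directives(value):
    """Split a robots directive string into a set of lowercase tokens."""
    return {d.strip().lower() for d in (value or "").split(",") if d.strip()}


def effective_noindex(meta_robots, x_robots_tag):
    """True if either the meta tag or the header blocks indexing."""
    combined = directives(meta_robots) | directives(x_robots_tag)
    # An explicit "index" elsewhere does not cancel a noindex.
    return "noindex" in combined or "none" in combined
```

Calling `effective_noindex("index, follow", "noindex")` returns `True`, which is the case described above: the HTML looks fine, but the header blocks the page.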
4. Self-redirecting canonical chain
<link rel="canonical" href="https://example.com/page"> where /page returns a 301 to /page/. Canonical → redirect → final. Google may pick the wrong canonical or ignore the hint entirely.
Fix: Point the canonical directly at the final URL, the one that returns a 200 without redirecting.
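Detecting this means resolving the canonical URL and counting redirect hops. The sketch below works on a hypothetical `responses` dict (URL mapped to a status code and Location value) so it can run offline; in practice those values would come from real HTTP requests made without following redirects.

```python
REDIRECT_STATUSES = (301, 302, 307, 308)


def resolve(url, responses, max_hops=10):
    """Follow redirects in a recorded response map.

    responses: dict mapping URL -> (status, location or None)
    Returns (final_url, hops).
    """
    hops = 0
    seen = {url}
    while hops < max_hops:
        status, location = responses.get(url, (None, None))
        if status not in REDIRECT_STATUSES or not location:
            return url, hops
        url = location
        hops += 1
        if url in seen:  # redirect loop, stop here
            break
        seen.add(url)
    return url, hops


def canonical_redirects(canonical, responses):
    """Return (is_redirecting, final_url) for a canonical href."""
    final_url, hops = resolve(canonical, responses)
    return hops > 0, final_url
```

When the first element is `True`, the second element is the URL the canonical should be changed to.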
What Metaspry checks per page
When you run the audit, the Indexability rule combines all four signals into a single verdict:
- Indexable — all signals agree, page is eligible.
- Conflict (mixed) — one or more signals disagree. The card shows which.
- Excluded — page is intentionally blocked (noindex, canonical to off-site, or all four agree on hidden).
Each conflict is named, not just flagged:
- “Canonical → noindex page”
- “Self-redirecting canonical (301 chain)”
- “X-Robots-Tag header overrides meta robots”
- “noindex + in sitemap”
- “Cross-domain canonical (likely staging leak)”
- “Canonical missing (auto-canonical drift risk)”
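The shape of a combined verdict with named conflicts can be sketched as follows. This is an assumption-laden illustration, not Metaspry's actual rule logic: the function name, parameters, and the subset of conflicts covered are hypothetical, and only the label strings are taken from the lists above.

```python
def indexability(meta_robots, x_robots_tag, in_sitemap,
                 canonical_target_noindex=False):
    """Return (verdict, conflicts) for one page. Illustrative only."""

    def has_noindex(value):
        tokens = {t.strip().lower() for t in (value or "").split(",")}
        return bool(tokens & {"noindex", "none"})

    meta_noindex = has_noindex(meta_robots)
    header_noindex = has_noindex(x_robots_tag)
    noindex = meta_noindex or header_noindex

    conflicts = []
    if canonical_target_noindex:
        conflicts.append("Canonical → noindex page")
    # Only a conflict when a meta tag exists and does not itself say noindex.
    if header_noindex and meta_robots is not None and not meta_noindex:
        conflicts.append("X-Robots-Tag header overrides meta robots")
    if noindex and in_sitemap:
        conflicts.append("noindex + in sitemap")

    if conflicts:
        return "Conflict (mixed)", conflicts
    return ("Excluded" if noindex else "Indexable"), conflicts
```

Returning the list of names alongside the verdict is what lets a report say which signals disagree instead of only raising a flag.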
Staging leaks to production
A specific case worth its own callout because it has caused multiple documented deindexings: cross-domain canonical.
After a migration, the production site’s <head> ends up with <link rel="canonical" href="https://staging.example.com/page">. Google believes you. The whole production site deindexes.
The audit flags any canonical whose hostname differs from the page’s hostname. If the canonical points to staging. / dev. / localhost / a different TLD, it’s surfaced as a critical alert.
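A hostname comparison of this kind can be sketched with `urllib.parse`. The severity labels (`"flagged"`, `"critical"`), the function name, and the last-label TLD comparison are assumptions for illustration; the TLD check is naive and does not understand multi-part suffixes such as `co.uk`.

```python
from urllib.parse import urlsplit

SUSPECT_PREFIXES = ("staging.", "dev.")
SUSPECT_HOSTS = {"localhost", "127.0.0.1"}


def cross_domain_canonical(page_url, canonical_url):
    """Return None, "flagged", or "critical" for a page/canonical pair."""
    page_host = urlsplit(page_url).hostname or ""
    canon_host = urlsplit(canonical_url).hostname or ""
    # Relative canonicals have no hostname and stay on the page's host.
    if not canon_host or canon_host == page_host:
        return None
    # Naive TLD comparison: last dot-separated label only.
    different_tld = canon_host.rsplit(".", 1)[-1] != page_host.rsplit(".", 1)[-1]
    suspect = (
        canon_host in SUSPECT_HOSTS
        or canon_host.startswith(SUSPECT_PREFIXES)
        or different_tld
    )
    return "critical" if suspect else "flagged"
```

With a production page at `https://example.com/page` and a canonical of `https://staging.example.com/page`, the function returns `"critical"`.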
robots.txt vs noindex
Worth restating because it confuses people: robots.txt blocking does not noindex.
- Disallow: in robots.txt = don’t crawl. But Google may still index the URL based on inbound links, showing a snippet-less, title-only result.
- <meta robots noindex> or X-Robots-Tag: noindex = don’t index. The crawler still needs to fetch the page to see the directive.
If you want a page fully out of the index, you must allow it in robots.txt and serve a noindex directive. Blocking in robots.txt prevents Google from ever seeing the noindex.
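One way to catch a noindex that is hidden behind a robots.txt block is to test the URL against the robots.txt rules first. The sketch below uses Python's standard `urllib.robotparser`; the function name is hypothetical, and the standard-library matcher does not implement every pattern Google supports (wildcards, for example), so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt, url, user_agent="Googlebot"):
    """True if the crawler may fetch the URL and can therefore see a noindex.

    robots_txt: the text content of the site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

If this returns `False` for a page that serves noindex, the directive cannot be read, and the URL may stay in the index.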
Related
- AI crawler signals - GPTBot, ClaudeBot, Google-Extended
- Audit rules - full rule list
- Site files - robots.txt + sitemap.xml + llms.txt parsing