Site files (robots / sitemap / llms.txt)

How Metaspry fetches and parses the three site-level files that govern crawler behavior.

The Site tab fetches three files in parallel from the active page’s host: /robots.txt, /sitemap.xml, and /llms.txt. Each has its own card.
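As a rough illustration of the parallel fetch, the sketch below resolves the three paths against the page's origin and starts all requests at once. The names `siteFileUrls`, `fetchSiteFiles`, and the injected `fetchText` callback are hypothetical; Metaspry's actual internals are not documented here.

```typescript
// Hypothetical helper names; this is a sketch, not Metaspry's source.
type TextFetcher = (url: string) => Promise<string>;

const SITE_FILE_PATHS = ["/robots.txt", "/sitemap.xml", "/llms.txt"];

// Resolve each path against the origin (scheme + host) of the active page.
function siteFileUrls(pageUrl: string): string[] {
  const origin = new URL(pageUrl).origin;
  return SITE_FILE_PATHS.map((path) => new URL(path, origin).href);
}

// Start all three requests together. A failed request becomes null,
// so one missing file does not hide the other two.
async function fetchSiteFiles(
  pageUrl: string,
  fetchText: TextFetcher,
): Promise<(string | null)[]> {
  const requests = siteFileUrls(pageUrl).map((url) =>
    fetchText(url).catch(() => null),
  );
  return Promise.all(requests);
}
```

The results come back in the same order as the paths, so each entry can be handed to its own card.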

robots.txt

The file is parsed into user-agent groups, each shown with its Allow and Disallow line counts, plus a list of Sitemap: directives. Each Sitemap: line is clickable.
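A minimal sketch of this kind of parse, assuming consecutive User-agent lines share one group and that `#` starts a comment. The `parseRobots` function and its result shape are illustrative names, not Metaspry's API.

```typescript
interface RobotsGroup {
  userAgents: string[];
  allowCount: number;
  disallowCount: number;
}

interface RobotsSummary {
  groups: RobotsGroup[];
  sitemaps: string[];
}

function parseRobots(text: string): RobotsSummary {
  const groups: RobotsGroup[] = [];
  const sitemaps: string[] = [];
  let current: RobotsGroup | null = null;
  let lastWasUserAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines extend the same group;
      // a User-agent line after rules starts a new one.
      if (!current || !lastWasUserAgent) {
        current = { userAgents: [], allowCount: 0, disallowCount: 0 };
        groups.push(current);
      }
      current.userAgents.push(value);
      lastWasUserAgent = true;
      continue;
    }
    if (field === "sitemap") {
      // Sitemap directives are global, not tied to a group.
      if (value) sitemaps.push(value);
      continue;
    }
    lastWasUserAgent = false;
    if (!current) continue; // rule before any User-agent line
    if (field === "allow") current.allowCount++;
    else if (field === "disallow") current.disallowCount++;
  }
  return { groups, sitemaps };
}
```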

The card shows a Found or Missing badge. When the server returns HTML instead of plain text (typically an SPA-fallback misconfiguration), the card flags “Server returned HTML (likely SPA fallback or missing route)” rather than a generic parse error.
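One plausible way to detect the HTML case is to check the Content-Type header first and then the start of the body. The heuristic below is an assumption for illustration, and `looksLikeHtml` is a hypothetical name; the docs do not say which signals Metaspry uses.

```typescript
// Heuristic sketch: true when a response that should be plain text or XML
// is actually an HTML page (for example an SPA's index.html fallback).
function looksLikeHtml(contentType: string | null, body: string): boolean {
  if (contentType && /\btext\/html\b/i.test(contentType)) return true;
  // Some servers mislabel the fallback page, so also sniff the body.
  const head = body.replace(/^\s+/, "").slice(0, 200).toLowerCase();
  return head.startsWith("<!doctype html") || head.startsWith("<html");
}
```

For instance, `looksLikeHtml("text/plain", "<!DOCTYPE html><html>...")` is true even though the header claims plain text, which is the case a generic parse error would obscure.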

sitemap.xml

The file is fetched and parsed via DOMParser. What happens next depends on the root element:

  • If the root element is <urlset>, Metaspry counts the <loc> entries and samples the first 10.
  • If the root element is <sitemapindex>, Metaspry fetches up to 20 child sitemaps in parallel (each with a 4 s timeout), recurses one level deep for nested indexes, and sums the URL counts across children. The card shows the total URL count plus a per-child breakdown.
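The two branches can be sketched as follows. This is an illustration under stated assumptions, not Metaspry's code: `Deps`, `load`, `parseWithDom`, `countUrls`, and `summarizeSitemap` are hypothetical names, `load` stands in for a fetch that resolves to null on failure or timeout, and "one level deep" is read as following the children of a nested index once and no further.

```typescript
type SitemapDoc =
  | { kind: "urlset"; locs: string[] }
  | { kind: "index"; children: string[] };

interface SitemapSummary {
  totalUrls: number;
  sample: string[]; // first 10 <loc> values of a <urlset>
  children: { url: string; urls: number }[]; // per-child breakdown
}

// Injected dependencies (assumed): a loader and a parser.
interface Deps {
  load: (url: string, timeoutMs: number) => Promise<string | null>;
  parse: (xml: string) => SitemapDoc | null;
}

const MAX_CHILDREN = 20;
const CHILD_TIMEOUT_MS = 4000;

// Browser-side parser built on DOMParser.
function parseWithDom(xml: string): SitemapDoc | null {
  const doc = new DOMParser().parseFromString(xml, "application/xml");
  if (doc.querySelector("parsererror")) return null; // malformed XML
  const locs = Array.from(doc.getElementsByTagName("loc"), (el) =>
    (el.textContent ?? "").trim(),
  );
  const root = doc.documentElement.localName;
  if (root === "urlset") return { kind: "urlset", locs };
  if (root === "sitemapindex") return { kind: "index", children: locs };
  return null;
}

// Count the URLs behind one child; nested indexes are followed
// only while levelsLeft is above zero.
async function countUrls(url: string, deps: Deps, levelsLeft: number): Promise<number> {
  const xml = await deps.load(url, CHILD_TIMEOUT_MS);
  const doc = xml === null ? null : deps.parse(xml);
  if (!doc) return 0;
  if (doc.kind === "urlset") return doc.locs.length;
  if (levelsLeft === 0) return 0; // deeper nesting is ignored
  const counts = await Promise.all(
    doc.children.slice(0, MAX_CHILDREN).map((c) => countUrls(c, deps, levelsLeft - 1)),
  );
  return counts.reduce((a, b) => a + b, 0);
}

async function summarizeSitemap(rootXml: string, deps: Deps): Promise<SitemapSummary | null> {
  const doc = deps.parse(rootXml);
  if (!doc) return null;
  if (doc.kind === "urlset") {
    return { totalUrls: doc.locs.length, sample: doc.locs.slice(0, 10), children: [] };
  }
  const urls = doc.children.slice(0, MAX_CHILDREN);
  const counts = await Promise.all(urls.map((u) => countUrls(u, deps, 1)));
  const children = urls.map((url, i) => ({ url, urls: counts[i] }));
  return { totalUrls: counts.reduce((a, b) => a + b, 0), sample: [], children };
}
```

Passing the loader and parser in as `deps` keeps the counting logic independent of the browser, so it can be exercised with stubs; in an extension, `parse` would be `parseWithDom`.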

Fallback chain

/sitemap.xml is tried first. If the response is missing, HTML, or unparseable, Metaspry falls back through:

  1. URLs declared in robots.txt Sitemap: directives.
  2. /sitemap_index.xml (Yoast SEO convention)
  3. /sitemap-index.xml
  4. /wp-sitemap.xml (WordPress native)
  5. /sitemaps.xml
  6. /sitemap/sitemap.xml
  7. /sitemaps/sitemap.xml

The first candidate that returns valid XML wins. When the sitemap is found at a fallback path, the card shows “Found at fallback location: /…”.
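The chain above amounts to an ordered candidate list that is tried until one entry yields valid XML. The sketch below assumes candidates are tried sequentially and that duplicates are dropped (robots.txt often lists /sitemap.xml itself); both are assumptions, and `sitemapCandidates`, `findSitemap`, and `tryFetch` are hypothetical names.

```typescript
const FALLBACK_PATHS = [
  "/sitemap_index.xml",
  "/sitemap-index.xml",
  "/wp-sitemap.xml",
  "/sitemaps.xml",
  "/sitemap/sitemap.xml",
  "/sitemaps/sitemap.xml",
];

// Build the ordered list: /sitemap.xml, then robots.txt Sitemap: URLs,
// then the conventional fallback paths.
function sitemapCandidates(origin: string, robotsSitemaps: string[]): string[] {
  const ordered = [
    new URL("/sitemap.xml", origin).href,
    ...robotsSitemaps,
    ...FALLBACK_PATHS.map((path) => new URL(path, origin).href),
  ];
  // Drop duplicates while keeping first-seen order (assumed behavior).
  return ordered.filter((url, i) => ordered.indexOf(url) === i);
}

// tryFetch resolves to the XML text, or null when the response is
// missing, HTML, or unparseable.
async function findSitemap(
  candidates: string[],
  tryFetch: (url: string) => Promise<string | null>,
): Promise<{ url: string; xml: string; viaFallback: boolean } | null> {
  for (let i = 0; i < candidates.length; i++) {
    const xml = await tryFetch(candidates[i]);
    if (xml !== null) return { url: candidates[i], xml, viaFallback: i > 0 };
  }
  return null;
}
```

The `viaFallback` flag is what a card would need in order to decide whether to show the fallback-location notice.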

llms.txt

llms.txt is an emerging standard documented at llmstxt.org. Metaspry parses its headings and bullet links (- [label](url)) and shows each section’s heading with its link list.
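A minimal sketch of such a parse, assuming Markdown-style `#` headings and that links appearing before any heading are ignored. `parseLlmsTxt` and its result types are illustrative names only.

```typescript
interface LlmsLink {
  label: string;
  url: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

function parseLlmsTxt(text: string): LlmsSection[] {
  const sections: LlmsSection[] = [];
  let current: LlmsSection | null = null;

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();

    // A heading of any level opens a new section.
    const heading = /^#{1,6}\s+(.+)$/.exec(line);
    if (heading) {
      current = { heading: heading[1].trim(), links: [] };
      sections.push(current);
      continue;
    }

    // Bullet links look like: - [label](url), optionally followed by notes.
    const link = /^-\s+\[([^\]]+)\]\(([^)\s]+)\)/.exec(line);
    if (link && current) {
      current.links.push({ label: link[1], url: link[2] });
    }
  }
  return sections;
}
```

Other lines, such as the blockquote summary or free-form paragraphs, are skipped in this sketch.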

When absent, the card links to the spec so curious readers can learn what they’re missing.