Site files (robots / sitemap / llms.txt)
How Metaspry fetches and parses the three site-level files that govern crawler behavior.
The Site tab fetches three files in parallel from the active page’s host: /robots.txt, /sitemap.xml, and /llms.txt. Each has its own card.
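The parallel fetch can be sketched as follows. This is a minimal illustration, not Metaspry's actual code: the `SiteFile` shape, the `TextFetcher` signature, and the function name `fetchSiteFiles` are assumptions, and the fetcher is injected so the sketch runs without a network.

```typescript
// Hypothetical result shape for one card; not Metaspry's actual type.
interface SiteFile {
  path: string;
  ok: boolean;
  body: string | null;
}

// Assumed minimal fetcher signature, injected so the sketch is testable offline.
type TextFetcher = (url: string) => Promise<{ status: number; text: string }>;

const SITE_FILE_PATHS = ["/robots.txt", "/sitemap.xml", "/llms.txt"];

async function fetchSiteFiles(pageUrl: string, fetchText: TextFetcher): Promise<SiteFile[]> {
  // Only the host matters: path and query of the active page are discarded.
  const origin = new URL(pageUrl).origin;

  // All three requests are started before any of them is awaited.
  const requests = SITE_FILE_PATHS.map(async (path): Promise<SiteFile> => {
    try {
      const res = await fetchText(origin + path);
      const ok = res.status >= 200 && res.status < 300;
      return { path, ok, body: ok ? res.text : null };
    } catch {
      // A failure on one file should not hide the other two cards.
      return { path, ok: false, body: null };
    }
  });
  return Promise.all(requests);
}
```

Catching errors per request, rather than around `Promise.all`, lets each card report its own state independently.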
robots.txt
The file is parsed into user-agent groups, each with its Allow and Disallow line counts, plus a list of Sitemap: directives. Each Sitemap: URL is clickable.
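A parser producing that summary might look like the sketch below. The `RobotsGroup` and `RobotsSummary` shapes and the `parseRobots` name are hypothetical. Treating consecutive User-agent lines as one group is an assumption based on common robots.txt practice, not a statement about Metaspry's internals.

```typescript
// Hypothetical summary shapes, for illustration only.
interface RobotsGroup {
  userAgents: string[];
  allowCount: number;
  disallowCount: number;
}

interface RobotsSummary {
  groups: RobotsGroup[];
  sitemaps: string[];
}

function parseRobots(text: string): RobotsSummary {
  const groups: RobotsGroup[] = [];
  const sitemaps: string[] = [];
  let current: RobotsGroup | null = null;
  let lastWasUserAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Assumption: consecutive User-agent lines share one group.
      if (current === null || !lastWasUserAgent) {
        current = { userAgents: [], allowCount: 0, disallowCount: 0 };
        groups.push(current);
      }
      current.userAgents.push(value);
      lastWasUserAgent = true;
      continue;
    }
    if (field === "sitemap") {
      // Sitemap lines are collected globally and do not end a group.
      if (value !== "") sitemaps.push(value);
      continue;
    }
    if (field === "allow" && current !== null) current.allowCount += 1;
    if (field === "disallow" && current !== null) current.disallowCount += 1;
    lastWasUserAgent = false;
  }
  return { groups, sitemaps };
}
```

Only the counts are kept per group because that is what the card displays; the rule paths themselves are discarded in this sketch.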
The card shows a Found or Missing badge. When the server returns HTML (a typical SPA-fallback misconfiguration), the card flags “Server returned HTML (likely SPA fallback or missing route)” rather than reporting a generic parse error.
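One way to tell an HTML fallback page apart from a real text file is to check the Content-Type header and then sniff the start of the body. The sketch below is a heuristic under assumptions: the function names, the `CardStatus` values, and the 512-character sniff window are all hypothetical, not taken from Metaspry.

```typescript
// Heuristic: does a response that should be text or XML actually carry an HTML page?
function looksLikeHtml(contentType: string | null, body: string): boolean {
  if (contentType !== null && /\btext\/html\b/i.test(contentType)) return true;
  // Some servers mislabel the fallback page, so inspect the start of the body too.
  const head = body.slice(0, 512).replace(/^\uFEFF/, "").trim().toLowerCase();
  return head.startsWith("<!doctype html") || head.startsWith("<html");
}

// Hypothetical card states; "html" maps to the dedicated SPA-fallback message.
type CardStatus = "found" | "html" | "missing";

function classify(status: number, contentType: string | null, body: string): CardStatus {
  if (status < 200 || status >= 300) return "missing";
  return looksLikeHtml(contentType, body) ? "html" : "found";
}
```

Checking for HTML before attempting to parse is what allows a specific message instead of a parse failure.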
sitemap.xml
Fetched, parsed via DOMParser, and:
- If the root element is <urlset>: count <loc> entries and sample the first 10.
- If the root element is <sitemapindex>: fetch up to 20 child sitemaps in parallel (each with a 4 s timeout). Recurse one level deep for nested indexes. Sum URL counts across children. The card shows the total URL count plus a per-child breakdown.
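The index fan-out can be sketched as below. The XML parsing step (DOMParser in the source) is abstracted behind a hypothetical `load` function so the sketch runs outside a browser. All type and function names are assumptions, as is the reading of "one level deep": a child that is itself an index is expanded once, and an index found below that is not followed.

```typescript
// Hypothetical parsed form of one sitemap document.
type ParsedSitemap =
  | { kind: "urlset"; locs: string[] }
  | { kind: "sitemapindex"; children: string[] };

// Assumed loader: fetches and parses one sitemap URL.
type Loader = (url: string) => Promise<ParsedSitemap>;

const MAX_CHILDREN = 20;       // from the text: up to 20 child sitemaps
const CHILD_TIMEOUT_MS = 4000; // from the text: 4 s per child
const SAMPLE_SIZE = 10;        // from the text: sample the first 10

// Reject if `work` has not settled within `ms`; the timer is always cleared.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Count URLs under `url`; `depth` is how many further index levels may be followed.
async function countUrls(url: string, load: Loader, depth: number): Promise<number> {
  const doc = await withTimeout(load(url), CHILD_TIMEOUT_MS);
  if (doc.kind === "urlset") return doc.locs.length;
  if (depth <= 0) return 0; // index beyond the allowed depth: not followed
  const counts = await Promise.all(
    doc.children.slice(0, MAX_CHILDREN).map((child) =>
      countUrls(child, load, depth - 1).catch(() => 0),
    ),
  );
  return counts.reduce((a, b) => a + b, 0);
}

interface SitemapSummary {
  total: number;
  sample: string[];
  perChild: { url: string; count: number }[];
}

async function summarize(root: ParsedSitemap, load: Loader): Promise<SitemapSummary> {
  if (root.kind === "urlset") {
    return { total: root.locs.length, sample: root.locs.slice(0, SAMPLE_SIZE), perChild: [] };
  }
  const children = root.children.slice(0, MAX_CHILDREN);
  const perChild = await Promise.all(
    children.map(async (url) => ({
      url,
      // A child that fails or times out contributes 0 instead of failing the card.
      count: await countUrls(url, load, 1).catch(() => 0),
    })),
  );
  return { total: perChild.reduce((sum, c) => sum + c.count, 0), sample: [], perChild };
}
```

Counting a failed or timed-out child as zero is a design choice of this sketch; a real card might instead mark that child as errored in the breakdown.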
Fallback chain
/sitemap.xml is tried first. If the response is missing, HTML, or unparseable, Metaspry falls back through:
- URLs declared in robots.txt Sitemap: directives.
- /sitemap_index.xml (Yoast SEO convention)
- /sitemap-index.xml
- /wp-sitemap.xml (WordPress native)
- /sitemaps.xml
- /sitemap/sitemap.xml
- /sitemaps/sitemap.xml
First valid XML wins. When found at a fallback path, the card shows “Found at fallback location: /…”.
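The chain described above can be sketched as an ordered candidate list plus a first-match loop. The names `sitemapCandidates`, `locateSitemap`, and the `isValidXml` probe are hypothetical. De-duplicating candidates and probing them one at a time are assumptions of this sketch, not documented behavior.

```typescript
// Fixed fallback paths, in the order listed above.
const FALLBACK_PATHS = [
  "/sitemap_index.xml",
  "/sitemap-index.xml",
  "/wp-sitemap.xml",
  "/sitemaps.xml",
  "/sitemap/sitemap.xml",
  "/sitemaps/sitemap.xml",
];

// Build the ordered candidate list: /sitemap.xml, then robots.txt URLs, then fixed paths.
function sitemapCandidates(origin: string, robotsSitemaps: string[]): string[] {
  const ordered = [
    origin + "/sitemap.xml",
    ...robotsSitemaps,
    ...FALLBACK_PATHS.map((p) => origin + p),
  ];
  // Assumption: skip duplicates, e.g. when robots.txt also declares /sitemap.xml.
  return ordered.filter((url, i) => ordered.indexOf(url) === i);
}

interface Located {
  url: string;
  viaFallback: boolean;
}

// `isValidXml` is a hypothetical probe that resolves true when the URL
// returns a response that parses as sitemap XML.
async function locateSitemap(
  origin: string,
  robotsSitemaps: string[],
  isValidXml: (url: string) => Promise<boolean>,
): Promise<Located | null> {
  for (const url of sitemapCandidates(origin, robotsSitemaps)) {
    // Sequential in this sketch: once a candidate is valid, later ones are never requested.
    const ok = await isValidXml(url).catch(() => false);
    if (ok) return { url, viaFallback: url !== origin + "/sitemap.xml" };
  }
  return null;
}
```

The `viaFallback` flag is what would drive the “Found at fallback location” message on the card.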
llms.txt
llms.txt is the emerging standard described at llmstxt.org. Metaspry parses headings and bullet links (- [label](url)). Each section’s heading is shown with its link list.
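A parser for that heading-plus-links structure might look like the sketch below. The `LlmsSection` shape and `parseLlmsTxt` name are hypothetical. As assumptions of this sketch, any text after a link on the same line is ignored and links that appear before the first heading are dropped.

```typescript
interface LlmsLink {
  label: string;
  url: string;
}

// Hypothetical shape: one entry per heading, with the bullet links found under it.
interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

function parseLlmsTxt(text: string): LlmsSection[] {
  const sections: LlmsSection[] = [];
  let current: LlmsSection | null = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();

    // Any Markdown heading level opens a new section.
    const heading = /^#{1,6}\s+(.+)$/.exec(line);
    if (heading) {
      current = { heading: heading[1].trim(), links: [] };
      sections.push(current);
      continue;
    }

    // Bullet link of the form "- [label](url)"; trailing description text is ignored.
    const link = /^-\s+\[([^\]]+)\]\(([^)\s]+)\)/.exec(line);
    if (link && current !== null) {
      current.links.push({ label: link[1], url: link[2] });
    }
  }
  return sections;
}
```

Lines that are neither headings nor bullet links, such as a summary blockquote, are skipped rather than treated as errors.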
When absent, the card links to the spec so curious readers can learn what they’re missing.