Every major AI platform today reads your website. ChatGPT, Gemini, Perplexity, Claude. The question is no longer whether AI will read your site. It's whether your site makes it easy or hard.
This is the core idea behind Answer Engine Optimization (AEO): structuring your content so AI can find, understand, and cite it as fast as possible. SEO got you ranked. AEO gets you cited. And the infrastructure layer that makes AEO work starts with how you serve your sitemap.
The llms.txt proposal was a solid first step. Give LLMs a Markdown file at your site root that describes your content in plain text instead of XML. Great idea. But the way most sites implement it today is barely an upgrade from sitemap.xml.
Here's why.
The Problem with Flat llms.txt
Open any popular site's llms.txt file and you'll see the same pattern: a massive flat list of every URL on the site, each with a one-line summary.
- [Fraud Prevention Solutions](https://www.sardine.ai)
- [KYC and KYB Solutions](https://www.sardine.ai/br/kyc-and-kyb)
- [B2C Credit Underwriting](https://www.sardine.ai/b2c-credit-underwriting)
... (500+ more entries)

If your agent consumes that, it burns through your token budget for zero gain. This is just a sitemap.xml wearing a Markdown costume.
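To see the cost concretely, here's a rough back-of-the-envelope sketch of what a 500-entry flat file costs an agent just to read the index. The entry text is illustrative, and the ~4 characters-per-token ratio is a common heuristic, not an exact tokenizer:

```python
# Rough token-cost estimate for a flat llms.txt index.
# Assumes ~4 characters per token (heuristic, not a real tokenizer).
def estimate_tokens(text: str) -> int:
    return len(text) // 4

# A typical entry: Markdown link plus a one-line summary (illustrative).
entry = (
    "- [B2C Credit Underwriting]"
    "(https://www.sardine.ai/b2c-credit-underwriting)"
    ": One-line summary of the page goes here.\n"
)

flat_file = entry * 500  # 500+ entries, all loaded on every query
print(estimate_tokens(flat_file))  # on the order of 15,000 tokens
```

That cost is paid up front on every query, before the agent has retrieved a single relevant page.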
A Smarter Approach: Semantic Clustering
Instead of listing every URL in one flat file, group your content into semantic clusters and let the LLM navigate a hierarchy.
## Available Sections:
### Fraud Prevention and Risk Management (251 pages)
- Details: https://cdn.aisitemap.ai/.../llms.txt
### AML and KYC Compliance Solutions (104 pages)
- Details: https://cdn.aisitemap.ai/.../llms.txt
... (more sections)

- Hierarchical Navigation: Read a tiny root file, pick the right branch, and drill down only when needed.
- Semantic Grouping: Each section includes natural-language summaries, a page count, and an update cadence.
- Progressive Disclosure: A top-level index plus per-section detail files keeps irrelevant context out of the window.
- Pre-Crawled CDN Delivery: Content is already crawled and structured for agents, with no extra HTML parsing required.
Agentic retrieval pipelines perform better with topically grouped content because it reduces noise in the context window and helps surface the right information faster.
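A minimal sketch of that drill-down loop from the agent's side. The fetch is stubbed with an in-memory dict standing in for HTTP GETs, and the section names and paths are illustrative, not real endpoints:

```python
# Sketch of hierarchical navigation over a clustered llms.txt.
# SITE stubs the network; a real agent would issue HTTP GETs.
SITE = {
    "/llms.txt": (
        "## Available Sections:\n"
        "### Fraud Prevention and Risk Management (251 pages)\n"
        "- Details: /sections/fraud/llms.txt\n"
        "### AML and KYC Compliance Solutions (104 pages)\n"
        "- Details: /sections/aml/llms.txt\n"
    ),
    "/sections/aml/llms.txt": "- [KYC Overview](/kyc): How KYC checks work.\n",
}

def fetch(path: str) -> str:
    return SITE[path]

def find_section(root: str, topic: str) -> str:
    """Return the Details URL of the section whose heading mentions topic."""
    current_heading = None
    for line in root.splitlines():
        if line.startswith("### "):
            current_heading = line
        elif line.startswith("- Details:") and current_heading and topic in current_heading:
            return line.split("Details:")[1].strip()
    raise LookupError(topic)

root = fetch("/llms.txt")               # step 1: read the tiny root index
detail_url = find_section(root, "KYC")  # step 2: pick the right branch
section = fetch(detail_url)             # step 3: drill down only here
print(section)
```

Only the root index and one section file ever enter the context window; the other 251 fraud-prevention pages are never loaded.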
The Benchmark: We Tested It
We ran a controlled benchmark across 7 websites and 70 questions, comparing flat llms.txt against AI Sitemaps on token consumption, latency, and retrieval reliability.

The results:
| Metric | llms.txt (Flat) | AI Sitemap | Difference |
|---|---|---|---|
| Total tokens consumed | 9,086,956 | 1,723,707 | 81.0% savings |
| Avg latency per query | 18.5s | 12.4s | 33% faster |
| Errors (failed queries) | 18 | 3 | 83% fewer errors |
| CANNOT_FIND responses | 2 | 0 | 100% retrieval success |
Tested across SaaS, tech, AI, and developer docs sites with questions ranging from product lookups to multi-step research queries.
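The headline percentages follow directly from the raw numbers in the table:

```python
# Recomputing the table's percentages from its raw numbers.
flat_tokens, sitemap_tokens = 9_086_956, 1_723_707
flat_latency, sitemap_latency = 18.5, 12.4
flat_errors, sitemap_errors = 18, 3

token_savings = 1 - sitemap_tokens / flat_tokens
latency_gain = 1 - sitemap_latency / flat_latency
error_reduction = 1 - sitemap_errors / flat_errors

print(f"{token_savings:.1%}")    # 81.0%
print(f"{latency_gain:.0%}")     # 33%
print(f"{error_reduction:.0%}")  # 83%
```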
[Charts: token consumption, latency, and reliability across the 7 test sites]
From Traffic to Trust: What This Means for Your Website
Flat llms.txt files are just sitemaps with better formatting. They don't solve the core retrieval problem.
Semantically clustered, hierarchical AI Sitemaps cut token consumption by 81%, respond 33% faster, and fail 83% less often.
See what your AI Sitemap looks like. Generate one free at UpRock.ai.
Your site already has an audience of AI agents. A flat llms.txt wastes their context window. Structure it right and they find what they need faster. That's how you get cited.
AI Sitemaps are powered by UpRock DePIN infrastructure and generated at UpRock.ai.
Model: Gemini 3 Flash Preview · 7 websites · 70 questions · March 2026
