Your Website Is Wasting 97 Percent of Every AI Agent's Attention — Here Is How to Fix It
AI agents are visiting your site right now. Most of them are throwing away 97 percent of what they download because your content is buried in HTML navigation, scripts, and cookie banners. A practical guide to serving content that agents can actually use.
There is a question every business with a website needs to start asking: when an AI agent visits your site, what does it actually see?
The answer, for most websites, is not great. A typical 100,000 token HTML page — packed with navigation menus, JavaScript bundles, cookie consent banners, analytics scripts, and footer links — contains roughly 3,000 tokens of actual content. That is a 97 percent waste of the AI's context window. And context windows are not infinite. They are the single most expensive resource an AI model has.
Sanity's Knut Melvaer published an excellent field guide on this exact problem in February 2026, and it crystallised something I have been thinking about for a while. The question of "AI-ready content" is not one question. It is three completely different questions that most people are conflating.
Three Questions Disguised as One
When someone says "AI-ready content," they could mean any of the following:
- How do AI models represent and cite your content? This is the positioning question — can ChatGPT, Claude, or Perplexity find and reference your business when someone asks a relevant question?
- How do you serve content when AI agents show up on your site? This is the consumption question — when a bot crawls your pages, is it getting clean content or wading through HTML noise?
- Does cleaner content delivery improve your positioning in AI citations? This is the connection question — and honestly, nobody has a definitive answer yet.
Most businesses jump straight to the first question because it feels like SEO all over again. But the second question is where you actually have control right now, and it is where the biggest wins are sitting.
The Positioning Problem Is Still Unsolved
Let me be honest about what we know and do not know about AI citation optimisation.
Research firm Profound tested serving markdown versus HTML across 381 pages over three weeks. The result? No statistically significant increase in bot traffic from serving markdown. AI citation patterns showed up to 60 percent monthly volatility — meaning your visibility can swing wildly regardless of what you do.
Princeton researchers did find that strategies like statistical proof and authoritative language could boost visibility by 40 percent in controlled lab settings. But controlled lab settings are not the real world. The actual AI platforms — ChatGPT, Claude, Perplexity, Gemini — use dynamic filtering, ranking algorithms, and retrieval systems that do not map neatly to academic experiments.
Anthropic's own dynamic filtering technology improved Claude's web search accuracy by 11 percent while reducing token usage by 24 percent. The models are getting better at extracting what they need regardless of format. But that does not mean you should make their job harder.
The uncomfortable truth is that nobody has cracked AI citation optimisation yet. Anyone selling you "AEO services" with guaranteed results is guessing. Monitor the research — especially Profound's ongoing work — but do not bet your strategy on positioning tactics that might not hold up next month.
What You Can Control: Serving Content to Agents
While positioning remains uncertain, content delivery is entirely within your control. And the range of options is wider than most people realise. Here are five strategies, from zero effort to full infrastructure investment.
Strategy 1: Do Nothing
This is where most websites sit today. All major AI tools — Claude, ChatGPT, Gemini, Perplexity — internally convert HTML to markdown when they crawl a page. So your content does get through, eventually.
The problem is waste. A 100,000 token HTML page gets stripped down to roughly 3,000 tokens of actual content. That is 97 percent of the context window consumed by navigation, scripts, and chrome that the AI immediately discards. Most AI crawlers do not even execute JavaScript — they read raw HTML. Vercel's research confirmed that ChatGPT and Claude crawlers fetch but do not execute JavaScript files.
For a small brochure site, this might be fine. For a content-heavy business, you are making every AI interaction with your content significantly less efficient than it could be.
Strategy 2: Add an llms.txt File
The llms.txt proposal is simple: put a markdown file at /llms.txt that gives AI models a structured overview of your site with links to key content. Over 2,000 sites have adopted it, including Next.js, shadcn/ui, TanStack, Cloudflare, and Hugging Face.
Anthropic's own llms.txt implementation uses just 892 tokens — it is an index, not a content dump. And that is the right approach, because the alternative is ugly. Cloudflare's proposed llms-full.txt — which tries to include all content — weighs in at 46.6 megabytes. That is roughly 12 million tokens, about 60 times Claude's entire context window.
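To make the index idea concrete, here is a hypothetical llms.txt following the proposal's conventions — an H1, a blockquote summary, and sectioned link lists. The site name, paths, and descriptions below are invented for illustration, not taken from any real implementation:

```markdown
# Acme Docs

> Product documentation for Acme, a hypothetical example site.

## Guides

- [Getting started](https://example.com/md/guides/getting-started): installation and first steps
- [API reference](https://example.com/md/reference/api): endpoints and authentication

## Optional

- [Changelog](https://example.com/md/changelog): release history
```

The whole file is a few hundred tokens — an index that points agents at clean content, rather than a dump of the content itself.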
The llms.txt approach has real limitations: it is all-or-nothing with no per-page granularity, no governance layer, and no way to control what gets served to which agent. It is a good starting point, but not a long-term strategy.
Strategy 3: Edge Conversion
Cloudflare offers a dashboard toggle that converts HTML to markdown at the edge when an AI agent makes a request. No code changes required. They reported an 80 percent token reduction on their own blog after enabling it.
The trade-off is that the conversion is lossy. A generic HTML-to-markdown parser cannot distinguish your actual content from your site chrome with perfect accuracy. It works well for simple pages, but complex layouts with interactive components or nested structures will lose information.
Strategy 4: Serve Markdown Routes with Content Negotiation
This is where things get genuinely interesting, and it is the approach I think has the highest return on investment for most content-heavy sites right now.
The concept uses HTTP content negotiation — a standard that has existed for decades. When an AI agent sends a request with Accept: text/markdown, your server responds with clean markdown instead of HTML. Same URL, different representation based on what the client asks for.
Here is what a basic implementation looks like in Next.js:
```typescript
// app/md/[section]/[article]/route.ts
export async function GET(
  request: Request,
  { params }: { params: Promise<{ section: string; article: string }> }
) {
  const { section, article } = await params

  // fetchArticle and toMarkdown are your own helpers: one pulls the
  // structured document from your CMS, the other serialises it to markdown.
  const doc = await fetchArticle(section, article)
  const markdown = toMarkdown(doc.body)

  return new Response(markdown, {
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
    },
  })
}
```

The results are dramatic when your content comes from a structured source. Sanity's own Learn platform measured 392 kilobytes of HTML (roughly 100,000 tokens) versus 13 kilobytes of markdown (roughly 3,300 tokens) for the same content. That is a 97 percent reduction — and unlike edge conversion, it is not lossy, because the markdown is generated from structured source data, not reverse-engineered from HTML.
| Metric | HTML | Markdown | Reduction |
|---|---|---|---|
| Page size | 392 KB | 13 KB | 97% |
| Token count | ~100,000 | ~3,300 | 97% |
| Conversion type | N/A | From structured source | Lossless |
There is another benefit here that matters for developers: Claude Code skips its internal summarisation step when it receives content with Content-Type: text/markdown. It passes the content verbatim to the model. That means your carefully structured content arrives intact, exactly as you wrote it.
You can also control granularity. Serve one lesson, a full course, a sitemap, or your complete corpus — all from the same infrastructure, depending on what the agent requests.
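The Accept-header check itself can live in middleware or in the route handler. Here is a framework-agnostic sketch of the negotiation step — the function name and the simplified q-value handling are mine, not part of any framework's API, and a production parser would also need to handle wildcards like `*/*`:

```typescript
// Decide whether a request prefers markdown over HTML based on its
// Accept header. NOTE: a simplified sketch, not a full RFC 9110 parser.
type Preference = "markdown" | "html";

function negotiateFormat(accept: string | null): Preference {
  if (!accept) return "html";

  // Parse "text/markdown;q=0.9, text/html;q=0.8" into { type, q } pairs.
  const entries = accept.split(",").map((part) => {
    const [type, ...params] = part.trim().split(";");
    const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
    const q = qParam ? parseFloat(qParam.slice(2)) : 1;
    return { type: type.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q };
  });

  const score = (t: string) => entries.find((e) => e.type === t)?.q ?? 0;
  const md = score("text/markdown");

  // Serve markdown only when the client asks for it at least as strongly
  // as it asks for HTML.
  return md > 0 && md >= score("text/html") ? "markdown" : "html";
}
```

In practice you would call this in middleware and rewrite matching requests to your markdown routes, so browsers and agents keep hitting the same URL.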
Strategy 5: Direct API and MCP Access
The most powerful option skips the web entirely. Instead of agents crawling your pages, they query your content directly via GROQ, GraphQL, or an MCP (Model Context Protocol) server.
This gives agents structured access to exactly the content they need, with the ability to filter, sort, and query. It is the most efficient approach, but it requires integration work and is still early-stage in terms of adoption. For most businesses, Strategy 4 is the practical sweet spot right now.
What Your Content System Needs to Support This
Here is the thing that makes this conversation relevant to every business, not just developers: as you move from "do nothing" toward structured content delivery, your CMS needs to change.
| Strategy | Content System Requirement |
|---|---|
| Do nothing | None — agents handle conversion themselves |
| llms.txt | Content must be exportable as markdown |
| Edge conversion | Cloudflare hosting (lossy conversion) |
| Markdown routes | Structured content that serialises to markdown |
| API / MCP | Structured content + query language + governance |
The pattern is clear. The further you go, the more your system needs to treat content as queryable structured data rather than page-shaped documents. A folder of markdown files can serve markdown, but it cannot answer "give me all articles tagged AI published this quarter." A headless CMS like Sanity can.
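That "articles tagged AI this quarter" request is exactly the kind of thing GROQ, Sanity's query language, expresses in a few lines. A sketch, assuming a content model with `tags` and `publishedAt` fields (your schema's field names will differ):

```groq
*[_type == "article" && "AI" in tags && publishedAt >= "2026-01-01"]{
  title,
  "slug": slug.current,
  publishedAt
}
```

A folder of markdown files has no equivalent — you would be grepping front matter and hoping for consistency.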
This is why we build on Sanity at Tally Digital. It is not about the CMS being trendy — it is about the content model being fundamentally compatible with where content delivery is heading. Structured content makes format fungible. The same content serves HTML to browsers, JSON to mobile apps, markdown to AI agents, and whatever format emerges next.
What Is Changing Fast Versus What Is Durable
Some of this landscape is moving extremely quickly. Specific tools like llms.txt will evolve. Standards will shift. Google's Chrome team has a WebMCP proposal that would let websites expose structured tools directly to in-browser AI agents. The tactical layer is volatile.
But the underlying investments are durable:
- Content structure — treating content as structured data rather than page-shaped documents
- Standards-based content negotiation — HTTP Accept headers have been around for decades and are not going anywhere
- Governance frameworks — controlling what content gets served to which agents under what conditions
- Format flexibility — the ability to serialise the same content into HTML, markdown, JSON, or future formats from a single source
This parallels the historical pattern of multichannel distribution. When mobile apps arrived, businesses that had structured content could serve it as JSON. When RSS was popular, structured content could generate feeds. AI agents are the next channel, and the pattern holds: structure your content once, serve it everywhere.
What I Would Actually Do Right Now
If you are reading this and wondering where to start, here is my honest recommendation:
Do This Week
- Check your server logs for AI agent visits. You might be surprised how many bots are already hitting your site. The Dark Visitors database tracks over 80 AI-specific bots — update your robots.txt accordingly.
- If you are on Cloudflare, enable the edge markdown conversion. It is a dashboard toggle and gets you an immediate 80 percent token reduction with zero code changes.
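For the robots.txt update mentioned above, a policy file might look like this. GPTBot, ClaudeBot, and CCBot are real crawler user-agents, but the allow/block choices here are an example policy, not a recommendation — set them according to your own stance on AI access:

```
# Allow mainstream AI assistants to fetch content
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Example: block a training-data crawler, if that is your policy
User-agent: CCBot
Disallow: /
```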
Do This Month
- Implement content negotiation for your key pages. Serve markdown to agents that request it via Accept headers. This is the highest-ROI move for most content-heavy sites.
- Audit your content model. Is your content structured as queryable data, or is it locked inside page-shaped documents? If it is the latter, you are building on a foundation that will not scale to multi-format delivery.
Skip for Now
- Do not invest heavily in llms.txt beyond a basic index — the context bloat risk is real and the standard is still evolving
- Do not chase AI citation optimisation tactics — the research does not support reliable strategies yet
- Do not panic about AI traffic — the agents are coming regardless, so focus on serving them well rather than trying to game their behaviour
The Bigger Picture
You are not optimising for AI. You are doing what good content infrastructure has always done.
That line from Sanity's field guide is the best summary I have seen. The businesses that structured their content properly ten years ago had an easier time launching mobile apps. The businesses that structure their content properly today will have an easier time serving AI agents.
The specific tools will change. The formats will evolve. But the principle is durable: structured content makes format fungible. Invest in the structure, and the delivery takes care of itself.
At Tally Digital, we build content infrastructure on Sanity specifically because it supports this approach. If your current CMS is holding you back from serving content to AI agents — or if you are not sure where your content architecture stands — book a call and we will walk through your options. This is not a problem you want to solve reactively.