llms.txt vs robots.txt, Explained | The Darkroom

Q: What is the difference between llms.txt and robots.txt?

robots.txt is a permission file: it tells crawlers and bots which paths they may or may not request, and it has been a recognized web standard for decades. llms.txt is a content file: a markdown document that points AI models to your most important pages and explains what your site is about, in plain language a model can read. robots.txt controls access; llms.txt offers a guide. They solve different problems, so you generally want both rather than choosing one.

The short answer

robots.txt and llms.txt are not competitors. One sets permissions; the other offers a curated guide. Use robots.txt to allow or block specific AI bots, and add llms.txt as an optional, low-cost map of your best content. Keep both.

What does robots.txt actually do?

robots.txt is a plain-text file at the root of your domain (at /robots.txt) that tells automated crawlers which parts of your site they may and may not request. It is the oldest and most widely respected piece of the crawl-control toolkit: the convention dates back to 1994 and was formalized as RFC 9309 in 2022. Well-behaved crawlers fetch it before they start requesting pages.

The syntax is simple. You declare a user-agent (the name of a specific bot, or * for all of them), then list Disallow and Allow rules for paths. For AI specifically, this is where you control the named AI user agents: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended. Google-Extended is not a separate crawler; it is a control token that Google's existing crawlers honor to decide whether your content may be used for its AI models. If you want a model's crawler to stay out of your checkout flow or your members area, robots.txt is the lever.
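
To make the syntax concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt content, the /checkout/ and /members/ paths, and the helper name may_fetch are assumptions for illustration, not a recommended policy. Python's parser applies the first matching rule in a group, so real crawlers may resolve overlapping Allow and Disallow rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths are assumptions.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /checkout/
Disallow: /members/

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /members/
"""


def may_fetch(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt let `agent` request `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("GPTBot", "/checkout/cart"),
        ("GPTBot", "/blog/post"),
        ("ClaudeBot", "/blog/post"),
        ("PerplexityBot", "/members/profile"),
    ]:
        verdict = "allowed" if may_fetch(agent, path) else "blocked"
        print(f"{agent} -> {path}: {verdict}")
```

In this sample, GPTBot is kept out of two directories, ClaudeBot and the Google-Extended token are blocked site-wide, and every other bot falls through to the * group.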

One thing robots.txt does not do: it does not describe your content. It is a bouncer with a list of doors, not a tour guide. It says "you may enter here, not there," and nothing about what is worth reading once a crawler is inside. That gap is exactly what llms.txt was proposed to fill. For the deeper story on why bots fail to reach the right pages at all, read why AI crawlers can't see your website.

What does llms.txt actually do?

llms.txt is a markdown file at /llms.txt that gives AI models a curated, human-readable map of your site. Instead of access rules, it contains a short description of what your site is, followed by linked lists of your most important pages with one-line summaries. The idea, proposed in 2024, is to hand a model a clean index so it does not have to guess which of your hundreds of URLs actually matter.

Think of it as the README a model reads to understand you. A good llms.txt opens with an H1 of your brand name, a blockquote summary of what you do, then sections like Docs, Guides, and Products, each a bullet list of links with terse descriptions. Some sites also publish an llms-full.txt that inlines the full text of key pages so a model can read the content without crawling at all.
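
The structure described above can be sketched as a small Python function that renders an llms.txt from structured data. The function name render_llms_txt, the example brand, and the example.com URLs are hypothetical; the layout (H1, blockquote, H2 sections, bulleted links with a short note) follows the public llms.txt proposal as we understand it, so treat it as a starting point rather than a validator.

```python
def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render llms.txt markdown: H1, blockquote summary, then link sections.

    `sections` maps a heading (for example "Docs") to a list of
    (title, url, one-line note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site data, used only to show the output shape.
    print(
        render_llms_txt(
            "Example Brand",
            "Example Brand sells handmade camera straps.",
            {
                "Guides": [
                    ("Sizing guide", "https://example.com/sizing", "How to pick a strap length"),
                ],
                "Products": [
                    ("Leather strap", "https://example.com/leather", "Our flagship strap"),
                ],
            },
        )
    )
```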

llms.txt does not allow or block anything. It cannot restrict a crawler, and it does not override robots.txt. It is purely additive context: a way to say "here is what we are, and here are the pages that explain it best." If you want the step-by-step on writing one, our llms.txt setup walkthrough covers the exact format.

So how are llms.txt and robots.txt actually different?

The cleanest way to hold it: robots.txt controls access, llms.txt offers a guide. One is about permission, the other about curation. They sit at the same place (your domain root) and both speak to crawlers, which is why people confuse them, but they answer different questions.

Purpose. robots.txt says where crawlers may go. llms.txt says what is worth reading and what your site is about.
Format. robots.txt uses a strict directive syntax (User-agent, Allow, Disallow). llms.txt is markdown a human can read top to bottom.
Enforcement. robots.txt is the recognized standard that major crawlers honor. llms.txt is a newer convention with uneven adoption.
Effect. robots.txt can stop a fetch. llms.txt can only help a model find and understand your best pages faster.

Because the jobs are different, the question is rarely "which one." It is "are both doing their job." For the standalone case on the guide file, see what is llms.txt and do you need it.

Do AI crawlers actually obey these files?

robots.txt compliance is voluntary, but in practice the major AI companies publish their crawler user-agent names precisely so you can control them, and they state that they honor robots.txt rules for those agents. That makes it the most reliable place to allow or disallow GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and the rest. If a bot ignores robots.txt, that is a trust problem with that operator, not a flaw in the file.

llms.txt is the newer and shakier one. Adoption is uneven across models, and no engine has committed to it as a ranking input. Treat it as a helpful, cheap convenience rather than a guarantee, the same way you would not bet your whole strategy on a single meta tag. It costs almost nothing to publish and can only help a curious crawler find your good pages faster.

One honest caveat about both: neither file changes what a model already learned during training. If ChatGPT or Claude already absorbed your public pages months ago, these files shape today's crawling and tomorrow's freshness, not yesterday's training run. They are forward-looking levers, not retroactive ones.

How should you use both files together?

The answer-first version: keep robots.txt for control and add llms.txt for guidance. Here is the order we use across the 10+ brands for which we run a single visibility engine.

1. Audit robots.txt first. Confirm you are not accidentally blocking the AI crawlers you want to reach you. A blanket Disallow that you forgot about is the single most common reason a brand is invisible to AI.
2. Decide your bot policy deliberately. Allow the AI crawlers whose answers you want to appear in. Block the paths that should never be summarized, like account areas or thank-you pages.
3. Publish a tight llms.txt. Brand name, one-line summary, and links to your 10 to 20 pages that best explain who you are and what you sell. Keep it curated, not a full sitemap dump.
4. Make those linked pages extractable. A guide file only helps if the pages it points to answer questions cleanly.
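
The audit step and the "10 to 20 links" guideline above can be checked with a short script. This is a rough sketch using Python's standard urllib.robotparser, not the tooling we run. The names AI_AGENTS, blocked_agents, and count_llms_links are hypothetical, and the link counter assumes one "- [title](url)" bullet per link.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; extend the list as needed.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt: str, path: str = "/", agents=AI_AGENTS) -> list[str]:
    """List the agents that the given robots.txt text blocks from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


def count_llms_links(llms_txt: str) -> int:
    """Count bullet lines that start a markdown link in an llms.txt file."""
    return sum(
        1 for line in llms_txt.splitlines() if line.lstrip().startswith("- [")
    )


if __name__ == "__main__":
    # A forgotten blanket rule: every bot is blocked from the whole site.
    forgotten = "User-agent: *\nDisallow: /\n"
    print("Blocked at /:", blocked_agents(forgotten))

    # Hypothetical llms.txt fragment with two links.
    sample = "# Example\n\n> Summary.\n\n## Docs\n\n- [A](https://example.com/a): a\n- [B](https://example.com/b): b\n"
    links = count_llms_links(sample)
    print("Links:", links, "(within 10 to 20)" if 10 <= links <= 20 else "(outside 10 to 20)")
```

To audit a live site, fetch the text of /robots.txt and /llms.txt yourself and pass it in; the sketch deliberately avoids network calls so it is easy to test.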

That last point matters more than either file. A perfect llms.txt pointing at vague pages still loses. The work that compounds is making the destination pages worth citing, which is exactly what structuring content for AI extraction is about.

What we will not promise about either file

Here is our credibility line: publishing llms.txt will not make a model cite you, and editing robots.txt will not undo training that already happened. Anyone selling "llms.txt that guarantees AI citations" is selling a tag, not an outcome. These files are plumbing. They make sure the right crawlers can reach the right pages and find them quickly, which is necessary but not sufficient.

What actually drives citations is the boring compound work: consistent facts across the web, content a model can lift cleanly, and trustworthy third-party mentions. The two files remove friction; the content earns the win. If you want a baseline on whether AI can even reach and understand your site today, our AI visibility audit checks both files plus the pages behind them, and tells you which crawlers you are accidentally turning away.

Questions people ask

Does llms.txt replace robots.txt?

No. llms.txt does not block, allow, or restrict anything, so it cannot do robots.txt's job of controlling which paths a crawler may request. robots.txt does not describe your content or curate your best URLs, so it cannot do llms.txt's job. They are complementary. Keep robots.txt for crawl control and add llms.txt as an optional, model-friendly map of your key pages.

Do AI crawlers actually obey these files?

robots.txt is voluntary, but the major AI crawlers publish their user-agent names and honor it, so it is the reliable way to allow or disallow specific bots like GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. llms.txt is newer and adoption is uneven, so treat it as a low-cost helper rather than a guarantee. Neither file changes whether a model already trained on public data knows about you.

Italo Campilii

Founder, Acromatico · runs AI visibility & brand systems across 10+ live brands

— Italo & Ale

written from the studio floor · developed in the darkroom

