What is robots.txt used for?

Robots.txt tells search engine crawlers which parts of your website they may request. It lives at the root of your domain (example.com/robots.txt) and is the first file a crawler checks before visiting any page. It is used to keep crawlers out of low-value areas like internal search results, cart pages, and admin paths — preserving crawl budget for the pages you want indexed.

Does robots.txt remove pages from Google?

No. This is the most common robots.txt misconception. Disallowing a page only stops Google from crawling it; the URL can still appear in search results (without a description) if other sites link to it. To remove a page from Google, allow it to be crawled and add a noindex meta tag or X-Robots-Tag header. The removals tool in Google Search Console hides a URL quickly, but only temporarily, so pair it with noindex for a lasting removal.

What happens if I do not have a robots.txt file?

Nothing breaks. If a crawler requests /robots.txt and gets a 404, it assumes everything is allowed and crawls normally. A missing robots.txt is completely fine for small sites. A misconfigured robots.txt, on the other hand, can deindex your entire site — so having no file is safer than having a wrong one.

Can robots.txt hurt my SEO?

Yes, badly. A single line ("Disallow: /") can block your whole site from being crawled, and blocking CSS or JavaScript files can prevent Google from rendering your pages correctly, hurting rankings. Check every change against your most important URLs before deploying, confirm afterward in Google Search Console that the new file was fetched without errors, and audit your robots.txt whenever organic traffic drops unexpectedly.

Should I block AI crawlers like GPTBot in robots.txt?

It is a business decision, not an SEO one. Blocking GPTBot, ClaudeBot, or CCBot asks those crawlers not to collect your content for AI training (compliance is voluntary), but it may also exclude you from AI search experiences that cite sources and send referral traffic. Many publishers now allow AI search crawlers while blocking pure training crawlers.


Robots.txt for SEO: The Complete Guide for 2026

Learn how robots.txt works, what to block (and what never to block), and the syntax mistakes that silently deindex websites. Includes copy-paste templates for every common platform.

June 5, 2026


One line in robots.txt can deindex your entire website. It has happened to billion-dollar companies: a developer ships a staging configuration to production, Disallow: / goes live, and organic traffic falls off a cliff over the following weeks.

This guide covers how robots.txt actually works (including the parts most tutorials get wrong), what to block and what never to block, and the exact templates to use for common platforms.

What Robots.txt Does (and Doesn't Do)

Robots.txt is a plain text file at the root of your domain — https://example.com/robots.txt — that implements the Robots Exclusion Protocol. Before a well-behaved crawler requests any page on your site, it fetches this file and checks whether the URL it wants is allowed.

Three things robots.txt does:

Controls crawling — which URLs search engine bots may request
Preserves crawl budget — keeps bots out of infinite URL spaces (filters, calendars, search results)
Declares your sitemap — points crawlers to your XML sitemap location

Three things robots.txt does not do:

It does not remove pages from the index. A disallowed URL can still rank if other pages link to it — Google just shows it without a snippet.
It does not protect private content. The file is public, and bad bots ignore it. Anyone can read example.com/robots.txt and see exactly which paths you tried to hide.
It does not pass or block link equity. Crawl directives and indexing signals are separate systems.

The golden rule: robots.txt controls crawling. The noindex meta tag controls indexing. Mixing them up is the source of nearly every robots.txt disaster — including the classic mistake of disallowing a page and adding noindex to it. If Google can't crawl the page, it never sees the noindex.
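
To make the crawl/index split concrete, here is a minimal sketch of sending noindex as an X-Robots-Tag response header from a Python WSGI application. The names `add_noindex` and `demo_app` and the `/old-campaign/` prefix are hypothetical, chosen only for illustration. The paths you tag this way must stay crawlable in robots.txt, otherwise the header is never seen.

```python
def add_noindex(app, prefixes):
    """Wrap a WSGI app so responses under `prefixes` carry X-Robots-Tag: noindex."""
    prefixes = tuple(prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start_with_header(status, headers, exc_info=None):
            # Only tag the paths we want dropped from the index.
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for your real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<h1>Hello</h1>"]


# Pages under /old-campaign/ are served normally but marked noindex.
app = add_noindex(demo_app, ["/old-campaign/"])
```

The same effect can be had with a meta tag in the HTML; the header form is handy for PDFs and other non-HTML files.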

Robots.txt Syntax, Line by Line

A robots.txt file is a set of rule groups. Each group starts with one or more User-agent lines and is followed by Allow and Disallow rules:

# Group 1: rules for every crawler
User-agent: *
Disallow: /admin/
Disallow: /cart
Allow: /admin/public-docs/
 
# Group 2: rules only for Googlebot
User-agent: Googlebot
Disallow: /experiments/
 
Sitemap: https://example.com/sitemap.xml

Directive	What it does	Notes
`User-agent`	Names the crawler the group applies to	`*` matches all bots; a bot uses the most specific group that matches it, not all groups
`Disallow`	Blocks URLs starting with this path	Empty value (`Disallow:`) means allow everything
`Allow`	Re-allows a sub-path inside a disallowed area	The more specific (longer) rule wins
`Sitemap`	Declares your XML sitemap URL	Must be an absolute URL; can appear multiple times
`Crawl-delay`	Asks bots to wait between requests	Ignored by Google; respected by some other bots
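
The group-selection note in the table is easy to misread, so here is a rough Python sketch of it. This is an illustration under simplifying assumptions, not any crawler's actual parser: `parse_groups` and `select_group` are hypothetical helpers, user-agent matching is reduced to a case-insensitive substring test, and multiple groups naming the same bot are not merged.

```python
def parse_groups(text):
    """Split robots.txt text into groups of ([user-agents], [(directive, path)])."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:                             # rules were seen: a new group starts
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))          # Sitemap etc. are not group rules
    if agents:
        groups.append((agents, rules))
    return groups


def select_group(groups, user_agent):
    """Return the rules of the most specific matching group, else the '*' group."""
    ua = user_agent.lower()
    best_name, best_rules = "", None
    wildcard_rules = []
    for agents, rules in groups:
        for name in agents:
            if name == "*":
                wildcard_rules = rules
            elif name in ua and len(name) > len(best_name):
                best_name, best_rules = name, rules
    return best_rules if best_rules is not None else wildcard_rules


ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
Allow: /admin/public-docs/

User-agent: Googlebot
Disallow: /experiments/

Sitemap: https://example.com/sitemap.xml
"""

groups = parse_groups(ROBOTS)
print(select_group(groups, "Googlebot"))  # only the /experiments/ rule
print(select_group(groups, "Bingbot"))    # the three rules from the * group
```

One consequence worth noticing: in the sample file above, Googlebot follows only its own group, so the /admin/ and /cart rules do not apply to it. If you want them to, repeat them inside the Googlebot group.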

Wildcards and Anchors

Two pattern characters are supported by all major search engines:

* matches any sequence of characters
$ anchors the pattern to the end of the URL

# Block every URL containing a query string
Disallow: /*?
 
# Block all PDFs
Disallow: /*.pdf$
 
# Block paginated archive pages beyond page 1
Disallow: /blog/page/
 
# Block internal search results
Disallow: /search
Disallow: /*?s=
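
As a way to reason about these patterns, the sketch below translates a robots.txt path pattern into a regular expression in Python. The function name `matches` is hypothetical, and the translation assumes only the two special characters described above; real crawlers also normalize percent-encoding, which is skipped here.

```python
import re


def matches(pattern, path):
    """True if a robots.txt path pattern matches a URL path (including query)."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # '*' becomes '.*'; every other character is matched literally.
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None   # '$' pins the end
    return re.match(regex, path) is not None           # otherwise a prefix match


print(matches("/*?", "/shoes?color=red"))                  # True
print(matches("/*.pdf$", "/files/report.pdf"))             # True
print(matches("/*.pdf$", "/files/report.pdf?download=1"))  # False
print(matches("/search", "/search-tips"))                  # True
print(matches("/*?s=", "/blog?page=2&s=robots"))           # False
```

The last two results are the ones that catch people out. Rules are prefix matches, so /search also covers /search-tips, and /*?s= only matches when s is the first query parameter.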

The Specificity Trap

When Allow and Disallow rules conflict, the longest matching rule wins — not the first one:

User-agent: *
Disallow: /downloads/
Allow: /downloads/whitepaper.pdf

Here /downloads/whitepaper.pdf is crawlable because the Allow rule (25 characters) is longer than the Disallow rule (11 characters). This is also why Allow: / never overrides a more specific Disallow.
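
The longest-match rule can be sketched in a few lines of Python. This is a simplified illustration: `is_allowed` is a hypothetical helper, patterns are treated as plain prefixes (no wildcards), and a tie in length goes to Allow, the less restrictive rule, as Google documents.

```python
def is_allowed(rules, path):
    """Apply the longest-match rule to a list of (directive, pattern) pairs."""
    best_len, allowed = -1, True              # no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:                       # an empty rule matches nothing
            continue
        if path.startswith(pattern):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("Disallow", "/downloads/"),                # 11 characters
    ("Allow", "/downloads/whitepaper.pdf"),     # 25 characters
]

print(is_allowed(rules, "/downloads/whitepaper.pdf"))  # True
print(is_allowed(rules, "/downloads/pricing.pdf"))     # False
print(is_allowed([("Allow", "/"), ("Disallow", "/private/")], "/private/notes"))  # False
```

Rule order never enters the calculation, which is why reordering lines in the file does not change the outcome.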

What You Should (Usually) Block

Every site is different, but these URL spaces are almost always safe — and beneficial — to disallow:

Internal search results (/search, /*?s=) — infinite, thin, duplicate content
Faceted navigation and filters (/*?color=, /*?sort=) — the #1 crawl budget killer on e-commerce sites
Cart, checkout, and account pages (/cart, /checkout, /account) — no search value, often session-specific
Admin and login paths (/wp-admin/, /admin/) — though remember this is visibility control, not security
Tracking and campaign URLs (/*?utm_) — duplicate content under infinite parameter variations
Staging or preview paths that share the production domain (/preview/, /draft/)

What You Must Never Block

These mistakes are common and expensive:

CSS and JavaScript files. Google renders pages like a browser. If it can't fetch your CSS/JS, it may see a broken page and rank you accordingly. Blocking /wp-includes/ or /assets/ was standard advice in 2012 — today it actively hurts you.
Pages you want deindexed. Counterintuitive, but as covered above: Google has to crawl a page to see its noindex tag.
Your entire site with Disallow: / — verify this isn't in production right now. Seriously, go check. We'll wait.
Image, font, or media folders your visible pages depend on — blocking them degrades how Google renders and understands your pages and removes you from image search.

Copy-Paste Templates

A Sensible Default for Most Websites

User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /*?s=
Disallow: /*?utm_
 
Sitemap: https://example.com/sitemap.xml

WordPress

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/
 
Sitemap: https://example.com/sitemap_index.xml

Note the Allow for admin-ajax.php — many themes and plugins load front-end content through it.

Next.js / Modern JS Frameworks

User-agent: *
Disallow: /api/
Disallow: /_next/static/chunks/pages/admin
 
Sitemap: https://example.com/sitemap.xml

Don't block /_next/static/ wholesale — that's where your CSS and JS live. In Next.js you can generate this file dynamically with a robots.ts in your app/ directory.

Handling AI Crawlers

# Block AI training crawlers
User-agent: GPTBot
Disallow: /
 
User-agent: CCBot
Disallow: /
 
# Allow AI search crawlers that cite and link sources
User-agent: OAI-SearchBot
Allow: /
 
User-agent: PerplexityBot
Allow: /

Whether to block AI crawlers is a business decision; see the FAQ for the trade-offs.
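
To sanity-check a per-bot configuration like this one, Python's standard-library `urllib.robotparser` is enough, since the rules use no wildcards. The sketch below is illustrative: the article URL is a placeholder, and the standard-library parser follows the original first-match convention rather than longest-match, so it should not be relied on for files with overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/article"   # placeholder URL
for bot in ("GPTBot", "CCBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(bot, url) else "blocked"
    print(f"{bot}: {verdict}")
# GPTBot: blocked
# CCBot: blocked
# OAI-SearchBot: allowed
# PerplexityBot: allowed
# Googlebot: allowed
```

Googlebot is allowed even though it is never named, because a bot with no matching group and no `*` group is unrestricted. The explicit Allow groups only start to matter once you add a restrictive `User-agent: *` group alongside them.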

How to Test Your Robots.txt

Never deploy robots.txt changes blind:

Google Search Console → Settings → robots.txt shows the last fetched version, any parse errors, and lets you request a recrawl after fixing issues.
The URL Inspection tool tells you whether a specific URL is blocked by robots.txt — test your money pages after every change.
Run a full site scan. A technical SEO audit catches robots.txt problems alongside the issues that usually travel with them — missing sitemaps, noindex conflicts, and orphaned pages. WebScore's SEO module checks your robots.txt configuration on every scan, free.
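
Alongside those tools, a small pre-deploy check can run in CI. The Python sketch below combines wildcard matching with the longest-match rule and reports which of your must-crawl paths a rule set would block. `blocked_urls` and the sample paths are hypothetical, and the matching is a simplified approximation of how search engines evaluate rules.

```python
import re


def _regex(pattern):
    """Compile a robots.txt pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def blocked_urls(rules, paths):
    """Return the paths that the (directive, pattern) rules disallow."""
    blocked = []
    for path in paths:
        best_len, allowed = -1, True
        for directive, pattern in rules:
            if pattern and _regex(pattern).match(path):
                is_allow = directive.lower() == "allow"
                if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                    best_len, allowed = len(pattern), is_allow
        if not allowed:
            blocked.append(path)
    return blocked


# Rules from the "sensible default" template.
rules = [
    ("Disallow", "/admin/"), ("Disallow", "/cart"), ("Disallow", "/checkout"),
    ("Disallow", "/account"), ("Disallow", "/search"),
    ("Disallow", "/*?s="), ("Disallow", "/*?utm_"),
]

# Hypothetical pages that must stay crawlable.
must_crawl = ["/", "/pricing", "/blog/robots-txt-guide", "/assets/site.css", "/cartridges/ink"]

print(blocked_urls(rules, must_crawl))  # ['/cartridges/ink']
```

The one hit shows why such a check is worth having: Disallow: /cart is a prefix rule, so it also catches /cartridges/. Writing the rule as /cart/, or as /cart$ plus /cart/, avoids the collision if your site has paths like that.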

Robots.txt Debugging Checklist

When organic traffic drops and you suspect robots.txt:

Fetch yourdomain.com/robots.txt directly — does it return 200 with the content you expect?
Check for Disallow: / under User-agent: * (the staging-config-in-production classic)
Confirm CSS/JS paths aren't blocked (test a page in Search Console's URL Inspection → View crawled page)
Verify the file is under 500 KiB (Google's limit; rules beyond it are ignored)
Check the file returns 200, not 5xx — if robots.txt returns a server error, Google may stop crawling your site entirely
Look for conflicting noindex + Disallow combinations on pages you're trying to remove
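
Several of these checks can be automated. The Python sketch below takes the HTTP status code and raw body of a fetched robots.txt (from whatever HTTP client you use) and returns warnings. `audit_robots` is a hypothetical helper, it covers only the status, size, and blanket-disallow checks, and the size constant assumes the limit is 500 KiB.

```python
MAX_BYTES = 500 * 1024  # assumed size limit: 500 KiB


def audit_robots(status, body):
    """Return a list of warnings for a fetched robots.txt (status code + bytes)."""
    warnings = []
    if status >= 500:
        warnings.append("server error: crawlers may back off from the whole site")
    elif status == 404:
        warnings.append("no robots.txt: crawlers assume everything is allowed")
    elif status != 200:
        warnings.append(f"unexpected status {status}")
    if len(body) > MAX_BYTES:
        warnings.append("file exceeds 500 KiB: rules past the limit are ignored")

    agents, in_rules = [], False
    for raw in body.decode("utf-8", errors="replace").splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                          # previous group has ended
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append("Disallow: / applies to all crawlers")
    return warnings


print(audit_robots(200, b"User-agent: *\nDisallow: /\n"))
# ['Disallow: / applies to all crawlers']
```

Running a check like this against production after every deploy is a cheap guard against the staging-config mistake described earlier.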

Key Takeaways

Robots.txt controls crawling, not indexing — use noindex to remove pages from search
The longest matching rule wins, not the first one
Never block CSS/JS; almost always block internal search, filters, and cart/checkout paths
A broken robots.txt fails silently — traffic erodes over weeks, not overnight
Test every change before deploying, then verify in Search Console that the new file was fetched and your key URLs are still crawlable

Not sure what your robots.txt is doing right now? Run a free WebScore scan — it checks your robots.txt, sitemap, crawlability, and 100+ other SEO factors in under 60 seconds, and shows you exactly what to fix.
