Robots.txt for SEO: The Complete Guide for 2026

Learn how robots.txt works, what to block (and what never to block), and the syntax mistakes that silently deindex websites. Includes copy-paste templates for every common platform.

June 5, 2026

One line in robots.txt can deindex your entire website. It has happened to billion-dollar companies: a developer ships a staging configuration to production, Disallow: / goes live, and organic traffic falls off a cliff over the following weeks.

This guide covers how robots.txt actually works (including the parts most tutorials get wrong), what to block and what never to block, and the exact templates to use for common platforms.

What Robots.txt Does (and Doesn't Do)

Robots.txt is a plain text file at the root of your domain — https://example.com/robots.txt — that implements the Robots Exclusion Protocol. Before a well-behaved crawler requests any page on your site, it fetches this file and checks whether the URL it wants is allowed.

Three things robots.txt does:

  • Controls crawling — which URLs search engine bots may request
  • Preserves crawl budget — keeps bots out of infinite URL spaces (filters, calendars, search results)
  • Declares your sitemap — points crawlers to your XML sitemap location

Three things robots.txt does not do:

  • It does not remove pages from the index. A disallowed URL can still rank if other pages link to it — Google just shows it without a snippet.
  • It does not protect private content. The file is public, and bad bots ignore it. Anyone can read example.com/robots.txt and see exactly which paths you tried to hide.
  • It does not pass or block link equity. Crawl directives and indexing signals are separate systems.

The golden rule: robots.txt controls crawling. The noindex meta tag controls indexing. Mixing them up is the source of nearly every robots.txt disaster — including the classic mistake of disallowing a page and adding noindex to it. If Google can't crawl the page, it never sees the noindex.

Robots.txt Syntax, Line by Line

A robots.txt file is a set of rule groups. Each group starts with one or more User-agent lines and is followed by Allow and Disallow rules:

# Group 1: rules for every crawler
User-agent: *
Disallow: /admin/
Disallow: /cart
Allow: /admin/public-docs/
 
# Group 2: rules only for Googlebot
User-agent: Googlebot
Disallow: /experiments/
 
Sitemap: https://example.com/sitemap.xml

| Directive   | What it does                                  | Notes                                                                                 |
|-------------|-----------------------------------------------|---------------------------------------------------------------------------------------|
| User-agent  | Names the crawler the group applies to        | * matches all bots; a bot uses the most specific group that matches it, not all groups |
| Disallow    | Blocks URLs starting with this path           | Empty value (Disallow:) means allow everything                                        |
| Allow       | Re-allows a sub-path inside a disallowed area | The more specific (longer) rule wins                                                  |
| Sitemap     | Declares your XML sitemap URL                 | Must be an absolute URL; can appear multiple times                                    |
| Crawl-delay | Asks bots to wait between requests            | Ignored by Google; respected by some other bots                                       |
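
The "most specific group" rule is easy to misread, so here is a rough Python sketch of it, run against the example file above. The function names parse_groups and select_group are hypothetical, and the prefix-based user-agent comparison is a simplifying assumption; real crawlers apply their own token-matching rules.

```python
def parse_groups(text):
    """Parse robots.txt text into a list of (user_agents, rules) groups."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:                            # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
        # Sitemap and other fields are not part of any group and are skipped here
    if agents:
        groups.append((agents, rules))
    return groups


def select_group(groups, crawler):
    """Return the rules of the most specific group matching `crawler`.

    Assumption: a group matches when its user-agent value is "*" or a
    prefix of the crawler name (case-insensitive).
    """
    crawler = crawler.lower()
    best_len, best_rules = -1, []
    for agents, rules in groups:
        lengths = [0 if a == "*" else len(a)
                   for a in agents
                   if a == "*" or (a and crawler.startswith(a))]
        if not lengths:
            continue
        if max(lengths) > best_len:              # a more specific group replaces the previous pick
            best_len, best_rules = max(lengths), list(rules)
        elif max(lengths) == best_len:           # equally specific groups are merged
            best_rules.extend(rules)
    return best_rules


ROBOTS = """
# Group 1: rules for every crawler
User-agent: *
Disallow: /admin/
Disallow: /cart
Allow: /admin/public-docs/

# Group 2: rules only for Googlebot
User-agent: Googlebot
Disallow: /experiments/

Sitemap: https://example.com/sitemap.xml
"""

groups = parse_groups(ROBOTS)
print(select_group(groups, "Googlebot"))   # only the /experiments/ rule
print(select_group(groups, "Bingbot"))     # the three rules from the * group
```

In this sketch Googlebot ends up with only the /experiments/ rule, so /admin/ and /cart are not disallowed for it. If a named group should also carry the general rules, they have to be repeated inside that group.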

Wildcards and Anchors

Two pattern characters are supported by all major search engines:

  • * matches any sequence of characters
  • $ anchors the pattern to the end of the URL

# Block every URL containing a query string
Disallow: /*?
 
# Block all PDFs
Disallow: /*.pdf$
 
# Block paginated archive pages beyond page 1
Disallow: /blog/page/
 
# Block internal search results
Disallow: /search
Disallow: /*?s=
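
To see how these patterns behave against real paths, the following Python sketch translates a robots.txt pattern into a regular expression. The helper names pattern_to_regex and matches are hypothetical, and this only approximates how a crawler evaluates a single rule; it does not handle empty patterns or rule precedence.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with ".*" for each "*"
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, url_path):
    """True if the rule pattern applies to the path (plus query string)."""
    # re.match anchors at the start of the string, which gives prefix matching
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/*?", "/shoes?color=red"))                    # True
print(matches("/*.pdf$", "/files/report.pdf"))               # True
print(matches("/*.pdf$", "/files/report.pdf?download=1"))    # False: "$" requires the URL to end in .pdf
print(matches("/search", "/search-tips"))                    # True: plain rules are prefix matches
print(matches("/*?s=", "/blog?page=2&s=robots"))             # False: "s" is not the first parameter
```

The last two results are worth checking against your own URLs: a rule like /search also covers any path that merely starts with those characters, and a rule like /*?s= only matches when s is the first query parameter.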

The Specificity Trap

When Allow and Disallow rules conflict, the longest matching rule wins — not the first one:

User-agent: *
Disallow: /downloads/
Allow: /downloads/whitepaper.pdf

Here /downloads/whitepaper.pdf is crawlable because the Allow rule (25 characters) is longer than the Disallow rule (11 characters). This is also why Allow: / never overrides a more specific Disallow.
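
The precedence rule can be sketched in a few lines of Python. The function name is_allowed is hypothetical, and wildcards are deliberately left out so that only plain path prefixes are compared. When an Allow and a Disallow of equal length both match, the sketch lets Allow win, following the least-restrictive-rule behavior Google documents.

```python
def is_allowed(rules, url_path):
    """Decide crawlability from (directive, path) rules using longest-match.

    rules: list of tuples such as ("disallow", "/downloads/").
    Assumption: paths are plain prefixes with no "*" or "$".
    """
    best_len, allowed = -1, True            # no matching rule means the URL is crawlable
    for directive, path in rules:
        if not path or not url_path.startswith(path):
            continue                        # empty values and non-matching rules are skipped
        longer = len(path) > best_len
        tie_for_allow = len(path) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len, allowed = len(path), directive == "allow"
    return allowed


rules = [("disallow", "/downloads/"), ("allow", "/downloads/whitepaper.pdf")]
print(is_allowed(rules, "/downloads/whitepaper.pdf"))   # True: the 25-character Allow wins
print(is_allowed(rules, "/downloads/other.pdf"))        # False: only the Disallow matches

# Rule order does not matter, and Allow: / cannot rescue a longer Disallow
print(is_allowed([("allow", "/"), ("disallow", "/admin/")], "/admin/users"))   # False
```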

What You Should (Usually) Block

Every site is different, but these URL spaces are almost always safe — and beneficial — to disallow:

  1. Internal search results (/search, /*?s=) — infinite, thin, duplicate content
  2. Faceted navigation and filters (/*?color=, /*?sort=) — the #1 crawl budget killer on e-commerce sites
  3. Cart, checkout, and account pages (/cart, /checkout, /account) — no search value, often session-specific
  4. Admin and login paths (/wp-admin/, /admin/) — though remember this is visibility control, not security
  5. Tracking and campaign URLs (/*?utm_) — duplicate content under infinite parameter variations
  6. Staging or preview paths that share the production domain (/preview/, /draft/)

What You Must Never Block

These mistakes are common and expensive:

  • CSS and JavaScript files. Google renders pages like a browser. If it can't fetch your CSS/JS, it may see a broken page and rank you accordingly. Blocking /wp-includes/ or /assets/ was standard advice in 2012 — today it actively hurts you.
  • Pages you want deindexed. Counterintuitive, but as covered above: Google has to crawl a page to see its noindex tag.
  • Your entire site with Disallow: / — verify this isn't in production right now. Seriously, go check. We'll wait.
  • Image, font, or media folders your visible pages depend on — blocking them degrades how Google renders and understands your pages and removes you from image search.

Copy-Paste Templates

A Sensible Default for Most Websites

User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /*?s=
Disallow: /*?utm_
 
Sitemap: https://example.com/sitemap.xml

WordPress

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/
 
Sitemap: https://example.com/sitemap_index.xml

Note the Allow for admin-ajax.php — many themes and plugins load front-end content through it.

Next.js / Modern JS Frameworks

User-agent: *
Disallow: /api/
Disallow: /_next/static/chunks/pages/admin
 
Sitemap: https://example.com/sitemap.xml

Don't block /_next/static/ wholesale — that's where your CSS and JS live. In Next.js you can generate this file dynamically with a robots.ts in your app/ directory.

Handling AI Crawlers

# Block AI training crawlers
User-agent: GPTBot
Disallow: /
 
User-agent: CCBot
Disallow: /
 
# Allow AI search crawlers that cite and link sources
User-agent: OAI-SearchBot
Allow: /
 
User-agent: PerplexityBot
Allow: /

Whether to block AI crawlers is a business decision — see the FAQ below for the trade-offs.

How to Test Your Robots.txt

Never deploy robots.txt changes blind:

  1. Google Search Console → Settings → robots.txt shows the last fetched version, any parse errors, and lets you request a recrawl after fixing issues.
  2. The URL Inspection tool tells you whether a specific URL is blocked by robots.txt — test your money pages after every change.
  3. Run a full site scan. A technical SEO audit catches robots.txt problems alongside the issues that usually travel with them — missing sitemaps, noindex conflicts, and orphaned pages. WebScore's SEO module checks your robots.txt configuration on every scan, free.

Robots.txt Debugging Checklist

When organic traffic drops and you suspect robots.txt:

  • Fetch yourdomain.com/robots.txt directly — does it return 200 with the content you expect?
  • Check for Disallow: / under User-agent: * (the staging-config-in-production classic)
  • Confirm CSS/JS paths aren't blocked (test a page in Search Console's URL Inspection → View crawled page)
  • Verify the file is under 500 KiB (Google's documented limit; content beyond it is ignored)
  • Check the file returns 200, not 5xx — if robots.txt returns a server error, Google may stop crawling your site entirely
  • Look for conflicting noindex + Disallow combinations on pages you're trying to remove
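
Several of these checks can be automated. The Python sketch below is one possible approach, not a complete audit: the names fetch and audit_robots are hypothetical, the warning messages are illustrative, and it covers only the status code, the size limit, and a site-wide Disallow: / under User-agent: *.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Fetch a URL and return (status_code, body_bytes), including for HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def audit_robots(status, body):
    """Return a list of warnings for a fetched robots.txt."""
    warnings = []
    if status >= 500:
        warnings.append("server error: crawlers may stop crawling the site")
    elif status != 200:
        warnings.append(f"unexpected status {status}")
    if len(body) > 500 * 1024:               # size limit from the checklist, counted as 500 KiB
        warnings.append("file exceeds 500 KiB: content past the limit is ignored")

    agents, in_rules = [], False
    for raw in body.decode("utf-8", errors="replace").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                     # a new group starts after the previous group's rules
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append("Disallow: / applies to all crawlers")
    return warnings


if __name__ == "__main__":
    # example.com is a placeholder; substitute your own domain
    status, body = fetch("https://example.com/robots.txt")
    for warning in audit_robots(status, body):
        print("WARNING:", warning)
```

Keeping audit_robots separate from the network call makes it easy to run the same checks against a file in a deployment pipeline before it reaches production.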

Key Takeaways

  • Robots.txt controls crawling, not indexing — use noindex to remove pages from search
  • The longest matching rule wins, not the first one
  • Never block CSS/JS; almost always block internal search, filters, and cart/checkout paths
  • A broken robots.txt fails silently — traffic erodes over weeks, not overnight
  • Test every change in Search Console before and after deploying

Not sure what your robots.txt is doing right now? Run a free WebScore scan — it checks your robots.txt, sitemap, crawlability, and 100+ other SEO factors in under 60 seconds, and shows you exactly what to fix.
