
claude searchbot robots.txt guide for AI-search teams

claude searchbot robots.txt controls whether Anthropic can index your pages for Claude's search responses, which is a different decision from allowing model training or user-triggered page fetches. Anthropic's February 2026 three-bot split means most publishers should review ClaudeBot, Claude-User, and Claude-SearchBot separately instead of treating Anthropic as one crawler.

claude searchbot robots.txt guide for controlling Anthropic crawling, preserving Claude search visibility, and separating training from retrieval.

Claude visibility decisions now sit closer to crawl policy than most teams realize. Photo: Tiger Lily via Pexels.

claude searchbot robots.txt matters because Anthropic no longer asks publishers to make one blanket decision about "Claude crawling." In the company's updated crawler documentation, ClaudeBot, Claude-User, and Claude-SearchBot each have different jobs and different consequences when you block them. That change turns a vague AI crawler policy into a technical SEO decision: do you want to opt out of model training, preserve live retrieval, keep visibility inside Claude search, or split those goals on purpose?

Search Roost already covers the broader mechanics of robots.txt for SEO, the operational role of X-Robots-Tag headers, and the publisher-facing case for llms.txt. What most teams still need is the platform-specific translation: how Anthropic's bots map to training, search quality, and user retrieval, and what the safest robots policy looks like if you want Claude visibility without granting every form of access.

What does claude searchbot robots.txt actually control?

Anthropic describes Claude-SearchBot as the crawler that navigates the web to improve search result quality for users. That wording is important because it means the bot is not framed as a generic training crawler. It exists to help Claude's search responses find and understand public web content well enough to serve better answers. In practice, that puts Claude-SearchBot much closer to a search-indexing decision than to a model-training decision.

Anthropic also states that disabling Claude-SearchBot prevents its system from indexing your content for search optimization and may reduce your site's visibility and accuracy in user search results. That is the clearest operational clue publishers have received so far. If your commercial goal is to appear in Claude's answer workflow, then your `User-agent: Claude-SearchBot` rules belong in the same review process as crawl access, sitemaps, and important page discovery, not just in a legal or AI-policy checklist.

The main misunderstanding to avoid is assuming that a block on ClaudeBot automatically removes you from every Claude surface. Anthropic's own documentation now makes the opposite point. ClaudeBot controls future training-data collection, Claude-User controls user-directed retrieval, and Claude-SearchBot controls the search-quality layer. That split is why a modern Anthropic policy can be precise instead of all-or-nothing.

| Control | What It Changes | Risk If Misconfigured |
| --- | --- | --- |
| `Claude-SearchBot` | Search indexing and result-quality crawling | Lost Claude search visibility |
| `ClaudeBot` | Future model-training collection | Unwanted training access or accidental overblocking |
| `Claude-User` | Live fetches tied to a user's request | Fewer user-directed citations or page fetches |

That precision is the real reason this topic has become such a live technical SEO question in 2026. The problem is no longer whether Anthropic has a crawler. The problem is whether your current file still reflects the new crawler map.

How is Claude-SearchBot different from ClaudeBot and Claude-User?

Anthropic's support article now gives publishers a three-way model. ClaudeBot collects public web content that could contribute to training. Claude-User supports direct user requests inside Claude, which means the system may fetch your page when a person asks about it or supplies a URL. Claude-SearchBot is the search-quality bot that helps Claude find and index pages for better search responses.

This matters because each decision carries a different tradeoff. If you block ClaudeBot, Anthropic says future site materials should be excluded from model-training datasets. If you block Claude-User, Anthropic says your site may become less visible for user-directed web search. If you block Claude-SearchBot, Anthropic says your content may not be indexed for search optimization, reducing visibility and result accuracy. Those are three separate policies, not one.

ClaudeBot is the training question

This is the bot most policy documents were written for in 2024 and 2025. Teams that wanted to limit AI training access often added a single `User-agent: ClaudeBot` block and assumed the job was done. That block still answers the training question, but it does not answer the search-visibility question anymore. If your goal is training opt-out only, the rule may still be correct. If your goal is broader, it is incomplete.

Claude-User is the retrieval question

Claude-User is closer to a browser-like fetch triggered by human intent. Anthropic says disabling it prevents the system from retrieving your content in response to a user query, which may reduce visibility for user-directed web search. That makes it especially relevant for pages people cite directly in prompts, such as documentation, knowledge-base articles, research explainers, and high-intent commercial guides.
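
If that distinction matters more for some sections than others, robots rules can be scoped by path as well as by bot. The lines below are only a sketch with a placeholder /docs/ path: under the standard longest-match rule for Allow and Disallow, the more specific Allow keeps documentation retrievable for Claude-User while the rest of the site stays blocked.

User-agent: Claude-User
Allow: /docs/
Disallow: /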

Claude-SearchBot is the visibility question

Claude-SearchBot determines whether Anthropic can index your site for better search response quality. If you publish pages that you want surfaced in Claude's web-search workflow, this is the bot you should treat as a visibility gatekeeper. The same logic already applies elsewhere on the site in our guides to ChatGPT search ranking factors and Bing AI performance reporting: separate the retrieval surface from the training surface before you decide what to block.

Blocking ClaudeBot answers a model-training policy question. Blocking Claude-SearchBot answers a search-visibility policy question. Those are no longer the same choice.
The safest Anthropic policy starts by separating training, retrieval, and search visibility before anyone edits robots.txt. Photo: Walls.io via Pexels.

When should you allow Claude-SearchBot but block ClaudeBot?

This is the configuration many publishers were missing before Anthropic clarified the split. If your business wants Claude visibility but does not want future site content entering model training, the cleanest starting point is often to disallow ClaudeBot while keeping Claude-SearchBot and Claude-User available. That setup does not guarantee citations, but it removes one common self-inflicted blocker.

The case for this split is strongest on editorial sites, SaaS documentation hubs, comparison pages, and resource centers where visibility in answer engines matters more than participation in training corpora. It is also useful for teams already investing in answer-first page structure, schema accuracy, and topical clusters. If you are doing the work outlined in our writing for AI answers framework and technical SEO checklist, you usually do not want a stray robots rule to cancel that work.

A practical split for visibility-first publishers

A visibility-first policy usually says yes to search indexing and yes to live retrieval, but no to training collection. Anthropic's documentation now supports exactly that distinction. The robots.txt pattern is simple:

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

The strategic value of that pattern is clarity. Your legal or brand team can keep a training opt-out. Your SEO team can still preserve access for user retrieval and Claude search indexing. Your dev team no longer has to guess whether one rule is doing three jobs.

When a full block still makes sense

Some sites will still want to disallow all three bots. Private communities, gated SaaS apps, paid archives, regulated content, and low-value infrastructure hosts may prefer a full block because the visibility upside is weak or the access risk is too high. The point is not that every site should allow Claude-SearchBot. The point is that the decision should be explicit now that the bots are explicit.

| Policy Goal | ClaudeBot | Claude-SearchBot | Claude-User |
| --- | --- | --- | --- |
| Maximum Claude visibility | Allow | Allow | Allow |
| Visibility without training | Block | Allow | Allow |
| Full Anthropic block | Block | Block | Block |

Does Claude-SearchBot respect robots.txt and crawl-delay?

According to Anthropic, yes. The crawler documentation says Anthropic's bots honor industry-standard directives in robots.txt and respect anti-circumvention technologies like CAPTCHAs. It also says Anthropic supports the non-standard `Crawl-delay` extension to robots.txt. That is notable because not every large crawler handles `Crawl-delay` the same way.

Google is the clean comparison point here. Google's own robots.txt documentation explicitly says that fields like `crawl-delay` are not supported by Googlebot. Anthropic, by contrast, says its bots do support `Crawl-delay`. For overloaded sites, that creates a meaningful middle ground between full access and full block.

User-agent: Claude-SearchBot
Allow: /
Crawl-delay: 1

That said, do not turn `Crawl-delay` into a magical fix. It helps pace a cooperative crawler. It does not solve wildcard blocks, CDN-level bot filtering, malformed files, or stale assumptions about what your current robots.txt actually says. Treat it as rate control, not as policy logic.

If you need to slow Anthropic crawling, its docs point to `Crawl-delay`, not to a blind visibility cut-off. Photo: cookiecutter via Pexels.

What robots.txt patterns work best for most teams?

Most teams only need one of three patterns: allow all Anthropic access, block training only, or block everything. Complexity starts creeping in when old rules, wildcard rules, and CMS-generated files overlap. The safest way to handle that is to choose one policy, write it plainly, and keep Anthropic's user agents explicit.

Pattern 1: allow all three bots

This is the default visibility-first stance for publishers who are comfortable with training, search indexing, and user retrieval. Because many sites already allow `User-agent: *`, this may be your practical state even if you never wrote Anthropic-specific lines. Adding the explicit user agents still helps prevent accidental overrides, because a crawler follows the group that most specifically names it and only falls back to `User-agent: *` when no named group exists.

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

Pattern 2: block training, keep visibility

This is usually the most balanced policy for commercial content teams that care about AI-search presence but want cleaner training boundaries. It keeps Claude search and live retrieval open while reserving a clear no on future training collection; the robots.txt is the same visibility-first split shown earlier, with Disallow for ClaudeBot and Allow for Claude-SearchBot and Claude-User.

Pattern 3: block all Anthropic bots

This is the strictest policy and easiest to understand internally. Just remember what you are giving up: reduced or removed Claude search visibility and fewer user-triggered fetches. If the site is public and marketing-driven, that tradeoff deserves a deliberate decision rather than a leftover security default.
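
For completeness, the full block is just the same three groups with a Disallow on each, as in this minimal sketch:

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: Claude-User
Disallow: /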

Anthropic also warns against relying on IP blocking as your primary opt-out method because it can interfere with the crawler's ability to read robots.txt correctly. That is consistent with a broader technical SEO lesson: the most durable policy is the one the bot is documented to read first.

How do you test and monitor a claude searchbot robots.txt change?

Start with the file itself. Confirm that the live `robots.txt` is on the correct host, protocol, and subdomain, because robots rules only apply to the host where the file is published. Google's documentation makes this point clearly, and the same habit is worth keeping for Anthropic decisions too. If your app serves different hosts from different stacks, verify all of them.
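
If you want a scriptable version of that check, Python's standard `urllib.robotparser` can load the live file from a specific host and answer per-agent questions. This is a minimal sketch with placeholder URLs, and it only evaluates the robots rules themselves, not CDN or WAF behavior.

from urllib import robotparser

# Point at the exact host and protocol the crawler would actually request.
rp = robotparser.RobotFileParser("https://www.example.com/robots.txt")
rp.read()

url = "https://www.example.com/guides/claude-search/"
for agent in ("Claude-SearchBot", "Claude-User", "ClaudeBot"):
    print(agent, "allowed:", rp.can_fetch(agent, url))

# crawl_delay() returns the Crawl-delay value for the agent, or None if unset.
print("Claude-SearchBot crawl delay:", rp.crawl_delay("Claude-SearchBot"))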

Review the rendered production file, not the source repo only

This matters in frameworks where `robots.txt` is generated at build time. A correct source file is not enough if middleware, hosting rules, or platform defaults change the live response. Load the production URL, not just the local file. Then check that the Anthropic sections appear exactly as intended.
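
A quick way to confirm that is to request the production URL and print whichever Anthropic groups actually come back in the response. A small sketch, with the hostname as a placeholder:

from urllib.request import urlopen

# Fetch the file from the live host, exactly as a crawler would.
with urlopen("https://www.example.com/robots.txt") as resp:
    body = resp.read().decode("utf-8", errors="replace")

# Print the rule lines that sit inside a Claude-related user-agent group.
in_claude_group = False
for line in body.splitlines():
    stripped = line.strip()
    if stripped.lower().startswith("user-agent:"):
        in_claude_group = "claude" in stripped.lower()
    if in_claude_group and stripped:
        print(stripped)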

Check server logs for user-agent behavior

If Claude-SearchBot or Claude-User traffic matters to you, logs are still the fastest proof that the crawler is seeing your policy and revisiting accordingly. That fits the same operational pattern as our log file analysis guide: document the before state, ship the rule, then watch whether crawl requests change over the next few days and weeks.
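
For a rough before-and-after count, a few lines of Python over an access log are enough. This sketch assumes the user-agent string appears somewhere in each log line, as it does in common combined-format logs; adjust the file name and parsing for your own setup.

from collections import Counter

AGENTS = ("ClaudeBot", "Claude-SearchBot", "Claude-User")
counts = Counter()

# Count requests per Anthropic user agent by simple substring match.
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for agent in AGENTS:
            if agent in line:
                counts[agent] += 1

for agent in AGENTS:
    print(agent, counts[agent])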

Audit other blockers outside robots.txt

A correct robots file does not solve everything. CDN bot controls, WAF rules, authentication walls, rate limits, and preview-tag choices can still change what answer engines can see or use. If the page is meant to rank in AI search, pair the crawler policy with the page-quality and content checks in the Search Console workflow and the Anthropic API guide, especially if your audience uses Claude directly.
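
One way to catch an edge-layer block that robots.txt cannot reveal is to request a page while presenting an Anthropic-style user agent and watch the status code. The string below is a simplified stand-in, not the crawler's exact user-agent header, and WAF rules may also key on IP ranges, so treat the result as a rough signal only.

from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Placeholder URL and simplified user-agent string for a quick edge-layer check.
req = Request(
    "https://www.example.com/guides/claude-search/",
    headers={"User-Agent": "Claude-SearchBot"},
)
try:
    with urlopen(req) as resp:
        print("Status with Claude-SearchBot user agent:", resp.status)
except HTTPError as err:
    print("Blocked or errored:", err.code)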

Anthropic crawler changes should be logged like any other SEO release: policy, timestamp, expected effect, and log review. Photo: Pavel Danilyuk via Pexels.

Which mistakes make sites accidentally invisible in Claude search?

The first mistake is assuming an old `ClaudeBot` rule still covers everything. That was never clean, and after the February 2026 documentation update it is definitely wrong. The second mistake is letting `User-agent: *` or platform-level bot controls quietly override the policy you think you have. The third is confusing crawling with indexing and visibility. Google's documentation remains useful here: robots.txt controls crawl access, while `noindex` controls whether pages should appear in search surfaces.

Anthropic adds one more twist. Its blocking-and-removal guidance says the `noindex` meta tag tells its search partners not to index your content for Claude web-search outputs. That means you cannot reason about Claude visibility through robots.txt alone. A page can be crawlable and still be intentionally removed from search-facing use if your indexing directives are restrictive.
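
If you need that second lever, the standard forms are a robots meta tag in the page HTML or an equivalent `X-Robots-Tag` response header; both are indexing directives rather than crawl rules, and they are shown here as generic examples rather than Anthropic-specific syntax.

<meta name="robots" content="noindex">

X-Robots-Tag: noindex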

The last mistake is policy drift. Teams change vendors, migrate CDNs, add generated route handlers, or copy an old robots template without revisiting the new crawler landscape. The result is usually not a dramatic penalty. It is quieter than that: pages simply fail to become available where the team expected them to appear. In the AI-search era, invisible is often just misconfigured.

FAQ: claude searchbot robots.txt