Should I use a trailing slash when disallowing a directory in robots.txt?

Posted on 2025-01-26 03:27:02

I want to disallow crawling of the directory /acct in robots.txt. Which rule should I use?

Disallow: /acct or Disallow: /acct/

/acct contains both subdirectories and files. What is the effect of the trailing slash?

Comments (1)

谁的年少不轻狂 2025-02-02 03:27:02

Since robots.txt rules are all "starts with" rules, both of your proposed rules would disallow the following:

  • https://example.com/acct/
  • https://example.com/acct/foo
  • https://example.com/acct/bar

However, the following would only be disallowed by the rule without the trailing slash:

  • https://example.com/acct
  • https://example.com/acct.html
  • https://example.com/acctbar

Disallow: /acct/ is usually better because there is no risk of disallowing unexpected URLs. However, it does NOT prevent crawling of /acct.
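The prefix matching described above can be checked with a short sketch using Python's standard `urllib.robotparser`. The helper name `blocked_urls` and the `URLS` list are illustrative assumptions, and real crawlers may differ from this parser in details such as wildcard support.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(disallow_path, urls):
    """Return the URLs blocked by a single Disallow rule applied to all user agents.

    Hypothetical helper for illustration; it builds a one-rule robots.txt in memory.
    """
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return [u for u in urls if not parser.can_fetch("*", u)]


# Example URLs taken from the answer above.
URLS = [
    "https://example.com/acct",
    "https://example.com/acct/",
    "https://example.com/acct/foo",
    "https://example.com/acct.html",
    "https://example.com/acctbar",
]

for rule in ("/acct", "/acct/"):
    print(rule, "blocks:", blocked_urls(rule, URLS))
```

With the rule `/acct`, every URL in the list is blocked; with `/acct/`, only the URLs under the directory are.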

In most cases, web servers redirect a directory URL without a trailing slash to the same URL with the trailing slash. It is likely that on your server, https://example.com/acct redirects to https://example.com/acct/. If that is the case, it is usually fine to let bots crawl /acct without the trailing slash and see the redirect. They would still be blocked from crawling the target of the redirect.
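The redirect scenario can be simulated as follows. This is a minimal sketch, assuming the server redirects /acct to /acct/ (modeled by the hypothetical `REDIRECTS` table) and assuming a well-behaved crawler that checks robots.txt before every request, including redirect targets. No network access is involved.

```python
from urllib.robotparser import RobotFileParser

# Assumed server behaviour: the slashless directory URL redirects to the slashed one.
REDIRECTS = {"https://example.com/acct": "https://example.com/acct/"}

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /acct/"])


def crawl(url, max_hops=5):
    """Follow the simulated redirects, checking robots.txt before each request."""
    steps = []
    for _ in range(max_hops):
        if not parser.can_fetch("*", url):
            steps.append((url, "blocked"))
            break
        target = REDIRECTS.get(url)
        if target is None:
            steps.append((url, "fetched"))
            break
        steps.append((url, "redirect"))
        url = target
    return steps


print(crawl("https://example.com/acct"))
```

The crawler is allowed to request /acct and sees the redirect, but it stops at /acct/ because that URL matches the `Disallow: /acct/` rule.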
