Does the user-agent string have to be exactly as it appears in my server logs?

Published 2024-10-11 21:53:01

When using a Robots.txt file, does the user agent string have to be exactly as it appears in my server logs?

For example when trying to match GoogleBot, can I just use googlebot?

Also, will a partial-match work? For example just using Google?
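
For context, a robots.txt group refers to a crawler by a short product token on its own User-agent line, not by the full string from the access logs. A minimal sketch (the Disallow path is hypothetical):

```
User-agent: Googlebot
Disallow: /private/
```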

Comments (5)

豆芽 2024-10-18 21:53:01

At least for googlebot, the user-agent is case-insensitive. See the 'Order of precedence for user-agents' section:

https://code.google.com/intl/de/web/controlcrawlindex/docs/robots_txt.html

苦笑流年记忆 2024-10-18 21:53:01

(As already answered in another question)

In the original robots.txt specification (from 1994), it says:

User-agent

[…]

The robot should be liberal in interpreting this field. A case insensitive substring match of the name without version information is recommended.

[…]

But if/which parsers work like that is another question. Your best bet would be to look for the documentation of the bots you want to add. You’ll typically find the agent identifier string in it, e.g.:

  • Bing:

    We want webmasters to know that bingbot will still honor robots.txt directives written for msnbot, so no change is required to your robots.txt file(s).

  • DuckDuckGo:

    DuckDuckBot is the Web crawler for DuckDuckGo. It respects WWW::RobotRules […]

  • Google:

    The Google user-agent is (appropriately enough) Googlebot.

  • Internet Archive:

    User Agent archive.org_bot is used for our wide crawl of the web. It is designed to respect robots.txt and META robots tags.
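
The 1994 substring recommendation quoted above can be observed in Python's standard-library parser, `urllib.robotparser`, which applies a case-insensitive substring match between each group's token and the requesting crawler's name. A small sketch (the group and paths are hypothetical):

```python
import urllib.robotparser

# A hypothetical group that names "Google" rather than the full "Googlebot" token.
robots_txt = """\
User-agent: Google
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# The stdlib parser follows the 1994 recommendation: a case-insensitive
# substring match, so the partial token "Google" applies to "Googlebot".
print(parser.can_fetch("Googlebot", "/private/page"))    # False: group applies, path disallowed
print(parser.can_fetch("SomeOtherBot", "/private/page")) # True: no group applies
```

Whether a given production crawler is this liberal is a separate question; this only shows one parser's behaviour.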

野心澎湃 2024-10-18 21:53:01

robots.txt is case-sensitive. Although Google is more forgiving than other bots and may accept its string either way, other bots may not.

剩余の解释 2024-10-18 21:53:01

Also, will a partial-match work? For example just using Google?

In theory, yes. However, in practice it seems to be specific partial matches, or "substrings" (as mentioned in @unor's answer), that match. These specific "substrings" appear to be referred to as "tokens", and often the match against these "tokens" must be exact.

With regards to the standard Googlebot, this only appears to match Googlebot (case-insensitive). Any shorter partial match, such as Google, fails to match. Any longer partial match, such as Googlebot/1.2, also fails to match. And using the full user-agent string (Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)) fails to match as well. (Although there is technically more than one user-agent for Googlebot anyway, so matching on the full user-agent string would not be recommended even if it did work.)

These tests were performed with Google's robots.txt tester.
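
The same directional behaviour can be reproduced with Python's `urllib.robotparser` as a stand-in for a lenient parser: names longer than the bare product token never apply, because the parser compares each group's token against the crawler name with its version stripped. The groups and paths here are hypothetical:

```python
import urllib.robotparser

# Hypothetical groups using a versioned name and the full user-agent string.
robots_txt = """\
User-agent: Googlebot/1.2
Disallow: /private/

User-agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Disallow: /secret/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Neither over-specific name is a substring of the bare "googlebot" token,
# so neither group applies and both paths stay fetchable:
print(rp.can_fetch("Googlebot", "/private/page"))  # True
print(rp.can_fetch("Googlebot", "/secret/page"))   # True
```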

沒落の蓅哖 2024-10-18 21:53:01

Yes, the user agent has to be an exact match.

From robotstxt.org: "globbing and regular expression are not supported in either the User-agent or Disallow lines"
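
One consequence of the no-globbing rule is that Disallow values act as plain path prefixes rather than patterns. Python's `urllib.robotparser` illustrates this (the group and paths are hypothetical):

```python
import urllib.robotparser

robots_txt = """\
User-agent: *
Disallow: /priv
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# "/priv" is a literal prefix, not a pattern: it blocks every path that
# starts with it, and nothing else.
print(rp.can_fetch("AnyBot", "/private/page"))  # False: "/private/..." starts with "/priv"
print(rp.can_fetch("AnyBot", "/public/page"))   # True: prefix does not match
```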
