robots.txt with "User-Agent: Googlebot, Disallow: /" yet Google still indexes the site

Posted 2024-10-14 01:39:59

Look at the robots.txt of this site:

fr2.dk/robots.txt

The content is:

User-Agent: Googlebot
Disallow: /

That ought to tell Google not to index the site, no?

If so, why does the site still appear in Google searches?

Comments (3)

携君以终年 2024-10-21 01:39:59

Besides having to wait (Google's index updates take some time), note that if other sites link to your site, robots.txt alone won't be sufficient to remove it from the index.

Quoting Google's support page "Remove a page or site from Google's search results":

If the page still exists but you don't want it to appear in search results, use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt we may still index the page if we find its URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page.

One possible alternative solution is also mentioned in the above document:

Alternatively, you can use a noindex meta tag. When we see this tag on a page, Google will completely drop the page from our search results, even if other pages link to it. This is a good solution if you don't have direct access to the site server. (You will need to be able to edit the HTML source of the page).
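
For reference, the noindex tag that the quote mentions goes in the <head> of each page you want dropped from the index. The standard form is:

<meta name="robots" content="noindex">

or, to address only Google's crawler:

<meta name="googlebot" content="noindex">

Note that Googlebot has to be able to crawl the page in order to see the tag, so a page carrying noindex should not also be blocked in robots.txt.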

太阳公公是暖光 2024-10-21 01:39:59

I can confirm Google doesn't respect the Robots Exclusion File. Here's my file, which I created before putting this origin online:

https://git.habd.as/robots.txt

And the full contents of the file:

User-agent: *
Disallow:

User-agent: Google
Disallow: /

And Google still indexed it.

I stopped using Google after cancelling my account last March, and I never added this site to any webmaster console other than Yandex's, which leaves me with two assumptions:

  1. Google is scraping Yandex
  2. Google doesn't respect the Robots Exclusion Standard

I haven't grepped my logs yet, but I will, and my assumption is that I'll find Google's spiders misbehaving in there.
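
If you want to run the same log check, here is a minimal Python sketch; the log path is a hypothetical example and will need adjusting for your server and log format:

# Minimal sketch: print access-log lines whose user-agent mentions Googlebot.
# The log path below is an assumption; point it at your own server's access log.
from pathlib import Path

LOG_PATH = Path("/var/log/nginx/access.log")

for line in LOG_PATH.read_text(errors="replace").splitlines():
    if "googlebot" in line.lower():
        print(line)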

懒猫 2024-10-21 01:39:59

If you just added this, then you'll have to wait; it's not instantaneous. Until Googlebot comes back to re-spider the site and sees the robots.txt, the site will still be in their database.

I doubt it's relevant, but you might want to change your "User-Agent" to "User-agent". Google is most likely not case-sensitive here, but it can't hurt to follow the standard exactly.
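
As a quick sanity check of what a given robots.txt actually says for a particular user-agent, Python's standard-library urllib.robotparser can be used. This is only a sketch: it follows that library's own matching rules, which may not reproduce Google's exact group-selection behaviour, and the https scheme for the question's site is assumed.

# Minimal sketch: ask Python's robots.txt parser whether a crawler may fetch a URL.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://fr2.dk/robots.txt")  # the robots.txt from the question (scheme assumed)
rp.read()

for agent in ("Googlebot", "*"):
    print(agent, "allowed:", rp.can_fetch(agent, "https://fr2.dk/"))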
