robots.txt has "User-agent: Googlebot" and "Disallow: /", but Google is still indexing the site
Look at the robots.txt of this site:
The content is:
User-Agent: Googlebot
Disallow: /
That ought to tell Google not to index the site, no?
If so, why does the site appear in Google searches?
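Python's standard-library urllib.robotparser gives one way to see what a compliant crawler reads from this two-line file. This is a minimal sketch under stated assumptions: example.com is a placeholder because the site's URL wasn't preserved, and Bingbot stands in for any crawler other than Google's. Note that the file only expresses crawl permissions (may this bot fetch this URL?); it contains no directive about whether a URL may be listed in search results.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the question.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /
"""


def may_fetch(agent: str, url: str) -> bool:
    """Return whether `agent` is allowed to crawl `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host (assumption); only the path is matched.
    for agent in ("Googlebot", "Bingbot"):
        print(agent, may_fetch(agent, "https://example.com/some/page"))
```

Googlebot is refused every path because "Disallow: /" is a prefix of all of them, while a crawler with no matching group (and no "User-agent: *" fallback) is allowed by default.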
Besides having to wait (Google's index updates take some time), note that if other sites link to your site, robots.txt alone won't be sufficient to remove your site.
Quoting Google's support page "Remove a page or site from Google's search results":
One possible alternative solution is also mentioned in the above document:
I can confirm Google doesn't respect the Robots Exclusion File. Here's my file, which I created before putting this origin online:
https://git.habd.as/robots.txt
And the full contents of the file:
And Google still indexed it.
I stopped using Google when I cancelled my account last March, and I never added this site to any webmaster console other than Yandex's, which leaves me with two assumptions:
I haven't grepped my logs yet, but I will, and I expect to find Google's spiders misbehaving in there.
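The log check described above could be approximated with a short script instead of grep. This is a minimal sketch assuming the server writes the Apache/nginx "combined" log format; the function name googlebot_paths and the sample lines (documentation IP ranges, made-up paths and dates) are hypothetical. It only matches the self-reported User-Agent, which anyone can spoof; Google's documentation describes confirming real Googlebot traffic with a reverse DNS lookup of the client IP, which this sketch does not do.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: the "combined" log format used by Apache and nginx.
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '       # client IP, identd, user, [timestamp]
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '  # "GET /path HTTP/1.1"
    r'\d{3} \S+ '                             # status code, response size
    r'"[^"]*" "(?P<agent>[^"]*)"'             # "referer" "user agent"
)


def googlebot_paths(lines: Iterable[str]) -> Counter:
    """Count request paths from log lines whose User-Agent mentions Googlebot.

    Matches the self-reported User-Agent string only; it does not verify
    that the client really is Google.
    """
    paths: Counter = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match and "googlebot" in match.group("agent").lower():
            paths[match.group("path")] += 1
    return paths


# Hypothetical sample lines, not taken from the poster's logs.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'192.0.2.10 - - [01/Mar/2020:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 25 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [01/Mar/2020:10:00:05 +0000] "GET /some/repo HTTP/1.1" 200 4096 "-" "{GOOGLEBOT_UA}"',
    '198.51.100.7 - - [01/Mar/2020:10:01:00 +0000] "GET /some/repo HTTP/1.1" 200 4096 "-" "curl/7.68.0"',
]

if __name__ == "__main__":
    for path, count in googlebot_paths(SAMPLE_LOG).most_common():
        print(count, path)
```

With a blanket "Disallow: /", any path other than /robots.txt showing up here (from a verified Google IP) would be the kind of misbehaviour the poster expects to find.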
If you just added this, you'll have to wait; it isn't instantaneous. Until Googlebot comes back to re-spider the site and sees the robots.txt, the site will still be in Google's database.
I doubt it's relevant, but you might want to change "User-Agent" to "User-agent" (lowercase "agent"). Google is most likely not case-sensitive here, but it can't hurt to follow the standard exactly.
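As one data point on the casing question, Python's standard-library robots.txt parser lowercases field names, so both spellings produce the same rule. This is a sketch of one parser's behaviour only; it is not evidence of how Googlebot's own parser treats the field, and example.com and blocks_googlebot are placeholders introduced for illustration.

```python
from urllib.robotparser import RobotFileParser

# Field name as conventionally written in the robots exclusion standard.
STANDARD_SPELLING = "User-agent: Googlebot\nDisallow: /\n"
# Field name as written in the question's robots.txt.
QUESTION_SPELLING = "User-Agent: Googlebot\nDisallow: /\n"


def blocks_googlebot(robots_txt: str) -> bool:
    """True if Python's parser says Googlebot may not fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path is matched against rules.
    return not parser.can_fetch("Googlebot", "https://example.com/")


if __name__ == "__main__":
    for text in (STANDARD_SPELLING, QUESTION_SPELLING):
        print(repr(text.splitlines()[0]), blocks_googlebot(text))
```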