Why would Google (or Googlebot) index a page that returns a 500 error?
Googlebot has occasionally been indexing one of our sites with a bad query string parameter. I am not sure where it is getting this parameter (no sites appear to link to us with bad links, and nothing in our site inserts the bad value). As we expect, the bad parameter causes the site to throw a 500 error.
I was under the impression that Google would not index pages that return a 500 error, but it turns out that it does. So now I have two questions:
1) Why would Googlebot be inserting random bad query string values? (I don't really care about the answer to this question, but if we could do something to avoid that, it would solve our problem.)
2) Why would Google index a page that returns a 500 error?
Here is one of the erroneous links that the Googlebot created and that Google has indexed:
http://www.pbs.org/teacherline/catalog/browse/?sa=4&gb=baqhuxts&gb=20&gb=21&num=20&page=2&js=0&sa=1
The bad parameter is gb=baqhuxts. The parameter 'gb' is expected to be an integer. If you remove that parameter from the query string, you should get a normal catalog page.
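One way to keep such URLs from raising a server error is to validate the parameter up front and answer with a client-error status instead. The following is only a minimal sketch in Python: the site's actual stack is not known, the function name `status_for_catalog_url` is hypothetical, and choosing 400 (rather than, say, 404) is an assumption.

```python
from urllib.parse import urlsplit, parse_qs


def status_for_catalog_url(url):
    """Return 400 if any 'gb' value does not parse as an integer, else 200.

    Hypothetical helper for illustration; a real handler would go on to
    render the catalog page when the parameters are valid.
    """
    query = urlsplit(url).query
    # parse_qs collects repeated keys into a list, e.g. gb -> ['baqhuxts', '20', '21']
    params = parse_qs(query, keep_blank_values=True)
    for value in params.get("gb", []):
        try:
            int(value)
        except ValueError:
            return 400  # assumption: treat a malformed value as a bad request
    return 200


bad = ("http://www.pbs.org/teacherline/catalog/browse/"
       "?sa=4&gb=baqhuxts&gb=20&gb=21&num=20&page=2&js=0&sa=1")
good = ("http://www.pbs.org/teacherline/catalog/browse/"
        "?sa=4&gb=20&gb=21&num=20&page=2&js=0&sa=1")

print(status_for_catalog_url(bad))   # 400
print(status_for_catalog_url(good))  # 200
```

Because 'gb' appears several times in the query string, every occurrence is checked, not just the first one.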
Regarding nofollow and robots.txt solutions: [ REDACTED ]
I realize now that I am a moron and put a meta tag telling search robots to index the page. That was a dumb thing to do. I'm removing those. W-(
If you search on Google for 'baqhuxts' you will find that it has indexed 10 pages with this bad parameter. But each of these pages returns a 500 error. Does anyone have insight about why Google believes these are valid pages to index?
Comments (2)
It's probably because you are telling Google to index it by having this in your meta-tags:
Try removing that! :)
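To find which pages carry such a tag, a small script can scan the HTML for robots meta tags and list their directives. This is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and the sample markup is made up for illustration, not taken from the site in question.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the lowercased robots directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical markup for illustration only.
sample = '<html><head><meta name="robots" content="index, follow"></head></html>'
print(robots_directives(sample))  # ['index', 'follow']
```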
Unfortunately, I only know the answer to #1:
Google will crawl weird pages like that because people with the Google Toolbar installed go to pages that don't exist, and their browsing information is transmitted to Google. This is why you will often find pages indexed that have no business being indexed, for example phpMyAdmin pages that aren't linked to from anywhere.