Googlebot, fake links

Posted on 2024-11-17 16:17:28

I have a little problem with Googlebot. I have a server running Windows Server 2009 with a system called Workcube, which runs on ColdFusion. It has a built-in error reporter, so I receive every error message, and in particular ones about Googlebot trying to reach a false link that doesn't exist. The links look like this:

  1. http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=282&HIERARCHY=215.005&brand_id=hoyrrolmwdgldah
  2. http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=145&HIERARCHY=200.003&brand_id=hoyrrolmwdgldah
  3. http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=123&HIERARCHY=110.006&brand_id=xxblpflyevlitojg
  4. http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=1&HIERARCHY=100&brand_id=xxblpflyevlitojg

Of course, anything like brand_id=hoyrrolmwdgldah or brand_id=xxblpflyevlitojg is a fake value. I have no idea what the problem could be. Any advice? Thank you all for your help! ;)


Comments (1)

几味少女 2024-11-24 16:17:28

You might want to verify your site with Google Webmaster Tools, which will report the URLs it finds that error out.

Your logs are also valid, but you need to verify that it really is Googlebot hitting your site and not someone spoofing their User Agent.

Here are instructions to do just that: http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html

Essentially you need to do a reverse DNS lookup and then a forward DNS lookup after you receive the host from the reverse lookup.
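
As an illustration, here is a minimal Python sketch of that double lookup, assuming you have pulled a suspect IP address out of your logs (the IP below is just a placeholder):

    import socket

    def is_real_googlebot(ip):
        # Reverse DNS lookup: the PTR record for a genuine Googlebot IP
        # resolves to a host under googlebot.com or google.com.
        try:
            host, _, _ = socket.gethostbyaddr(ip)
        except socket.herror:
            return False  # no PTR record, so it can't be Googlebot

        if not host.endswith((".googlebot.com", ".google.com")):
            return False  # hostname is outside Google's crawler domains

        # Forward DNS lookup: the host must resolve back to the same IP.
        try:
            return socket.gethostbyname(host) == ip
        except socket.gaierror:
            return False

    # Placeholder IP taken from a hypothetical log line:
    print(is_real_googlebot("66.249.66.1"))

If both lookups agree, the request really came from Google; a spoofed User Agent fails the check because the spoofer doesn't control the PTR records for its IP.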

Once you've verified it's the real Googlebot you can start troubleshooting. You see, Googlebot won't request URLs it hasn't naturally seen before, meaning Googlebot shouldn't be making direct object reference requests. I suspect it's a rogue bot using a Googlebot User Agent, but if it's not, you might want to look through your site to see if you're accidentally linking to those pages.

Unfortunately you posted the full URLs, so even if you clean up your site, Googlebot will see the links from Stack Overflow and continue to crawl them because they'll be in its crawl queue.

I'd suggest 301 redirecting these URLs to someplace that makes sense to your users. Otherwise I would 404 or 410 these pages so Google knows to remove them from its index.
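
Your site runs on ColdFusion, so the following Flask sketch is only meant to illustrate the decision logic, not your actual handler; VALID_BRAND_IDS and the redirect target are hypothetical stand-ins for whatever your application considers legitimate:

    from flask import Flask, Response, redirect, request

    app = Flask(__name__)

    # Hypothetical: in practice you would load the real brand IDs from your database.
    VALID_BRAND_IDS = {"12", "37", "282"}

    @app.route("/index.cfm")
    def product_list():
        brand_id = request.args.get("brand_id", "")
        if brand_id and brand_id not in VALID_BRAND_IDS:
            # Option A: 301 to a page that is useful to a human following the link.
            return redirect("/index.cfm?fuseaction=objects2.view_product_list",
                            code=301)
            # Option B: 410 Gone, so Google drops the URL from its index.
            # return Response("Gone", status=410)
        return "normal product list rendering goes here"

The 410 is the stronger signal if the pages never existed; the 301 is friendlier if real visitors might follow the bad links.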

In addition, if these are pages you don't want indexed, I would suggest adding the path to your robots.txt file so Googlebot can't continue to request more of these pages.
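
For example, since the junk values appear only in the brand_id parameter, wildcard rules like these (Googlebot honors * in robots.txt patterns) would block exactly the two bogus values from your error reports without touching legitimate product URLs:

    User-agent: Googlebot
    Disallow: /*brand_id=hoyrrolmwdgldah
    Disallow: /*brand_id=xxblpflyevlitojg

Note that robots.txt stops crawling, not indexing, so it complements rather than replaces the status-code fix above.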

Unfortunately there's no really good way of telling Googlebot to never crawl these URLs again. You can always go into Google Webmaster Tools and request that the URLs be removed from the index, which may stop Googlebot from crawling them again, but that doesn't guarantee it.
