What is monodoc.ashx and why is googlebot requesting it?
I am getting TONS of requests. They all start with
/1.1/handlers/monodoc.ashx?link=
followed by what look like .NET class names. What are these, and why is googlebot requesting them?
I need to turn this off so my access and error logs aren't polluted.
Googlebot will request any URL that it knows of, which includes URLs that you may not have generated yourself.
For instance, if there's a forum out there that links to your site with that URI, Googlebot will attempt to crawl it to see if there's any information worth indexing.
Based on the IP provided, I verified that it was indeed Googlebot, since the reverse DNS lookup resolves to 'crawl-66-249-68-184.googlebot.com' and the forward DNS lookup for 'crawl-66-249-68-184.googlebot.com' resolves back to the IP address provided.
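If you want to script that check yourself, here's a minimal sketch using Python's standard socket module (the IP below is just the one matching the hostname above; substitute whatever crawler IP shows up in your logs):

    import socket

    def is_googlebot(ip):
        try:
            host = socket.gethostbyaddr(ip)[0]       # reverse DNS (PTR) lookup
        except socket.herror:
            return False                             # no PTR record at all
        # Genuine Googlebot hostnames end in googlebot.com (or google.com)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward DNS: the hostname must resolve back to the same IP,
        # otherwise the PTR record could be spoofed
        return ip in socket.gethostbyname_ex(host)[2]

    print(is_googlebot("66.249.68.184"))  # crawl-66-249-68-184.googlebot.com -> True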
The best thing you can do is respond with a 404 or 410 response if that page shouldn't exist. If you have an idea of what content used to be there, you should 301 redirect it to a relevant page on your site just in case other people had linked to those pages ... you not only want to retain the link credit for those links, but it's also a better user experience for users who follow that link. If there isn't a relevant place to 301 redirect users to, you can redirect them to your homepage, but just know that from an SEO perspective, the link value will decay, since the relevancy of the links probably won't match the content of your homepage exactly.
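As a rough illustration of the 410-versus-301 choice, here's a sketch using Python's built-in http.server (the /old-docs/ path and /docs/ target are made up; on a real site you'd configure this in your web server or ASP.NET application rather than a standalone script):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.startswith("/1.1/handlers/monodoc.ashx"):
                # Nothing ever lived here on this site: tell crawlers it's gone for good
                self.send_response(410)
                self.end_headers()
            elif self.path.startswith("/old-docs/"):
                # Hypothetical legacy URL with a known replacement: permanent redirect
                self.send_response(301)
                self.send_header("Location", "/docs/")
                self.end_headers()
            else:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"OK")

    HTTPServer(("", 8080), Handler).serve_forever()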
Definitely make sure that you're not responding with a 500 or 503 response code. If you have a large number of 5xx-type responses, Googlebot will think that it's hitting your site too hard and will throttle back its crawl.
Lastly, even if you send a 301, 404, or 410 response ... expect to see Googlebot hitting these URLs for some time (e.g. even years from now). I've got sites that receive a burst of Googlebot traffic for long-dead legacy URIs every few weeks. There are some old crusty URLs out there, and Googlebot will run across them from time to time and then attempt to recrawl them. Google even keeps a historic list that it will attempt to crawl when it feels it has additional bandwidth to allocate to your site.
TL;DR: Don't sweat it. Googlebot will hit these URLs for no good reason. Just send the response that would be the best User Experience, and you'll be fine.