Googlebot causing .NET System.Web.HttpException
I have an ASP.NET website mixed with classic ASP (we are working on a conversion to .NET). I recently upgraded from .NET 1.1 to .NET 4.0 and switched to the integrated pipeline in IIS 7.
Since these changes, ELMAH has been reporting errors from classic ASP pages with practically no detail (and status code 404):
System.Web.HttpException (0x80004005)
at System.Web.CachedPathData.ValidatePath(String physicalPath)
at System.Web.HttpApplication.PipelineStepManager.ValidateHelper(HttpContext context)
But when I request the page myself, no error occurs. All these errors showing up in ELMAH are caused by the Googlebot crawler (user agent string).
Why does .NET pick up errors for classic ASP pages? Is this related to the integrated pipeline?
Any ideas why the error only happens when Google crawls the page, or how I can get more details to find the underlying fault?
Comments (4)
It looks like the Google crawler is following links that no longer exist, i.e. there could be documents on your site that refer to other documents that have since been deleted.
It does not look serious to me, so you might consider filtering out that exception.
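If you do decide to filter the exception, ELMAH's error filtering can be set up in web.config. The fragment below is a minimal sketch, assuming ELMAH's standard `errorFilter` section and assuming that discarding every error with HTTP status 404 is acceptable for your site. The filter only takes effect if ELMAH's `ErrorFilterModule` is registered alongside the ELMAH modules you already have.

```xml
<configuration>
  <configSections>
    <!-- If an "elmah" section group already exists, add only the errorFilter section to it. -->
    <sectionGroup name="elmah">
      <section name="errorFilter" requirePermission="false"
               type="Elmah.ErrorFilterSectionHandler, Elmah" />
    </sectionGroup>
  </configSections>

  <elmah>
    <errorFilter>
      <test>
        <!-- Assumption: all 404s are noise. Narrow this test if some 404s matter to you. -->
        <equal binding="HttpStatusCode" value="404" type="Int32" />
      </test>
    </errorFilter>
  </elmah>
</configuration>
```

Filtering hides the symptom only; the dead or malformed links that the crawler follows are still there.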
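This only applies if you are using Angular, but you'll see this if you use src instead of ng-src on an image or script tag whose URL contains a template expression, i.e. the attribute should be ng-src rather than src. With plain src, the raw markup contains the uninterpolated URL, and that is what gets requested.
This could also affect A tags where you are using href instead of ng-href.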
Add this to your web.config file:
This disables the default check that makes sure requested URLs conform to Windows path rules.
To reproduce the problem, add %20 (a URL-escaped space) to the end of the URL, e.g. http://example.org/%20. It's fairly common to see this problem from search crawlers when they encounter mistyped links with spaces, e.g. <a href="http://example.org/ ">example</a>.
The HttpContext.Request.Url property seems to trim the trailing space, which is why logging tools like ELMAH don't reveal the actual problem.
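A setting that fits this description, assuming ASP.NET 4.0 or later, is the `relaxedUrlToFileSystemMapping` attribute of the `httpRuntime` element. Whether this is the exact fragment the answer had in mind is an assumption; treat the following as a sketch to merge into your existing configuration.

```xml
<configuration>
  <system.web>
    <!-- Assumption: merge this attribute into any httpRuntime element you already have. -->
    <!-- When true, request URLs are not required to be valid Windows file paths. -->
    <httpRuntime relaxedUrlToFileSystemMapping="true" />
  </system.web>
</configuration>
```

After changing the setting, requesting a URL that ends in %20 is a quick way to check whether the exception still appears in ELMAH.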
When you changed from the classic pipeline to the integrated pipeline, you essentially turned control over to .NET, meaning .NET will call the ASP parser. This makes it possible for custom HTTP modules written in managed .NET code to change the output of the response or, in the case of ELMAH, to give you logging details.
I would look at the log, see which user agent Googlebot was using when the error occurred, and follow the exact same path it took with your own user agent changed to match.
Mozilla Firefox is the best browser for this, using the User Agent Switcher add-on.