Googlebot causing .NET System.Web.HttpException

Posted 2024-11-19 07:12:15


I have an ASP.NET website mixed with classic asp (we are working on a conversion to .NET) and I recently upgraded from .NET 1.1 to .NET 4.0 and switched to integrated pipeline in IIS 7.

Since these changes, ELMAH has been reporting errors from classic asp pages with practically no detail (and status code 404):

System.Web.HttpException (0x80004005)
   at System.Web.CachedPathData.ValidatePath(String physicalPath)
   at System.Web.HttpApplication.PipelineStepManager.ValidateHelper(HttpContext context)

But when I request the page myself, no error occurs. All of these errors showing up in ELMAH are caused by the Googlebot crawler (judging by the user agent string).

How come .NET picks up errors for classic asp pages? Does this have to do with the integrated pipeline?

Any ideas why the error only happens when Google crawls the page or how I can get more details to find the underlying fault?
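
One way to get more detail than ELMAH surfaces here is a small diagnostic HttpModule running in the integrated pipeline that records the raw URL and user agent of any request that errors. The sketch below is only an illustration (the module name and log path are made up, and the Error event may not fire for every early-pipeline failure):

using System;
using System.IO;
using System.Web;

// Hypothetical diagnostic module: logs the raw URL, user agent and exception
// for every request that ends in an error, so you can see exactly what the
// crawler asked for. Register it under <system.webServer>/<modules>.
public class RequestDiagnosticsModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.Error += (sender, e) =>
        {
            HttpContext ctx = ((HttpApplication)sender).Context;
            string line = string.Format("{0:u}\t{1}\t{2}\t{3}",
                DateTime.UtcNow,
                ctx.Request.RawUrl,                        // raw, un-normalized URL
                ctx.Request.UserAgent ?? "(no user agent)",
                ctx.Server.GetLastError());
            // Assumed log location; point this at any writable path.
            File.AppendAllText(ctx.Server.MapPath("~/App_Data/request-errors.log"),
                line + Environment.NewLine);
        };
    }

    public void Dispose() { }
}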

Comments (4)

风渺 2024-11-26 07:12:16


It looks like the Google crawler is following links that no longer exist, i.e. some documents on your site may refer to other documents that have since been deleted.

It doesn't look serious to me, so you might consider filtering out that exception.
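
If filtering it out is the route you take, ELMAH supports programmatic filtering from Global.asax once its ErrorFilterModule is registered; a minimal sketch (the 404 check is an assumption based on the status code mentioned in the question):

using Elmah;
using System.Web;

// In Global.asax.cs; ELMAH calls this method by name when the
// ErrorFilterModule is registered in web.config.
public class Global : HttpApplication
{
    void ErrorLog_Filtering(object sender, ExceptionFilterEventArgs args)
    {
        var httpEx = args.Exception as HttpException;
        if (httpEx != null && httpEx.GetHttpCode() == 404)
        {
            args.Dismiss();   // skip logging 404s for missing/invalid paths
        }
    }
}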

嘦怹 2024-11-26 07:12:16


This only applies if you are using Angular, but you'll see this if

<httpRuntime relaxedUrlToFileSystemMapping="false" /> (as mentioned in the previous answers)

and you use src instead of ng-src on an image or script tag, i.e.

<img src="{{SomeModelValue}}" />

should be

<img ng-src="{{SomeModelValue}}" />

This could also affect A tags where you are using href instead of ng-href.

如歌彻婉言 2024-11-26 07:12:15


Add this to your web.config file:

<httpRuntime relaxedUrlToFileSystemMapping="true" />

This disables the default check that makes sure requested URLs conform to Windows path rules.

To reproduce the problem, add %20 (URL-escaped space) to the end of the URL, e.g. http://example.org/%20. It's fairly common to see this problem from search crawlers when they encounter mis-typed links with spaces, e.g. <a href="http://example.org/ ">example</a>.

The HttpContext.Request.Url property seems to trim the trailing space, which is why logging tools like ELMAH don't reveal the actual problem.
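
To see this for yourself, you can log both values side by side from something like Application_Error; a rough sketch, assuming Request.RawUrl still carries the un-trimmed URL:

using System.Web;

// Hypothetical helper that shows the raw URL next to the parsed one; a
// trailing %20 that Request.Url appears to trim should still show up in RawUrl.
public static class UrlDiagnostics
{
    public static string Describe(HttpContext ctx)
    {
        return string.Format("RawUrl: '{0}' | Url.AbsolutePath: '{1}'",
            ctx.Request.RawUrl,
            ctx.Request.Url.AbsolutePath);
    }
}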

琉璃梦幻 2024-11-26 07:12:15


When you changed from the classic pipeline to the integrated pipeline, you essentially turned control over to .NET, meaning .NET will call up the ASP parser. This adds the ability to use custom HttpModules written in .NET managed code that can change the output of the response or, in the case of ELMAH, give you logging details.

I would look at the log, see what user agent Googlebot is using at the time the error occurs, and follow the exact same path it did with your user agent changed.

Mozilla Firefox with the User Agent Switcher add-on is the best browser for this.
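
If you would rather replay the request from code than switch user agents in a browser, something like the following works too; the URL and user agent string below are placeholders, not values from the original logs:

using System;
using System.Net;

// Replays a request with a crawler-style user agent so the server follows the
// same code path it does for the bot.
class CrawlerReplay
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://example.org/some-classic-page.asp%20");
        request.UserAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("Status: {0}", (int)response.StatusCode);
            }
        }
        catch (WebException ex)
        {
            var error = ex.Response as HttpWebResponse;
            Console.WriteLine("Failed: {0}",
                error != null ? ((int)error.StatusCode).ToString() : ex.Message);
        }
    }
}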
