Google crawler finds robots.txt but is unable to download it

Posted on 2024-09-15 02:47:51

Can anyone tell me what's wrong with this robots.txt?

http://bizup.cloudapp.net/robots.txt

The following is the error I get in Google Webmaster Tools:

Sitemap errors and warnings
Line    Status  Details
Errors  -   
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at the root of
your site but were unable to download it. Please ensure that it is accessible or remove
it completely.

Actually, the link above is the mapping of a route that goes to a Robots action. That action gets the file from storage and returns the content as text/plain. Google says that they can't download the file. Is it because of that?
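
For reference, a minimal sketch of what such a routed action might look like (the controller and storage helper names here are hypothetical; the question does not include the actual code):

using System.Web.Mvc;

public class RobotsController : Controller
{
    // Mapped to /robots.txt via a custom route.
    public ActionResult Robots()
    {
        // Hypothetical helper that reads the file from storage (e.g. Azure blob storage).
        string robotsTxt = StorageHelper.ReadAllText("robots.txt");

        // Return the raw content with the text/plain content type.
        return Content(robotsTxt, "text/plain");
    }
}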


裸钻 2024-09-22 02:47:51

It looks like it's reading robots.txt OK, but your robots.txt then claims that http://bizup.cloudapp.net/robots.txt is also the URL of your XML sitemap, when it's really http://bizup.cloudapp.net/sitemap.xml. The error seems to come from Google trying to parse robots.txt as an XML sitemap. You need to change your robots.txt to

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/sitemap.xml

EDIT

It actually goes a bit deeper than that, and Googlebot can't download any pages at all on your site. Here's the exception being returned when Googlebot requests either robots.txt or the homepage:

Cookieless Forms Authentication is not supported for this application.

Exception Details: System.Web.HttpException: Cookieless Forms Authentication
is not supported for this application.

[HttpException (0x80004005): Cookieless Forms Authentication is not supported for this application.]
AzureBright.MvcApplication.FormsAuthentication_OnAuthenticate(Object sender, FormsAuthenticationEventArgs args) in C:\Projectos\AzureBrightWebRole\Global.asax.cs:129
System.Web.Security.FormsAuthenticationModule.OnAuthenticate(FormsAuthenticationEventArgs e) +11336832
System.Web.Security.FormsAuthenticationModule.OnEnter(Object source, EventArgs eventArgs) +88
System.Web.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +80
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +266

FormsAuthentication is trying to use cookieless mode because it recognises that Googlebot doesn't support cookies, but something in your FormsAuthentication_OnAuthenticate method is then throwing an exception because it doesn't want to accept cookieless authentication.
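
For illustration only, a handler that refuses cookieless mode could look roughly like the sketch below (this is a guess at the shape of the code; the actual Global.asax.cs is not shown in the question):

protected void FormsAuthentication_OnAuthenticate(object sender, FormsAuthenticationEventArgs args)
{
    // Illustrative assumption: rejecting clients that cannot carry the auth cookie
    // would produce exactly the HttpException seen in the stack trace above.
    if (!Request.Browser.Cookies)
    {
        throw new HttpException("Cookieless Forms Authentication is not supported for this application.");
    }

    // ... normal forms-authentication logic would go here ...
}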

I think that the easiest way around that is to change the following in web.config, which stops FormsAuthentication from ever trying to use cookieless mode...

<authentication mode="Forms"> 
    <forms cookieless="UseCookies" ...>
    ...
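
With cookieless="UseCookies", forms authentication always relies on a cookie-based ticket instead of falling back to cookieless (URI-based) tickets, so the detection path that was throwing the exception for Googlebot is never taken. Googlebot is then treated as an anonymous, unauthenticated client, which is fine for public URLs like robots.txt and the sitemap.
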
倾城花音 2024-09-22 02:47:51

There is something wrong with the script that generates the robots.txt file. When Googlebot accesses the file, it gets a 500 Internal Server Error. Here are the results of the header check:

REQUESTING: http://bizup.cloudapp.net/robots.txt
GET /robots.txt HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: bizup.cloudapp.net
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 500 INTERNAL SERVER ERROR
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 19 Aug 2010 16:52:09 GMT
Content-Length: 4228
Final Destination Page

You can test the headers here http://www.seoconsultants.com/tools/headers/#Report
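
If you want to reproduce the same check yourself, a small sketch along these lines should work (assuming .NET; the URL and user agent are the ones quoted above):

using System;
using System.Net;

class CheckRobotsHeaders
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://bizup.cloudapp.net/robots.txt");
        // Use the same user agent as the report above, since ordinary browsers may not trigger the error.
        request.UserAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("{0} {1}", (int)response.StatusCode, response.StatusDescription);
                Console.WriteLine("Content-Type: {0}", response.ContentType);
            }
        }
        catch (WebException ex)
        {
            // A 500 response surfaces here as a WebException with the response attached.
            var response = ex.Response as HttpWebResponse;
            Console.WriteLine(response != null
                ? string.Format("{0} {1}", (int)response.StatusCode, response.StatusDescription)
                : ex.Message);
        }
    }
}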

浮华 2024-09-22 02:47:51

I fixed this problem in a simple way: I just added a robots.txt file (in the same directory as my index.html file) that allows all access. I had left it out, intending to allow all access that way -- but maybe Google Webmaster Tools then located another robots.txt controlled by my ISP?

So it seems that, for some ISPs at least, you should have a robots.txt file even if you don't want to exclude any bots, just to prevent this possible glitch.
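
For reference, the conventional "allow everything" robots.txt is just the following (an empty Disallow means nothing is blocked):

User-agent: *
Disallow: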

嘿嘿嘿 2024-09-22 02:47:51

I have no problem getting your robots.txt:

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/robots.txt

However, isn't it performing a recursive robots.txt call?

A Sitemap is supposed to be an XML file; see Wikipedia.
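
For reference, a minimal sitemap.xml skeleton looks like this (only the homepage entry is shown as an example; a real file would list the site's actual URLs):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://bizup.cloudapp.net/</loc>
  </url>
</urlset>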
