Will search engines dislike me if I respond to requests for robots.txt with HTTP code 418 (a.k.a. "I'm a teapot")?
I have a very simple web app that runs within HTML5's Canvas and doesn't have any public files that need to be indexed by search engines (beyond the front-page HTML file that includes calls to all the necessary resources). As such, I don't really need a robots.txt file, since crawlers will just see the public files and that's it.

Now, as a joke, I'd like to return an HTTP 418 (a.k.a. "I'm a teapot") response every time a web crawler asks for robots.txt. However, if this will end up hurting my position in search results, then it's not a joke that's worthwhile for me.

Does anybody know how different web crawlers respond to unusual (though in this case technically standard) HTTP codes?

Also, on a more serious note, is there any reason to have a robots.txt file that says "everything is indexable!" instead of just not having the file at all?
1 Answer
Having a blank robots.txt file will also tell crawlers that you want all of your content indexed. There is an allow directive for robots.txt, but it is non-standard and should not be relied upon. Serving the blank file is a good idea because it keeps 404 errors from piling up in your access logs whenever a search engine requests a non-existent robots.txt from your site.

Sending out non-standard HTTP codes is not a good idea, as you have absolutely no idea how search engines will respond to them. If they don't accept the code, they may fall back to treating it as a 404, which is obviously not what you want to happen. Basically, this is a bad place to make a joke.
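For reference, the conventional way to say "everything is indexable" without relying on the non-standard allow directive is an empty Disallow rule, which has been part of the robots.txt convention from the start:

```
User-agent: *
Disallow:
```

An empty Disallow value excludes nothing, so compliant crawlers index everything; a zero-byte robots.txt served with a 200 has the same effect while still keeping the 404s out of your logs.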