Will responding to robots.txt requests with HTTP code 418 (a.k.a. "I'm a teapot") make search engines dislike me?

Posted 2024-12-16 11:42:29

I have a very simple webapp that runs within HTML5's Canvas and doesn't have any public files that need to be indexed by search engines (beyond the front-page HTML file, which includes calls to all the necessary resources). As such, I don't really need a robots.txt file, since crawlers will just see the public files and that's it.

Now, as a joke, I'd like to return an HTTP 418 a.k.a. "I'm a teapot" response every time a web crawler asks for robots.txt. However, if this will end up hurting my ranking in search results, then the joke isn't worthwhile for me.
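For concreteness, the joke can be sketched with Python's standard-library `http.server` (the handler name and response bodies here are my own; a real deployment would more likely configure this in nginx or similar):

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

class TeapotHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers /robots.txt with 418, everything else with 200."""

    def do_GET(self):
        if self.path == "/robots.txt":
            # HTTP 418 "I'm a teapot" comes from RFC 2324 (an April Fools' RFC),
            # so it is "standard" in a technical sense only.
            self.send_response(HTTPStatus.IM_A_TEAPOT)
            self.end_headers()
            self.wfile.write(b"I'm a little teapot, short and stout.\n")
        else:
            self.send_response(HTTPStatus.OK)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<!doctype html><title>app</title>")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), TeapotHandler).serve_forever()
```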

Does anybody know how different web crawlers respond to unusual (though, in this case, technically standard) HTTP codes?

Also, on a more serious note, is there any reason to have a robots.txt file that says "everything is indexable!" instead of just not having a file?


Comments (1)

野稚 2024-12-23 11:42:29

  1. Having a blank robots.txt file also tells crawlers that you want all of your content indexed. robots.txt has an `Allow` directive, but it is non-standard and should not be relied upon. Having the file is also good because it keeps 404 errors from piling up in your access logs whenever a search engine requests a non-existent robots.txt from your site.
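As a sketch, an "allow everything" robots.txt that sticks to the standard directives looks like this (an empty `Disallow` value means nothing is blocked, which avoids the non-standard `Allow` directive entirely):

```text
# robots.txt — permit all crawlers to index everything
User-agent: *
Disallow:
```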

  2. Sending out unusual HTTP codes is not a good idea, because you have absolutely no idea how search engines will respond. If a crawler doesn't recognize the code, it may fall back to treating the response as a 404, and that's obviously not what you want to happen. Basically, this is a bad place to make a joke.
