What is the purpose of (Apache) putting the inode into an ETag?
There are plenty of articles on the web detailing why you might not want to use Apache's default inode-mtime-size format for ETags.
But I have yet to read anything on what might have motivated Apache to include the inode in the first place. On the face of it, it only seems useful if one needs to differentiate between octet-for-octet facsimiles of the same resource, but this is surely counter to the very purpose of ETags.
Apache's authors are not known for sloppy handling of internet standards, so I feel I must be missing something. Can anyone elaborate?
EDIT: I ask this here rather than on ServerFault.com because I'm implementing a web server rather than administering one. To read more about why it's a bad idea, see e.g. here or here (http://david.weekly.org/writings/etags.php3). All such articles recommend the same thing: remove inodes from your ETags. The question is, is there any advantage whatsoever to them being there?
It seems like the kind of thing one could easily end up doing by guessing wrong about what the common case is, or by defaulting to correctness over performance whenever there's a shred of doubt.
Allow me to make up a story about how it might have gone:
They decide early that a hash/checksum on the contents is a bad idea for performance reasons. "Who knows how big the file might be? We can't recalculate those all the time..." So they decide size and date get you pretty close.
"But wait," person A says, "nothing guarantees you don't have a file size collision. In fact, there are cases, such as firmware binaries, when the file size is always the same, and it's entirely possible that several are uploaded from a dev machine at the same time, so these aren't enough to distinguish between different contents."
Person B: "Hmm, good point. We need something that's intrinsically tied to the contents of the file. Something that, coupled with the modified time, can tell you for certain whether it's the same contents."
Person A: "What about the inode? Now, even if they rename the files (maybe they change 'recommended' to a different file, for example), the default etag will work fine!"
Person B: "I dunno, inode seems a bit dangerous."
Person A: "Well, what would be better?"
Person B: "Yeah, good question. I guess I can't think what specifically is wrong with it, I just have a general bad feeling about it."
Person A: "But at least it guarantees you'll download a new one if it's changed. The worst that happens is you download more often than you need to, and anybody who knows they don't have to worry about it can just turn it off."
Person B: "Yeah, that makes sense. It's probably fine for most cases, and it seems better than the easy alternatives."
Disclaimer: I don't have any inside knowledge about what the Apache implementers could have been thinking. This is all just hand-wavy guessing, and trying to make up a plausible story. But I've certainly seen this kind of thing happen often enough.
You never know what it is that you didn't think of (in this case, that redundant load-balanced servers serving the same files would turn out to be more typical than having to worry about size+time collisions). The load balancer isn't part of Apache, which makes such an oversight easier to make.
Plus, the failure mode here is that you don't make perfectly efficient use of the cache (NOT that you get wrong data), which is arguably the better way to fail, though annoying. That suggests that even if they did think of it, they could reasonably assume that anybody with enough interest to set up a load balancer would also be OK with tuning their configuration details.
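To make the load-balancer case concrete, here is a minimal Python sketch. It is not Apache's actual code: the helper `metadata_etag`, the hex formatting, the microsecond granularity, and the fixed timestamp are all assumptions of mine for illustration. It creates two byte-identical files with the same mtime (standing in for the same file synced to two servers) and compares the tags with and without the inode.

```python
import os
import tempfile


def metadata_etag(path, use_inode=True):
    """Hypothetical helper: build an inode-mtime-size style tag from stat data."""
    st = os.stat(path)
    parts = []
    if use_inode:
        parts.append(format(st.st_ino, "x"))
    # Microsecond granularity and hex formatting are assumptions for this sketch.
    parts.append(format(st.st_mtime_ns // 1000, "x"))
    parts.append(format(st.st_size, "x"))
    return '"' + "-".join(parts) + '"'


def demo():
    """Return (tags_match_with_inode, tags_match_without_inode)."""
    fixed_ns = 1_700_000_000 * 10**9  # arbitrary timestamp shared by both copies
    with tempfile.TemporaryDirectory() as root:
        replica_a = os.path.join(root, "server_a.bin")
        replica_b = os.path.join(root, "server_b.bin")
        for path in (replica_a, replica_b):
            with open(path, "wb") as f:
                f.write(b"firmware image v1")
            # Simulate a sync tool that preserves modification times.
            os.utime(path, ns=(fixed_ns, fixed_ns))
        with_inode = metadata_etag(replica_a) == metadata_etag(replica_b)
        without_inode = (
            metadata_etag(replica_a, use_inode=False)
            == metadata_etag(replica_b, use_inode=False)
        )
        return with_inode, without_inode


if __name__ == "__main__":
    print(demo())
```

The two copies are distinct files, so their inode numbers differ and the inode-based tags disagree even though the bytes are identical; a client bounced between the two servers would re-download each time. Dropping the inode makes the tags agree. In Apache itself this is what the `FileETag` directive controls (for example `FileETag MTime Size`).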
PS: It's not about standards. Nothing specifies how you should calculate the ETag, just that it should be enough to tell, with high probability, whether the contents have changed.
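Since the calculation is left up to the server, a content hash is one alternative if you can afford to compute it. The following is only a sketch under my own assumptions: `content_etag` and `respond` are hypothetical names, the truncated SHA-256 is an arbitrary choice, and the `If-None-Match` handling is deliberately simplified (a single tag, no `*`, no weak validators).

```python
import hashlib
from typing import Optional, Tuple


def content_etag(body: bytes) -> str:
    """Hypothetical helper: derive a quoted tag from the bytes themselves."""
    # Truncating the digest is an arbitrary choice made for this sketch.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, payload) for a simplified conditional GET."""
    tag = content_etag(body)
    # Simplification: compares against exactly one tag from the client.
    if if_none_match is not None and if_none_match.strip() == tag:
        return 304, b""  # client's cached copy is still current
    return 200, body


if __name__ == "__main__":
    first = respond(b"hello", None)
    print(first[0])
    print(respond(b"hello", content_etag(b"hello"))[0])
```

A tag built this way is the same on every server that holds the same bytes, at the price of reading the whole file (or caching the digest), which is exactly the cost the made-up story above imagines the implementers wanting to avoid.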