Cache Expiration Control with Last-Modified
In Apache's mod_expires module, the Expires* directives accept two base times: access and modification.
ExpiresByType text/html "access plus 30 days"
understandably means that the cache will request fresh content after 30 days.
However,
ExpiresByType text/html "modification plus 2 hours"
doesn't make intuitive sense.
How does the browser cache know that the file has been modified unless it makes a request to the server? And if it is making a call to the server, what is the use of caching this directive? It seems to me that I am not understanding some crucial part of caching. Please enlighten me.
Comments (4)
An Expires* directive with "modification" as its base refers to the modification time of the file on the server. So if you set, say, "modification plus 2 hours", any browser that requests content within 2 hours after the file is modified (on the server) will cache that content until 2 hours after the file's modification time. The browser knows when that time is because the server sends an Expires header with the proper expiration time.

Let me explain with an example: say your Apache configuration includes the line ExpiresDefault "modification plus 2 hours" and you have a file index.html on the server, which the ExpiresDefault directive applies to. Suppose you upload a version of index.html at 9:53 GMT, overwriting the previously existing index.html (if there was one). So now the modification time of index.html is 9:53 GMT. If you ran ls -l on the server (or dir on Windows), you would see that time in the listing.

Now, with every request, Apache sends the Last-Modified header with the last modification time of the file. Since you have that ExpiresDefault directive, it will also send the Expires header with a time equal to the modification time of the file (9:53) plus two hours. So the browser sees, among other things, a Last-Modified header of 9:53 GMT and an Expires header of 11:53 GMT.

If the time at which the browser makes this request is before 11:53 GMT, the browser will cache the page, because it has not yet expired. So if the user first visits the page at 11:00 GMT, and then goes to the same page again at 11:30 GMT, the browser will see that its cached version is still valid and will not (or rather, is allowed not to) make a new HTTP request.
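The header arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the calculation, not Apache's actual code; the date (18 Feb 2009) and the helper name `expires_header` are assumptions made up for the sketch, and only the 9:53 and 11:00 GMT times come from the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(base_time, delta):
    """Format base_time + delta as an HTTP date (base_time must be UTC-aware)."""
    return format_datetime(base_time + delta, usegmt=True)


# Hypothetical date; the file was uploaded at 9:53 GMT.
modified = datetime(2009, 2, 18, 9, 53, tzinfo=timezone.utc)
# A request arriving at 11:00 GMT.
accessed = datetime(2009, 2, 18, 11, 0, tzinfo=timezone.utc)

last_modified = format_datetime(modified, usegmt=True)
# "modification plus 2 hours": the same absolute expiry for every client.
expires_mod = expires_header(modified, timedelta(hours=2))
# "access plus 2 hours": the expiry moves with the time of each request.
expires_acc = expires_header(accessed, timedelta(hours=2))

print("Last-Modified:", last_modified)
print("Expires (modification base):", expires_mod)
print("Expires (access base):", expires_acc)
```

With the "modification" base, the browser never has to guess when the file changed: the server does the addition and ships the result in the Expires header.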
If the user goes to the page a third time at 12:00 GMT, the browser sees that its cached version has now expired (it's after 11:53), so it attempts to validate the page by sending a request to the server with an If-Modified-Since header. A 304 (Not Modified) response with no body will be returned, since the file's modification time has not changed since it was first served. Because the expiry date has passed (the page is 'stale'), a validation request will be made on every subsequent visit to the page until validation fails.
Now, let's pretend instead that you uploaded a new version of the page at 11:57. In this case, the browser's attempt to validate the old version of the page at 12:00 fails, and it receives in the response, along with the new page, two new header values: a Last-Modified of 11:57 GMT and an Expires of 13:57 GMT.
(The last modification time of the file becomes 11:57 upon upload of the new version, and Apache calculates the expiration time as 11:57 + 2:00 = 13:57 GMT.)
Validation (using the more recent date) will not be required now until 13:57.
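The whole timeline can be replayed with a toy model, sketched below in Python. It is a simplification under stated assumptions: `serve` and `visit` are hypothetical stand-ins for Apache and the browser, the date is made up, and real caches consider more headers than the two modeled here.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=2)  # stands in for "modification plus 2 hours"


def at(hour, minute):
    # Hypothetical date; only the times of day come from the example.
    return datetime(2009, 2, 18, hour, minute, tzinfo=timezone.utc)


def serve(file_mtime, if_modified_since=None):
    """Toy server: return (status, headers) for one request."""
    headers = {"Last-Modified": file_mtime, "Expires": file_mtime + TTL}
    if if_modified_since is not None and file_mtime <= if_modified_since:
        return 304, headers  # not modified: no body is sent
    return 200, headers      # full response with the page body


def visit(cache, now, file_mtime):
    """Toy browser: describe what happens on a visit at time `now`."""
    if cache and now < cache["Expires"]:
        return "cache hit, no request"
    ims = cache["Last-Modified"] if cache else None
    status, headers = serve(file_mtime, ims)
    cache.clear()
    cache.update(headers)
    return f"request sent, {status}"


# Scenario A: the file stays as uploaded at 9:53.
cache_a = {}
scenario_a = [visit(cache_a, at(h, m), at(9, 53))
              for h, m in [(11, 0), (11, 30), (12, 0), (12, 5)]]

# Scenario B: a new version is uploaded at 11:57.
cache_b = {}
scenario_b = [
    visit(cache_b, at(11, 0), at(9, 53)),
    visit(cache_b, at(12, 0), at(11, 57)),
    visit(cache_b, at(13, 0), at(11, 57)),
]

print(scenario_a)
print(scenario_b)
```

In scenario A the stale page keeps triggering cheap 304 validations; in scenario B the 12:00 validation fails, a full 200 response arrives, and the 13:00 visit is served from cache because the new expiry is 13:57.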
(Note, of course, that many other headers are sent along with the two discussed above; I trimmed out the rest for simplicity.)
The server sends a header such as "Last-Modified: Wed, 18 Feb 2009 00:00:00 GMT". The cache behaves based on either this header or the access time. Say the content is expected to be refreshed every day; then you want it to expire at "modification plus 24 hours".
If you don't know when the content will be refreshed, then it's better to base it on the access time.
My understanding is that "modification" bases the cache time on the value of the Last-Modified HTTP header. So "modification plus 2 hours" would be the Last-Modified time + 2 hours.
First of all, thanks to David Z for the detailed explanation above. In answer to bushman's question about why caching makes sense if the browser still has to make a request to the server: the saving is in what the server returns. If validation shows that the file has not changed, then instead of returning the content, the server returns a 304 code with an empty response body. That is where the time is saved.
A better explanation than I've given is here, from https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers: