How can I tell when an HTTP web page of type text/html has changed?
I'm trying to work out an algorithm to tell whether non-binary files on the web have changed. I was going to use:
- the Last-Modified datetime from the response header, and, if that isn't present, fall back to
- the Content-Length from the header.
However, I'm finding that for a lot of websites the Last-Modified value for HTML pages is simply the current date and time, so the approach doesn't work: it would indicate that the page is always changing. Or so I think...?
What would be a good algorithm, then? How about:
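One way to catch the "Last-Modified is just the current time" case described above is to compare it against the response's own Date header. The sketch below is a heuristic of my own, not an HTTP rule: it assumes `headers` is a mapping of response headers (for example `resp.headers` from `urllib`), and the function name `last_modified_is_suspect` and the 2-second tolerance are hypothetical choices.

```python
from email.utils import parsedate_to_datetime


def last_modified_is_suspect(headers, tolerance_seconds=2):
    """Heuristic (assumption, not part of HTTP): treat Last-Modified as unreliable
    when it is missing, unparseable, or (nearly) equal to the Date header,
    which suggests the server just stamped the current time."""
    last_modified = headers.get("Last-Modified")
    if not last_modified:
        return True
    date = headers.get("Date")
    if not date:
        return False  # nothing to compare against; accept the value as-is
    try:
        lm = parsedate_to_datetime(last_modified)
        now = parsedate_to_datetime(date)
        return abs((now - lm).total_seconds()) <= tolerance_seconds
    except (TypeError, ValueError):
        # Unparseable dates (or mixed naive/aware datetimes) are not trusted.
        return True
```

A genuinely fresh page can also trip this check, so it only tells you when Last-Modified is useless as a change signal, not that the page is unchanged.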
IF response.ContentType.StartsWith("text/html")   <== or should this just be "text"?
THEN
    Compare the text content before & after
ELSE
    IF LastModified dates are OK
        Compare based on LastModified dates
    ELSE
        Compare based on ContentLength
thanks
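The decision order in the pseudocode above can be sketched in Python as a "fingerprint" that is stored after each fetch and compared with the next one. This is a sketch under assumptions: `fingerprint` and `has_changed` are hypothetical names, `headers` is any mapping of response headers, `body` is the raw response bytes, and it checks the broader `text/` prefix (covering `text/html`, `text/plain`, etc.) and hashes the content with SHA-256 rather than storing it.

```python
import hashlib


def fingerprint(content_type, headers, body):
    """Return a (kind, value) tuple that should change when the resource changes,
    or None if nothing usable is available.

    content_type: Content-Type value, e.g. "text/html; charset=utf-8"
    headers: mapping of response header names to values
    body: response body as bytes, or None (e.g. after a HEAD request)
    """
    # Any text/* type: hash the actual content instead of trusting headers.
    if (content_type or "").lower().startswith("text/") and body is not None:
        return ("sha256", hashlib.sha256(body).hexdigest())
    # Otherwise prefer Last-Modified, then fall back to Content-Length.
    last_modified = headers.get("Last-Modified")
    if last_modified:
        return ("last-modified", last_modified)
    content_length = headers.get("Content-Length")
    if content_length:
        return ("content-length", content_length)
    return None


def has_changed(old, new):
    """Compare two fingerprints; without one, conservatively report a change."""
    if old is None or new is None:
        return True
    return old != new
```

Two caveats with this approach: hashing raw HTML also flags changes in timestamps, ads, or session tokens embedded in the page, and Content-Length cannot detect an edit that keeps the same byte count.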
When sending the request, specify the If-Modified-Since HTTP header. It is then up to the server to reply either with the new HTML or with 304 (Not Modified).
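A minimal sketch of such a conditional GET with Python's standard `urllib`, assuming you saved the Last-Modified value from the previous response and echo it back verbatim. The names `build_conditional_request` and `fetch_if_modified` are hypothetical. Note that `urllib` reports a 304 by raising `HTTPError` rather than returning a normal response.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None):
    """Build a GET request; last_modified is the Last-Modified value
    saved from an earlier response (sent back unchanged)."""
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_if_modified(url, last_modified=None, timeout=10):
    """Return (changed, body, last_modified_to_store)."""
    req = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # 304 Not Modified: keep the previously stored validator.
        if err.code == 304:
            return False, None, last_modified
        raise
```

This only helps if the server honours the header; a server that stamps Last-Modified with the current time, as described in the question, will typically keep answering 200.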
The ETag response header, if present, is a good indicator of this. Send requests with If-None-Match (or just HEAD requests) to check.
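A sketch of both options using the standard library, assuming the ETag from the previous response has been stored. The function names are hypothetical. `etag_unchanged` uses weak comparison (ignoring a `W/` prefix), which is how RFC 7232 says servers evaluate If-None-Match; a missing ETag is treated as "cannot tell".

```python
import urllib.error
import urllib.request


def _opaque(tag):
    # Drop the weak-validator prefix so W/"x" and "x" compare equal.
    return tag[2:] if tag.startswith("W/") else tag


def etag_unchanged(old_etag, new_etag):
    """Weak ETag comparison; without both ETags we cannot conclude 'unchanged'."""
    if not old_etag or not new_etag:
        return False
    return _opaque(old_etag.strip()) == _opaque(new_etag.strip())


def fetch_if_none_match(url, etag=None, timeout=10):
    """Conditional GET; returns (changed, body, etag_to_store)."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified
            return False, None, etag
        raise


def head_etag(url, timeout=10):
    """HEAD request: fetch only the headers and return the current ETag (or None)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("ETag")
```

For example, `if not etag_unchanged(saved_etag, head_etag(url)): ...` re-downloads only when the ETag differs. Whether a HEAD response carries the same ETag as a GET depends on the server.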