Alert when a webpage is updated
I am creating an app in Java that checks whether a webpage has been updated.
However, some webpages don't have a "Last-Modified" header.
I even tried checking for a change in content length, but this method is not reliable: sometimes the content length changes without any modification to the webpage, giving a false alarm.
I really need some help here, as I am not able to think of a single foolproof method.
Any ideas?
Comments (3)
Staying connected to the webpage the whole time, as the code in this question does, may help:
How to get notified after a webpage changes instead of refreshing?
Probably the most reliable option would be to store a hash of the page content.
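A minimal sketch of that idea, assuming Java 11+ (for `java.net.http.HttpClient`) and SHA-256 as the hash. The class name `PageHashCheck` and its method names are made up for illustration, and where the previous hash is stored (file, database) is left to you:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of any library.
class PageHashCheck {

    // Returns the SHA-256 digest of the content as a lowercase hex string.
    static String sha256Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Downloads the page body as text (blocking call).
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // True when the current content no longer matches the stored hash.
    static boolean hasChanged(String previousHash, String currentContent) {
        return !sha256Hex(currentContent).equals(previousHash);
    }
}
```

You would call `fetch(url)` on a schedule, compare with `hasChanged(storedHash, body)`, and save the new hash whenever it differs. Note that a hash of the whole page still fires on any byte-level difference, so a page with embedded timestamps or rotating ads will keep producing false alarms unless you strip those parts first.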
If the content length changes, then the webpages you are trying to check are probably dynamically generated rather than static. If that is the case, then even the 'Last-Modified' header won't reflect changes in content in most cases anyway.
I guess the only solution is a page-specific one that works only for a given page. One page you could parse and look for content changes in certain parts of it, another you could check by its Last-Modified header, and some others you would have to check using the content length. In my opinion there is no way to do this uniformly for all pages on the internet. Another option would be to ask the people who develop the pages you are checking for some markers that would help you determine whether a page has changed, but that of course depends on your specific use case and what you are doing with it.
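For the "parse one page and watch only some parts of it" case, here is a rough sketch. It assumes the interesting part sits between two literal strings you have picked by hand for that page; the class `RegionWatcher`, its method names and the markers are all hypothetical, and a real HTML parser would be more robust than plain string search:

```java
import java.util.Optional;

// Hypothetical helper class: compares only a hand-picked region of a page.
class RegionWatcher {

    // Returns the trimmed text between the first startMarker and the next
    // endMarker after it, or empty if either marker is missing.
    static Optional<String> extractRegion(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return Optional.empty();
        }
        int from = start + startMarker.length();
        int end = html.indexOf(endMarker, from);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(from, end).trim());
    }

    // True when the watched region differs between the two snapshots.
    // A marker that disappears also counts as a change.
    static boolean regionChanged(String oldHtml, String newHtml,
                                 String startMarker, String endMarker) {
        return !extractRegion(oldHtml, startMarker, endMarker)
                .equals(extractRegion(newHtml, startMarker, endMarker));
    }
}
```

With markers such as `<div id="main">` and `</div>`, a changing clock elsewhere on the page no longer raises an alarm, while an edit inside that block does. The markers have to be chosen per page, which is exactly why this does not generalize to every site.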