Web page monitoring question
There are a number of different websites that let you monitor specific web pages for any changes, such as watchthatpage.com or page2rss.com.
I'm interested in how those sites work, meaning how they determine whether a web page has been updated. Do they just copy all the text from the page, store it in memory, and compare it later to the current content of the page?
Or do they look for specific HTML elements and compare their values?
Please help me find the answer.
Comments (3)
How it works: http://www.watchthatpage.com/information.jsp
I suspect that they store the entire contents and compare them every time they check. If the contents differ, they send an alert; otherwise they don't.
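To make that guess concrete, here is a minimal Python sketch of the store-and-compare idea. It is only an illustration, not how watchthatpage.com or page2rss.com actually work: the `PageWatcher` class, the in-memory `snapshots` dict, and the `fetch` helper are all hypothetical names made up for this example.

```python
import urllib.request
from typing import Dict


def fetch(url: str) -> bytes:
    """Download the raw body of a page (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


class PageWatcher:
    """Keeps the last seen body of each URL and reports when it differs."""

    def __init__(self) -> None:
        # Assumption: snapshots live in memory; a real service would persist them.
        self.snapshots: Dict[str, bytes] = {}

    def check(self, url: str, body: bytes) -> bool:
        """Return True if `body` differs from the stored snapshot for `url`.

        The first time a URL is seen there is nothing to compare against,
        so the body is stored and no change is reported.
        """
        previous = self.snapshots.get(url)
        self.snapshots[url] = body
        return previous is not None and previous != body
```

A periodic job could call `watcher.check(url, fetch(url))` and send an alert whenever it returns `True`. Comparing the whole body is crude: pages that embed timestamps, ads, or session tokens would probably trigger alerts on every check, so a real service likely normalizes the content first.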
There are two ways this can be done, just off the top of my head.
The first is to pull the HTML and do a simple string comparison.
The second would be to do a HEAD request. See section 9.4 here.
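The HEAD approach could look roughly like the Python sketch below. It assumes the server sends an `ETag` or `Last-Modified` response header, which many dynamically generated pages do not. The names `head_validators` and `probably_changed` are hypothetical, chosen only for this example.

```python
import urllib.request
from typing import Dict, Optional

Validators = Dict[str, Optional[str]]


def head_validators(url: str) -> Validators:
    """Send a HEAD request and return the cache validators, if any (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return {
            "etag": response.headers.get("ETag"),
            "last_modified": response.headers.get("Last-Modified"),
        }


def probably_changed(old: Validators, new: Validators) -> Optional[bool]:
    """Compare two sets of validators.

    Returns True or False when both checks share a usable validator,
    and None when the headers are missing and nothing can be concluded.
    """
    for key in ("etag", "last_modified"):
        if old.get(key) is not None and new.get(key) is not None:
            return old[key] != new[key]
    return None
```

When `probably_changed` returns `None`, the checker would have to fall back to the first way: download the page and compare the text. The advantage of HEAD is that no body is transferred, so frequent checks are cheap when the headers are available.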