How can I monitor a page for changes without polling?
I currently have an IRC bot written in C++ that monitors a page written in PHP for changes and then outputs those changes to an IRC channel.
However, the current method is rather ineffective: it simply polls the page every 10 seconds and compares the result with the last version it saw to check whether anything has changed.
I can decrease the check interval to about 2-3 seconds before the IRC bot starts to take a performance hit, but this isn't ideal.
The page I am monitoring can often change several times within a 10-second period, so a change could be missed. What would be a better way to get the data from the page? I control both the PHP page and the IRC bot, but they are on different servers.
The sole purpose of this page is to pass data to the IRC bot, so it could be completely reimplemented as something else if that would be a better solution. The IRC bot also monitors multiple versions of this page to check for different things.
Comments (5)
Unfortunately, if the data generated by PHP isn't somehow pushed onto a stream (a broadcast or a feed), you don't have any choice other than polling the page.
What you could do is push the data from PHP using a broadcast, or open a persistent connection from the bot to the PHP script, or have the PHP script calculate the differences itself.
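As a rough illustration of the last option, the producer could keep a numbered log of changes, and the bot could ask only for the entries newer than the last one it saw, so several changes between two checks are not lost. The sketch below models that bookkeeping in C++ to match the bot; the names `Change`, `ChangeLog`, `append` and `since` are made up for this example, and a PHP script would presumably keep the same structure in a file or database.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical change record (illustrative names, not from the original post).
struct Change {
    std::uint64_t seq;   // monotonically increasing id assigned by the producer
    std::string text;    // description of what changed
};

class ChangeLog {
public:
    // Record a change and return the sequence number it was given.
    std::uint64_t append(const std::string& text) {
        entries_.push_back({next_seq_, text});
        return next_seq_++;
    }

    // Return every change with seq > last_seen, oldest first.
    std::vector<Change> since(std::uint64_t last_seen) const {
        std::vector<Change> out;
        for (const Change& c : entries_) {
            if (c.seq > last_seen) out.push_back(c);
        }
        return out;
    }

private:
    std::uint64_t next_seq_ = 1;   // 0 is reserved for "nothing seen yet"
    std::vector<Change> entries_;
};
```

The bot would remember the highest `seq` it has processed and pass it as `last_seen` on the next request, starting from 0.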
The PHP script should send a message to a public port or path that your IRC bot listens on, containing information about any posts made. This way, you are notified only when a message arrives.
One note about this kind of setup: beware of bursts of many posts within a short period. If concurrency is important, you'll want to implement this using a proper MQ service such as 0MQ/RabbitMQ/InsertMQFrameworkNameHere, so that messages arrive in order and both sending and receiving are guaranteed.
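To make the ordering concern concrete, here is a small sketch of the bookkeeping a receiver would need if messages could arrive out of order or more than once. It assumes, hypothetically, that the PHP side stamps each message with a sequence number starting at 1; `InOrderReceiver` and its members are illustrative names, and this is not a substitute for a real message queue.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

class InOrderReceiver {
public:
    // Accept one message and return the messages that can now be
    // delivered in sequence order (possibly none).
    std::vector<std::string> receive(std::uint64_t seq, const std::string& body) {
        std::vector<std::string> ready;
        if (seq < next_expected_) return ready;   // already delivered: drop duplicate
        pending_.emplace(seq, body);              // no effect if seq is already buffered

        // Release the longest run of consecutive messages starting at next_expected_.
        auto it = pending_.begin();
        while (it != pending_.end() && it->first == next_expected_) {
            ready.push_back(it->second);
            it = pending_.erase(it);
            ++next_expected_;
        }
        return ready;
    }

    // True while at least one message is buffered behind a missing one.
    bool has_gap() const { return !pending_.empty(); }

private:
    std::uint64_t next_expected_ = 1;
    std::map<std::uint64_t, std::string> pending_;
};
```

If `has_gap()` stays true for long, a message was probably lost and the bot would have to ask the sender to resend it, which is the kind of work an MQ service takes off your hands.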
If you need to monitor every change, have your PHP page "push" data to your bot rather than having your IRC bot "pull" data from the page (through polling). This can be done over any network socket, even with something like an HTTP POST request from your PHP page to your bot over port 80.
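On the bot side, receiving such a push mostly comes down to reading the request from a socket and pulling out the body. The following is a minimal sketch of the second half only, assuming the raw request text has already been read into a string; `extract_post_body` is a hypothetical helper, it handles only a plain `Content-Length` body (no chunked encoding), and it requires C++17 for `std::optional`.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Returns the body of a complete HTTP POST request, or std::nullopt if the
// request is not a POST, is malformed, or the body has not fully arrived.
std::optional<std::string> extract_post_body(const std::string& request) {
    if (request.compare(0, 5, "POST ") != 0) return std::nullopt;

    const std::size_t header_end = request.find("\r\n\r\n");
    if (header_end == std::string::npos) return std::nullopt;

    // Lower-case the header block so the Content-Length lookup is case-insensitive.
    std::string headers = request.substr(0, header_end);
    for (char& ch : headers) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }

    const std::string key = "\r\ncontent-length:";
    const std::size_t pos = headers.find(key);
    if (pos == std::string::npos) return std::nullopt;

    // Parse the decimal length that follows the header name.
    std::size_t i = pos + key.size();
    while (i < headers.size() && headers[i] == ' ') ++i;
    std::size_t length = 0;
    bool any_digit = false;
    while (i < headers.size() && std::isdigit(static_cast<unsigned char>(headers[i]))) {
        length = length * 10 + static_cast<std::size_t>(headers[i] - '0');
        any_digit = true;
        ++i;
    }
    if (!any_digit) return std::nullopt;

    const std::size_t body_start = header_end + 4;
    if (request.size() - body_start < length) return std::nullopt;  // body incomplete
    return request.substr(body_start, length);
}
```

A `std::nullopt` result for an incomplete body lets the caller keep reading from the socket and try again once more bytes have arrived.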
A good alternative to polling is Comet. Here are examples (for JavaScript though): http://www.zeitoun.net/articles/comet_and_php/start.
I would suggest this approach:
when you request the page, specify a very long timeout, say 10 minutes (bear with me for a moment);
if the server has a new page, let it return that page; otherwise, it simply does not send a reply yet;
if there is no new page, the client waits for up to 10 minutes before giving up (timing out); but if a new page appears during that time, the server can reply to the pending request and pass the page to the client;
if the timeout fires, you simply send another request with the same long timeout.
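The client side of the steps above can be sketched as a simple loop. This is only an outline under assumptions: `Fetch` stands in for whatever HTTP call the bot already makes, configured with the long timeout, and is taken to return `std::nullopt` when that timeout fires; `long_poll` and `max_requests` are illustrative names, and the request cap exists only so the loop can terminate in a demo.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical transport hook: performs one blocking request with a long
// timeout and returns the page, or std::nullopt if the request timed out.
using Fetch = std::function<std::optional<std::string>()>;

// Issues up to max_requests long-poll requests and hands every page that
// arrives to on_page. Returns how many pages were delivered.
// A real bot would loop indefinitely instead of stopping at max_requests.
int long_poll(const Fetch& fetch,
              const std::function<void(const std::string&)>& on_page,
              int max_requests) {
    int delivered = 0;
    for (int i = 0; i < max_requests; ++i) {
        std::optional<std::string> page = fetch();
        if (!page) continue;   // timed out: simply send the next request
        on_page(*page);        // new page: forward it, then ask again
        ++delivered;
    }
    return delivered;
}
```

Because the next request is sent immediately after each reply, the bot is idle while nothing changes and is notified almost as soon as something does.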
I hope that explains it clearly. The only tricky point is how your web page (PHP) can hold a request open when it arrives and there is no new data to send back.
This can be easily accomplished like this:
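One way to picture the "hold the request" step, offered as a sketch and not as the answerer's own snippet, is a handler that blocks until either a newer page is published or the timeout runs out. The model below is written in C++ with a condition variable purely for illustration; `PageHolder`, `Page`, `publish` and `wait_for_newer` are hypothetical names, and a PHP script would more likely reach the same effect by checking for new data in a loop with short sleeps.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical page snapshot; version 0 means "nothing published yet".
struct Page {
    unsigned long version = 0;
    std::string body;
};

class PageHolder {
public:
    // Called by the producer whenever the page content changes.
    void publish(const std::string& body) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            current_.body = body;
            ++current_.version;
        }
        changed_.notify_all();   // wake every request that is being held
    }

    // Called by a request handler: blocks until a version newer than
    // last_seen exists, or returns std::nullopt once the timeout elapses.
    std::optional<Page> wait_for_newer(unsigned long last_seen,
                                       std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool ready = changed_.wait_for(
            lock, timeout, [&] { return current_.version > last_seen; });
        if (!ready) return std::nullopt;
        return current_;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    Page current_;
};
```

The client passes the version it last received as `last_seen`, so a change published between two requests is returned immediately instead of being missed.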