Using Twitter4J with the Twitter Streaming API
In this use case I need to monitor Twitter's stream for tweets with certain hashtags, then pull those tweets out and store them. I am using Twitter4J and Twitter's Streaming API for this. The hashtags to monitor change frequently, so I would like to refresh the filter every 10 minutes or so. When I refresh, I simply pull all the new hashtags from the data layer and pass them to the filter query. My two questions:
Is there anything wrong with stopping the connection every 10 minutes and refreshing (in terms of Twitter's rate limits, etc.)?
Is there anything I can do to avoid losing tweets that are posted during the short refresh pause?
Thanks in advance.
Comments (1)
You should not reconnect more often than once every ten minutes, or you may be rate limited. You can open your new connection before dropping your old one, which should help avoid data loss. Note that you may have only one outstanding connection at a time.
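To make the "connect first, then drop" idea concrete, here is a minimal sketch in plain Java. It is not Twitter4J code: `StreamHandle`, `FilterRefresher`, `refresh`, and `shouldStore` are hypothetical names invented for illustration, and the connector function stands in for whatever opens the filtered stream (with Twitter4J that would presumably wrap a `TwitterStream` and its `filter(FilterQuery)` call). The sketch also assumes that each tweet carries a unique numeric ID, so tweets delivered by both connections during the brief overlap can be stored once. Since only one connection may be outstanding, the overlap should be kept as short as possible.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical handle for one open streaming connection (not a Twitter4J type).
interface StreamHandle {
    void close();
}

// Swaps the tracked hashtags by opening the new connection before closing the old one.
class FilterRefresher {
    private static final int MAX_REMEMBERED_IDS = 10_000; // arbitrary bound for this sketch

    // Assumed to open a filtered stream for the given hashtags and return its handle.
    private final Function<Set<String>, StreamHandle> connector;
    private Set<String> currentTags = new HashSet<>();
    private StreamHandle current;

    // Bounded, insertion-ordered record of tweet IDs that were already stored.
    private final Map<Long, Boolean> seenIds = new LinkedHashMap<Long, Boolean>() {
        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
            return size() > MAX_REMEMBERED_IDS;
        }
    };

    FilterRefresher(Function<Set<String>, StreamHandle> connector) {
        this.connector = connector;
    }

    // Returns true if a reconnect happened, false if the hashtag set was unchanged.
    synchronized boolean refresh(Set<String> newTags) {
        if (current != null && newTags.equals(currentTags)) {
            return false; // nothing changed, so keep the existing connection
        }
        StreamHandle replacement = connector.apply(new HashSet<>(newTags)); // open the new one first
        StreamHandle old = current;
        current = replacement;
        currentTags = new HashSet<>(newTags);
        if (old != null) {
            old.close(); // only then drop the old connection
        }
        return true;
    }

    // Returns true the first time a tweet ID is seen, false for overlap duplicates.
    synchronized boolean shouldStore(long tweetId) {
        return seenIds.put(tweetId, Boolean.TRUE) == null;
    }
}
```

One way to drive this is to call `refresh` from a `ScheduledExecutorService` via `scheduleAtFixedRate` with a ten-minute period, and to call `shouldStore` from the stream listener before writing each tweet. Skipping the reconnect when the hashtag set has not changed keeps the reconnect rate at or below the once-per-ten-minutes guideline.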