Building a network graph of Twitter users by topic
I'm trying to construct a social network graph of Twitter users who have mentioned a particular topic. My strategy goes roughly like this:
1. Query Twitter for a topic. Collect the first 100 tweets that come up and add their authors to the graph.
2. For each user:
   - Retrieve their friends and followers.
   - Query each friend/follower for the topic. If the query returns a result (meaning they have discussed the topic), add them to the graph.
3. For each user that was added to the graph, return to step 2 until the desired search depth is reached.
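The steps above amount to a depth-limited breadth-first crawl. Here is a minimal sketch of that loop, assuming the API calls are hidden behind two hypothetical callbacks: `neighbors` (friends plus followers of a user) and `discussesTopic` (one topic search per user). Neither name comes from twitter4j; they are placeholders so the traversal logic can be read and tested on its own.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

/** Depth-limited breadth-first crawl; the callbacks are hypothetical stand-ins for API calls. */
final class TopicCrawler {
    private TopicCrawler() {}

    /**
     * @param seeds          authors of the first tweets returned for the topic (step 1)
     * @param neighbors      returns friends and followers of a user (hypothetical callback)
     * @param discussesTopic true if a user has tweeted about the topic (hypothetical callback)
     * @param maxDepth       number of hops to expand away from the seeds
     * @return adjacency map: user -> neighbors who also discussed the topic
     */
    static Map<String, Set<String>> crawl(Collection<String> seeds,
            Function<String, Collection<String>> neighbors,
            Predicate<String> discussesTopic,
            int maxDepth) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        // Cache of search results so each user costs at most one topic query.
        Map<String, Boolean> checked = new HashMap<>();
        Set<String> frontier = new LinkedHashSet<>(seeds);
        for (String seed : frontier) {
            graph.put(seed, new LinkedHashSet<>());
            checked.put(seed, Boolean.TRUE);
        }
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Set<String> next = new LinkedHashSet<>();
            for (String user : frontier) {
                for (String other : neighbors.apply(user)) {
                    boolean firstVisit = !checked.containsKey(other);
                    boolean onTopic = checked.computeIfAbsent(other, discussesTopic::test);
                    if (!onTopic) {
                        continue; // never discussed the topic: not part of the graph
                    }
                    graph.computeIfAbsent(other, k -> new LinkedHashSet<>());
                    graph.get(user).add(other);
                    if (firstVisit) {
                        next.add(other); // expand newly found users on the next level
                    }
                }
            }
            frontier = next;
        }
        return graph;
    }
}
```

The `checked` cache is the only optimization here: a user who shows up as a neighbor of several people is searched once. It does not remove the underlying cost of one search per distinct friend/follower.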
My problem is two-fold. First of all, this approach quickly exceeds my search API rate limit. Even with a search depth of 2, it's quite likely that I'll find people with 100+ friends/followers, and I can't query them all before hitting the rate limit.
Secondly, this all takes quite a while. The Twitter API is not fast. In the hypothetical event that I were not rate limited, I could submit the requests asynchronously, but I can't help wondering if there is a more efficient way.
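To put rough numbers on the first problem, here is a back-of-the-envelope estimate of how many topic searches the crawl needs. The figures passed in (average neighbor count, fraction of neighbors who discussed the topic) are illustrative assumptions, not measurements, and the estimate ignores overlap between users' friend lists as well as the calls needed to fetch friends and followers.

```java
/** Rough count of per-user topic searches for a depth-limited crawl. */
final class CrawlCost {
    private CrawlCost() {}

    /**
     * @param seedUsers    users found by the initial topic query
     * @param avgNeighbors assumed average friends + followers per user
     * @param hitRate      assumed fraction of neighbors who discussed the topic
     * @param depth        number of levels expanded
     */
    static long searches(long seedUsers, long avgNeighbors, double hitRate, int depth) {
        long total = 0;
        double frontier = seedUsers;
        for (int level = 0; level < depth; level++) {
            double candidates = frontier * avgNeighbors; // one search per neighbor
            total += Math.round(candidates);
            frontier = candidates * hitRate;             // only hits are expanded further
        }
        return total;
    }
}
```

With 100 seed users, 100 neighbors each, and an assumed 10% hit rate, `CrawlCost.searches(100, 100, 0.1, 2)` comes to 110,000 searches: 10,000 for the first level and 100,000 for the second. The growth is geometric in the depth, which is why even depth 2 runs into the limit.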
I've tried aggregating the requests into one query per search depth:
topic AND from:name1 OR from:name2 .... OR from:namei
This basically explodes: I get a connection reset error from the Twitter API. If I paste the query into the Twitter web page, it just sits for a while and then says "loading tweets seems to be taking awhile."
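One thing that might be worth trying (it is a guess at the cause, not a confirmed fix) is to split the names into several smaller queries and to parenthesize the `from:` terms so the topic applies to all of them. The sketch below assumes the search endpoint accepts parentheses for grouping and that some maximum query length applies; `maxLength` is a placeholder to tune, not a documented value.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a long "topic AND (from:a OR from:b ...)" search into length-bounded queries. */
final class BatchedQueries {
    private BatchedQueries() {}

    /**
     * @param topic     search term(s) for the topic
     * @param names     screen names to restrict the search to
     * @param maxLength assumed upper bound on query length, in characters
     */
    static List<String> build(String topic, List<String> names, int maxLength) {
        List<String> queries = new ArrayList<>();
        StringBuilder froms = new StringBuilder();
        for (String name : names) {
            String term = "from:" + name;
            String candidate = froms.length() == 0 ? term : froms + " OR " + term;
            if (froms.length() > 0 && wrap(topic, candidate).length() > maxLength) {
                // Adding this name would overflow: emit the current batch, start a new one.
                queries.add(wrap(topic, froms.toString()));
                candidate = term;
            }
            froms.setLength(0);
            froms.append(candidate);
        }
        if (froms.length() > 0) {
            queries.add(wrap(topic, froms.toString()));
        }
        return queries;
    }

    // A single name is always emitted, even if it alone exceeds maxLength.
    private static String wrap(String topic, String froms) {
        return topic + " (" + froms + ")";
    }
}
```

This turns one query per search depth into a handful of queries per depth, which is still far fewer requests than one search per user.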
I also emailed [email protected] to ask for suggestions / access increase, but no response so far.
If anyone has any suggestions on how to go about gathering this type of information through the Twitter API, I would very much appreciate it. I am currently using Twitter4J and Java.
Comments (1)
Have you tried just using a filtered stream for a topic, and building the graph using mentions and retweets? This is quite indirect, and will still be slow, but won't hit any rate limits.
See http://truthy.indiana.edu/ and http://cnets.indiana.edu/groups/nan/truthy
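As a rough illustration of the mentions-and-retweets idea, here is a minimal sketch that turns a sequence of (author, text) pairs into a directed graph. It assumes tweets arrive from some stream listener that is not shown, and it pulls `@name` mentions out of the text with a regex, which also catches classic "RT @name:" retweets. That parsing is a simplification: the structured mention data attached to each status by the client library would likely be more reliable.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Directed graph: author -> users they mentioned or retweeted (hypothetical helper class). */
final class MentionGraph {
    // Simplified handle pattern; treats any @word of up to 15 characters as a mention.
    private static final Pattern MENTION = Pattern.compile("@(\\w{1,15})");

    private final Map<String, Set<String>> edges = new LinkedHashMap<>();

    /** Records one tweet that matched the topic filter. */
    void addTweet(String author, String text) {
        String from = author.toLowerCase(Locale.ROOT);
        Set<String> targets = edges.computeIfAbsent(from, k -> new LinkedHashSet<>());
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            String to = m.group(1).toLowerCase(Locale.ROOT);
            if (!to.equals(from)) {
                targets.add(to); // skip self-mentions
            }
        }
    }

    Map<String, Set<String>> edges() {
        return edges;
    }
}
```

Every author of a matching tweet becomes a node, even with no outgoing edges, so the graph also records who discussed the topic without mentioning anyone.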