Twitter data mining: degrees of separation
What readily available algorithms could I use to mine Twitter data and find the degrees of separation between two people on Twitter?
How does the approach change when the social graph keeps changing and updating?
Also, is there a dump of Twitter social graph data that I could use instead of making so many API calls to build it from scratch?
Comments (3)
From the Twitter API
What's the Data Mining Feed and can I have access to it?
The Data Mining Feed is an expanded version of our /statuses/public_timeline REST API method. It returns 600 recent public statuses, cached for a minute at a time. You can request it up to once per minute to get a representative sample of the public statuses on Twitter. We offer this for free (and with no quality of service guarantees) to researchers and hobbyists. All we ask is that you provide a brief description of your research or project and the IP address(es) you'll be requesting the feed from; just fill out this form. Note that the Data Mining Feed is not intended to provide a contiguous stream of all public updates on Twitter; please see above for more information on the forthcoming "firehose" solution.
See also: Streaming API Documentation
There was a company offering a dump of the social graph, but it was taken down and is no longer available. As you have already realized, a dump is hard to keep useful because the graph changes all the time.
I would recommend checking out their social_graph API methods, as they give the most information for the fewest API calls.
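As a rough sketch of how to keep the call count down: a breadth-first search that asks for each user's connections at most once and stops at a depth limit. `fetch_ids` is a hypothetical callable standing in for whatever social-graph call you end up using (it is not a real Twitter client function), and the sample graph is made up.

```python
from collections import deque

def degrees_of_separation(source, target, fetch_ids, max_depth=6):
    """Return the hop count from source to target, or None if the target
    is not reached within max_depth hops.

    fetch_ids(user) is a hypothetical stand-in for one social-graph API
    call that returns an iterable of connected user ids.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not spend API calls beyond the depth limit
        for other in fetch_ids(user):  # one call per expanded user
            if other == target:
                return depth + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return None

# Made-up follow graph used in place of live API responses.
SAMPLE = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}
calls = []  # records which users were looked up

def fake_fetch(user):
    calls.append(user)
    return SAMPLE.get(user, [])

result = degrees_of_separation("alice", "erin", fake_fetch)
```

Because the graph keeps changing, any answer is only as fresh as the responses it was built from; re-running the search is simpler than trying to patch an old result.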
There might be other ways of doing it, but I've just spent the past 10 minutes looking at doing something similar and stumbled upon this question.
I'd use an undirected, weighted graph (weighted because I want to look at location too). JGraphT is Java-based and ships with a number of prewritten graph algorithms; use that, or a similar library if you work in Python.
You can then use the Bellman-Ford algorithm, which finds shortest paths from a single source over weighted edges and, unlike Dijkstra's, also copes with negative edge weights. If every edge counts the same and you only want the number of hops, a plain breadth-first search is enough.
http://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm
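For reference, a minimal plain-Python sketch of Bellman-Ford on a small undirected, weighted graph. This is not JGraphT's implementation; the vertex names and weights are made up, and the `undirected` helper is just an illustrative way to turn each undirected edge into two directed ones.

```python
def undirected(pairs):
    """Expand (u, v, weight) pairs into directed edges in both directions."""
    edges = []
    for u, v, w in pairs:
        edges.append((u, v, w))
        edges.append((v, u, w))
    return edges

def bellman_ford(vertices, edges, source):
    """Return a dict of shortest distances from source to every vertex."""
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    # At most len(vertices) - 1 rounds of relaxing every edge.
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances have settled early
    # One more pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist

# Made-up example data.
vertices = ["A", "B", "C", "D"]
edges = undirected([("A", "B", 4), ("A", "C", 1), ("C", "B", 2), ("B", "D", 5)])
dist = bellman_ford(vertices, edges, "A")
```

Note that in an undirected graph a single negative edge already forms a negative cycle, so the negative-weight advantage over Dijkstra's only matters for directed graphs.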
I used it recently in a flight-routing project, iterating upward to find the shortest path with the fewest 'hops' (edges).
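One way to read "iterating upward" is to cap the number of edges and raise the cap until the target becomes reachable. The sketch below assumes that reading: it runs Bellman-Ford rounds limited to at most `max_hops` edges. The function names, routes, and costs are made up for illustration and are not taken from the original project.

```python
INF = float("inf")

def cheapest_within_hops(vertices, edges, source, target, max_hops):
    """Cheapest cost from source to target using at most max_hops edges."""
    best = {v: INF for v in vertices}
    best[source] = 0
    for _ in range(max_hops):
        nxt = dict(best)  # relax from the previous round only
        for u, v, w in edges:
            if best[u] + w < nxt[v]:
                nxt[v] = best[u] + w
        best = nxt
    return best[target]

def fewest_hops(vertices, edges, source, target):
    """Raise the hop limit until target is reachable.

    Returns (hops, cost) for the smallest workable limit, or None if the
    target cannot be reached at all.
    """
    for k in range(len(vertices)):
        cost = cheapest_within_hops(vertices, edges, source, target, k)
        if cost != INF:
            return k, cost
    return None

# Made-up directed routes: (origin, destination, cost).
airports = ["LHR", "DUB", "JFK", "SFO"]
routes = [
    ("LHR", "JFK", 500),
    ("LHR", "DUB", 50),
    ("DUB", "JFK", 300),
    ("JFK", "SFO", 200),
]
direct = fewest_hops(airports, routes, "LHR", "JFK")
two_hop_cost = cheapest_within_hops(airports, routes, "LHR", "JFK", 2)
```

With this data the direct route uses the fewest hops, while allowing a second hop finds a cheaper fare, which is the trade-off the hop limit exposes.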