I'd like to try to make a simple Twitter client that learns my tastes and automatically finds friends and interesting tweets, so that it can provide me with relevant information.
To get started, I need a good stream of random Twitter messages so I can test a few machine learning algorithms on them.
Which API methods should I use for this? Do I have to poll regularly to get messages, or is there a way to get Twitter to push messages as they are published?
I'd also be interested in learning about any similar projects.
I use tweepy to access the Twitter API and listen to the public stream they provide, which should be a one-percent sample of all tweets. Here is the sample code that I use myself. You can still use the basic auth mechanism for streaming, though they may change that soon. Change the USERNAME and PASSWORD variables accordingly, and make sure you respect the error codes that Twitter returns (this sample code might not respect the exponential backoff mechanism that Twitter expects in some cases).
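The exponential backoff mentioned above could be sketched roughly like this. It is not the sample code referred to in the answer, only a minimal illustration under some assumptions: `connect` is a hypothetical callable that opens the stream and raises `OSError` on failure, and the 5-second start and 320-second cap are placeholder values that should be checked against Twitter's own guidance.

```python
import random
import time


def backoff_delays(base=5.0, cap=320.0, jitter=0.0):
    """Yield an endless series of wait times that double until they reach `cap`.

    `base` and `cap` are assumed defaults, not values confirmed by the answer.
    """
    delay = base
    while True:
        # Optional random jitter keeps many clients from retrying in lockstep.
        yield min(cap, delay) + random.uniform(0, jitter)
        delay = min(cap, delay * 2)


def run_with_backoff(connect, max_attempts=5, sleep=time.sleep):
    """Call `connect` until it returns normally or the attempts run out.

    `connect` is a hypothetical callable standing in for whatever opens
    the stream; it is expected to raise OSError on a network failure.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(next(delays))
```

Passing `sleep` as a parameter is just a convenience so the retry logic can be exercised without actually waiting.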
I also set the socket module's default timeout. I believe I had some problems with Python's default timeout behavior, so be careful.
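Setting the socket timeout might look something like the following. By default Python sockets have no timeout and can block indefinitely, which is probably the behavior that caused trouble. The 90-second value is an assumption for illustration, not a number taken from the answer.

```python
import socket

# Assumed value: long enough to ride out quiet periods on the stream,
# short enough to notice a connection that has silently died.
STREAM_TIMEOUT_SECONDS = 90.0

# Applies to every socket created after this call that does not set
# its own timeout explicitly.
socket.setdefaulttimeout(STREAM_TIMEOUT_SECONDS)
```

Because this is a process-wide default, it also affects any other library that opens sockets afterwards.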
I don't think you can get access to the worldwide Twitter timeline. But you can certainly look at your friends' tweets and set up lists to play with. I would recommend using the Twitter4J library: http://twitter4j.org/en/index.html
I may have been mistaken: getPublicTimeline() might be what you want.
Twitter has a streaming API for just this purpose. It provides a small random sample of all messages posted to Twitter, continually updated in a 'push' manner as you describe. If you are doing this for some kind of noble purpose, you can request access from Twitter to a larger sample.
From the API docs, you want statuses/sample. Personally, I've had some success using the Python library tweepy with the streaming API.
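To give a rough idea of what consuming such a push stream involves, here is a small sketch that uses only the standard library. It assumes the stream delivers one JSON object per line, with blank lines sent as keep-alives and tweets carrying a "text" field; `iter_statuses` is a hypothetical helper name, not part of tweepy or the Twitter API.

```python
import json


def iter_statuses(lines):
    """Yield decoded tweets from an iterable of raw stream lines.

    Assumes newline-delimited JSON; any iterable of str or bytes works,
    for example the line iterator of an open HTTP response.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # blank keep-alive line, nothing to parse
        message = json.loads(raw)
        if "text" in message:  # skip non-tweet messages such as delete notices
            yield message
```

A library like tweepy handles this parsing for you; the sketch is only meant to show why no polling loop is needed once the connection is open.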
Tweepy's BasicAuthHandler is deprecated. Here's an updated version of the code. Have fun!