Getting a steady stream of messages from Twitter

Posted on 2024-11-16 14:45:41

I'd like to try to make a simple Twitter client that learns my tastes and automatically finds friends and interesting tweets to provide me with relevant information.

To get started, I need a good stream of random Twitter messages so I can test a few machine learning algorithms on them.

What API methods should I use for this? Do I have to poll regularly to get messages, or is there a way to get Twitter to push messages as they are published?

I'd also be interested in learning about any similar projects.

Comments (4)

梦里南柯 2024-11-23 14:45:41

I use tweepy to access the Twitter API and listen to the public stream they provide, which should be a one-percent sample of all tweets. Here is the sample code I use myself. You can still use the basic auth mechanism for streaming, though that may change soon. Set the USERNAME and PASSWORD variables accordingly, and make sure you respect the error codes that Twitter returns (this sample code may not follow the exponential backoff that Twitter expects in some cases).

import sys
import time

import tweepy

# Placeholders: replace with your own credentials.
USERNAME = ''
PASSWORD = ''


def log_error(msg):
    # Write a timestamped message to stderr.
    timestamp = time.strftime('%Y%m%d:%H%M:%S')
    sys.stderr.write("%s: %s\n" % (timestamp, msg))


class StreamWatcherListener(tweepy.StreamListener):
    def on_status(self, status):
        # Called once for every tweet received from the stream.
        print(status.text)

    def on_error(self, status_code):
        log_error("Status code: %s." % status_code)
        time.sleep(3)
        return True  # keep stream alive

    def on_timeout(self):
        log_error("Timeout.")


def main():
    auth = tweepy.BasicAuthHandler(USERNAME, PASSWORD)
    listener = StreamWatcherListener()
    stream = tweepy.Stream(auth, listener)
    stream.sample()


if __name__ == '__main__':
    # Reconnect whenever the stream ends or fails; Ctrl-C leaves the loop.
    while True:
        try:
            main()
        except KeyboardInterrupt:
            break
        except Exception as e:
            log_error("Exception: %s" % str(e))
        time.sleep(3)  # fixed pause before reconnecting
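The fixed three-second sleep in the code above is the part that does not back off. As a rough sketch of what an exponential backoff schedule could look like, the helpers below double the wait after each failure up to a ceiling and add a little randomness. The names `backoff_delays` and `with_jitter` and all the numbers are illustrative assumptions, not values taken from Twitter's documentation.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, retries=6):
    """Yield wait times that grow geometrically and never exceed `cap`."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def with_jitter(delay, spread=0.1, rng=random):
    """Scale `delay` by a random factor in [1 - spread, 1 + spread]."""
    return delay * (1.0 + rng.uniform(-spread, spread))


# Example schedule (assumed values): doubles each time, then stays at the cap.
schedule = list(backoff_delays(retries=8))
```

In a reconnect loop, one would sleep for `with_jitter(delay)` after each failed attempt and start again from the base delay once a connection succeeds.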

I also set the timeout of the socket module. I believe I had some problems with the default timeout behavior in Python, so be careful.

import socket

timeout = 60  # seconds; example value, choose one that suits your connection
socket.setdefaulttimeout(timeout)
ぽ尐不点ル 2024-11-23 14:45:41

I don't think you can get access to the worldwide Twitter timeline. But you can certainly look at your friends' tweets and set up lists to play with. I would recommend using the Twitter4J library: http://twitter4j.org/en/index.html

I may have been mistaken: getPublicTimeline() might be what you want.

给我一枪 2024-11-23 14:45:41

Twitter has a streaming API for just this purpose. It provides a small random sample of all messages posted to Twitter, continually updated in a 'push' manner as you describe. If you are doing this for some kind of noble purpose, you can request access to a larger sample from Twitter.

From the API docs, you want statuses/sample:

statuses/sample

Returns a random sample of all public statuses. The default access level, ‘Spritzer’ provides a small proportion of the Firehose, very roughly, 1% of all public statuses. The “Gardenhose” access level provides a proportion more suitable for data mining and research applications that desire a larger proportion to be statistically significant sample. Currently Gardenhose returns, very roughly, 10% of all public statuses. Note that these proportions are subject to unannounced adjustment as traffic volume varies.

URL: http://stream.twitter.com/1/statuses/sample.json

Method(s): GET

Parameters: count, delimited

Returns: stream of status element

Personally, I've had some success using the Python library tweepy with the streaming API.
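For anyone reading the stream without a library, here is a minimal sketch of turning the response body into status objects. It assumes the body arrives as one JSON object per line, with blank lines in between and occasional non-tweet messages; that wire format is my assumption rather than something the quoted docs spell out. The function name `iter_statuses` and the sample payload are made up for illustration, and an in-memory buffer stands in for the HTTP response.

```python
import io
import json


def iter_statuses(lines):
    """Yield parsed status dicts from an iterable of text lines.

    Blank lines, lines that are not valid JSON, and JSON objects
    without a 'text' field are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except ValueError:
            continue
        if isinstance(obj, dict) and "text" in obj:
            yield obj


# Hypothetical payload standing in for the body of a streaming response.
SAMPLE = io.StringIO(
    '{"id": 1, "text": "hello"}\n'
    '\n'
    '{"delete": {"status": {"id": 2}}}\n'
    '{"id": 3, "text": "world"}\n'
)
texts = [status["text"] for status in iter_statuses(SAMPLE)]
```

Because `iter_statuses` is a generator, it handles one message at a time and never needs to hold the whole stream in memory.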

素罗衫 2024-11-23 14:45:41

Tweepy's BasicAuthHandler is deprecated. Here's a new set of code that uses OAuth instead. Have fun!

import json
import sys
import time

import tweepy

# Fill in your application's OAuth credentials.
ckey = ''
csecret = ''
atoken = ''
asecret = ''


def log_error(msg):
    # Write a timestamped message to stderr.
    timestamp = time.strftime('%Y%m%d:%H%M:%S')
    sys.stderr.write("%s: %s\n" % (timestamp, msg))


class StreamWatcherListener(tweepy.StreamListener):
    def on_data(self, data):
        # `data` is the raw JSON string for one stream message.
        try:
            message = json.loads(data)
        except ValueError:
            return True  # skip anything that is not valid JSON
        # Some messages are tweet deletions and have no 'text' key.
        if 'text' in message:
            print(message['text'])
        return True  # keep stream alive

    def on_error(self, status_code):
        log_error("Status code: %s." % status_code)
        time.sleep(3)
        return True  # keep stream alive

    def on_timeout(self):
        log_error("Timeout.")


def main():
    auth = tweepy.OAuthHandler(ckey, csecret)
    auth.set_access_token(atoken, asecret)
    listener = StreamWatcherListener()
    stream = tweepy.Stream(auth, listener)
    stream.sample()


if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        log_error("Exception: %s" % str(e))
        time.sleep(3)
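Since the goal in the question is a corpus for testing machine learning algorithms, it may help to cap how many tweets get stored. Below is a small sketch of a collector that could be fed the raw messages a listener's `on_data` receives. The class `TweetCollector` and its `feed` method are hypothetical names of my own, not part of tweepy, and how a `False` return would be used to stop the stream depends on the tweepy version, so treat that wiring as an assumption.

```python
import json


class TweetCollector:
    """Accumulate tweet texts until `limit` of them have been stored."""

    def __init__(self, limit):
        self.limit = limit
        self.texts = []

    def feed(self, raw):
        """Handle one raw JSON message; return False once the limit is reached."""
        try:
            text = json.loads(raw)["text"]
        except (ValueError, KeyError, TypeError):
            return True  # not a tweet (for example a deletion notice); keep going
        self.texts.append(text)
        return len(self.texts) < self.limit


# Usage with made-up messages; the caller stops as soon as feed() returns False.
collector = TweetCollector(limit=2)
messages = ['{"text": "a"}', '{"delete": {}}', '{"text": "b"}', '{"text": "c"}']
for message in messages:
    if not collector.feed(message):
        break
```

Keeping the collector separate from the listener means it can be tested with canned messages, as here, without a network connection.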
