How can I call Twitter's Streaming/Filter feed with urllib2/httplib?

Posted 2024-08-26 21:42:27

Update:

I've switched this back from answered: I tried the solution from Nick's cogent answer and switched to Google's urlfetch:

logging.debug("starting urlfetch for http://%s%s" % (self.host, self.url))
result = urlfetch.fetch("http://%s%s" % (self.host, self.url), payload=self.body, method="POST", headers=self.headers, allow_truncated=True, deadline=5)
logging.debug("finished urlfetch")

Unfortunately, "finished urlfetch" is never printed. I see the timeout happen in the logs (it returns 200 after 5 seconds), but execution doesn't seem to return.


Hi All-

I'm trying out Twitter's Streaming (aka firehose) API on Google App Engine. I'm aware this probably isn't a great long-term play, since you can't keep a connection open indefinitely on GAE, but so far I haven't even managed to get my program to parse the results Twitter returns.

Some code:

logging.debug("firing up urllib2")
req = urllib2.Request(url="http://%s%s" % (self.host, self.url), data=self.body, headers=self.headers)
logging.debug("built request for %s %s, about to call urlopen" % (self.host, self.url))
fobj = urllib2.urlopen(req)
logging.debug("called urlopen")

Unfortunately, when this executes, my debug output never shows the final "called urlopen" line. I suspect that Twitter keeps the connection open and urllib2 doesn't return because the server never terminates the connection.
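
For comparison, outside App Engine a file-like response can be consumed one line at a time instead of being read to EOF, so the code makes progress even though the server never closes the connection. Below is a minimal Python 3 sketch of that idea. It assumes the stream is newline-delimited JSON with blank keep-alive lines, which is an assumption about the wire format and not something confirmed in this thread. The name iter_stream_messages and the sample payload are hypothetical, and io.BytesIO stands in for the live response. This does not help on App Engine itself, where urlfetch cannot hand back results progressively.

```python
import io
import json


def iter_stream_messages(fobj):
    """Yield one decoded JSON message per non-blank line of a file-like object."""
    # readline() returns as soon as one full line is available, so the loop
    # does not need the server to close the connection before it makes progress.
    for raw_line in iter(fobj.readline, b""):
        line = raw_line.strip()
        if not line:
            continue  # assumed keep-alive: a blank line between messages
        yield json.loads(line.decode("utf-8"))


# Hypothetical stand-in for a live response body; a real stream would never reach EOF.
fake_response = io.BytesIO(
    b'{"id": 1, "text": "first"}\r\n'
    b'\r\n'
    b'{"id": 2, "text": "second"}\r\n'
)
messages = list(iter_stream_messages(fake_response))
```

Because the function is a generator, a caller can stop after any number of messages (for example when a time budget runs out) without waiting for the stream to end.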

Wireshark shows the request being sent properly and a response returned with results.

I tried adding a Connection: close header to my request, but that didn't help.

Any ideas on how to get this to work?


Comments (1)

拥有 2024-09-02 21:42:27

On App Engine, urllib and urllib2 are thin wrappers around the urlfetch API. You're right about what's happening: Twitter's streaming API never terminates its response, so the fetch times out and urlfetch raises an exception.

If you use urlfetch directly, you can set the timeout with the deadline argument (up to 10 seconds) and set allow_truncated to True so that you get the partial result. The Twitter streaming API really isn't a good match for App Engine, though: App Engine requests are limited to 30 seconds of execution time, and a urlfetch request can neither return results progressively nor run for more than 10 seconds. Using Twitter's 'standard' API would be a better option.
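
If a fetch does come back with a partial body (on App Engine that would be result.content from the urlfetch call), the body may end in the middle of a message. Here is a rough Python 3 sketch of one way to keep only the complete messages. It assumes newline-delimited JSON, which this thread does not confirm. The helper parse_truncated_body and the sample bytes are hypothetical.

```python
import json


def parse_truncated_body(body):
    """Return the complete JSON messages found in a possibly truncated body."""
    lines = body.split(b"\n")
    # The last element is either empty (the body ended on a newline) or a
    # fragment cut off mid-message, so it is never parsed. A complete final
    # message that lacks a trailing newline is dropped too, which is the
    # conservative choice for this sketch.
    complete_lines = lines[:-1]
    messages = []
    for raw_line in complete_lines:
        line = raw_line.strip()
        if line:  # skip blank lines
            messages.append(json.loads(line.decode("utf-8")))
    return messages


# Hypothetical body as it might look if the fetch was cut off mid-message.
sample_body = b'{"id": 1}\r\n{"id": 2}\r\n{"id": 3, "te'
parsed = parse_truncated_body(sample_body)
```

Dropping the trailing fragment keeps json.loads from failing on half a message, at the cost of losing whatever was in flight when the fetch ended.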
