Using Twisted's getPage like urlopen?

Posted on 2024-08-30 14:14:45

I would like to use Twisted's non-blocking getPage method within a web app, but it feels quite complicated to use compared to urlopen.

This is an example of what I'm trying to achieve:

def web_request(request):
   response = urllib.urlopen('http://www.example.org')
   return HttpResponse(len(response.read()))

Is it so hard to have something similar with getPage?

桃气十足 2024-09-06 14:14:45

The thing to realize about non-blocking operations (which you seem to explicitly want) is that you can't really write sequential code with them. The operations don't block because they don't wait for a result: they start the operation and return control to your function. So getPage doesn't return a file-like object you can read from, as urllib.urlopen does. Even if it did, you couldn't read from it until the data was available (or it would block). And so you can't call len() on it, since that needs to read all the data first (which would block).

The way to deal with non-blocking operations in Twisted is through Deferreds, which are objects for managing callbacks. getPage returns a Deferred, which means "you will get this result later". You can't do anything with the result until you get it, so you add callbacks to the Deferred, and the Deferred will call these callbacks when the result is available. That callback can then do what you want it to:

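from twisted.web.client import getPage

def web_request(request):
    def callback(data):
        # Runs later, once the page body is available.
        # HttpResponse is the class from the asker's web framework.
        return HttpResponse(len(data))
    d = getPage("http://www.example.org")
    d.addCallback(callback)
    return d  # a Deferred that fires with the HttpResponse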
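
To make the callback mechanics concrete, here is a minimal sketch of the idea using only the standard library. ToyDeferred and fake_get_page are hypothetical names invented for this illustration; they are not Twisted's classes and skip error handling entirely. The sketch only shows that callbacks are stored first and run later, each one receiving the previous one's return value:

```python
class ToyDeferred:
    """Toy stand-in for a Deferred (NOT Twisted's class)."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, fn):
        # If the result is already known, run the callback right away;
        # otherwise remember it for later.
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self

    def callback(self, result):
        # Called by whoever produced the result, e.g. the network layer.
        self._fired = True
        self._result = result
        while self._callbacks:
            self._result = self._callbacks.pop(0)(self._result)


def fake_get_page(url):
    # Hypothetical stand-in for getPage: returns immediately, no data yet.
    return ToyDeferred()


seen = []
d = fake_get_page("http://www.example.org")
d.addCallback(len)            # first callback: body -> length
d.addCallback(seen.append)    # second callback receives that length
assert seen == []             # nothing has run yet: no result is available
d.callback(b"<html></html>")  # the "network" delivers the body later
print(seen)
```

Nothing happens at the addCallback calls; the work runs only when the result is delivered, which is why code that needs the page body has to live inside a callback.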

An additional problem with your example is that your web_request function itself is blocking. What do you want to do while you wait for the result of getPage to become available? Do something else within web_request, or just wait? Or do you want to make web_request itself non-blocking? If so, how do you want to produce the result? (The obvious choice in Twisted is to return another Deferred -- or even the same one that getPage returns, as in the example above. This may not always be appropriate if you're writing code in another framework, though.)

There is a way to write sequential code using Deferreds, although it's somewhat restrictive, harder to debug, and core Twisted people cry when you use it: twisted.internet.defer.inlineCallbacks. It uses the generator feature added in Python 2.5 that lets you send data into a generator, and the code would look somewhat like this:

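from twisted.internet import defer
from twisted.web.client import getPage

@defer.inlineCallbacks
def web_request(request):
    data = yield getPage("http://www.example.org")
    # Hand the result to the Deferred that the decorator returns.
    defer.returnValue(HttpResponse(len(data)))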

Like the example that explicitly returned the d Deferred, this'll only work if the caller expects web_request to be non-blocking -- the defer.inlineCallbacks decorator turns the generator into a function that returns a Deferred.
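
The generator mechanism behind this can be sketched without Twisted. The decorator below, toy_inline_callbacks, is a hypothetical name and a deliberately simplified illustration, not Twisted's implementation: it assumes each yielded object is a zero-argument callable standing in for a Deferred, resolves it immediately, and returns the final value directly instead of returning a Deferred. It uses Python 3's return-from-generator rather than defer.returnValue. What it does show is the send() step that resumes the generator with the result:

```python
import functools


def toy_inline_callbacks(gen_func):
    """Toy illustration only (NOT Twisted's inlineCallbacks)."""

    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        try:
            pending = next(gen)  # run up to the first yield
            while True:
                # `pending` is a zero-argument callable standing in for a
                # Deferred; calling it plays the role of "the result arrived".
                value = pending()
                # Resume the generator; `value` becomes the yield's result.
                pending = gen.send(value)
        except StopIteration as stop:
            return stop.value  # whatever the generator returned

    return wrapper


@toy_inline_callbacks
def web_request(request):
    # The lambda is a hypothetical stand-in for getPage(...).
    data = yield (lambda: b"<html></html>")
    return len(data)


print(web_request(None))
```

The real decorator does the send() from inside a callback attached to the yielded Deferred, so the generator stays suspended until the result actually arrives, and the caller gets a Deferred back straight away.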

美人骨 2024-09-06 14:14:45

I posted a response to a similar question recently that provides the minimal amount of code required to get the contents from a URL using getPage. Here it is for completeness:

from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    # Called with the page body, or with a Failure if the fetch failed.
    print(output)
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print('fetching %s' % url)
    d = getPage(url)
    # addBoth, so the reactor also stops when the request fails.
    d.addBoth(print_and_stop)
    reactor.run()

Keep in mind that you'll probably need a more in-depth understanding of the reactor pattern used by Twisted to handle events (getPage firing being an event in this instance).
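
As a rough mental model of that pattern, here is a toy event loop using only the standard library. ToyReactor and call_later are hypothetical names made up for this sketch, not Twisted's API, and unlike the real reactor the toy exits when its queue is empty rather than waiting for network I/O. It only illustrates that run() hands control to a loop that dispatches events until something calls stop():

```python
from collections import deque


class ToyReactor:
    """Toy event loop (NOT Twisted's reactor)."""

    def __init__(self):
        self._events = deque()
        self.running = False

    def call_later(self, fn, *args):
        # Queue an event; nothing runs until run() is called.
        self._events.append((fn, args))

    def stop(self):
        self.running = False

    def run(self):
        self.running = True
        while self.running and self._events:
            fn, args = self._events.popleft()
            fn(*args)
        self.running = False


reactor = ToyReactor()
log = []


def print_and_stop(output):
    log.append(output)
    if reactor.running:
        reactor.stop()


# Hypothetical "getPage fired" event, queued before the loop starts.
reactor.call_later(print_and_stop, "page body")
# Still queued when stop() is called, so it never runs.
reactor.call_later(log.append, "never reached")
reactor.run()
print(log)
```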
