Asynchronous HTTP Calls in Python

Posted 2024-10-16 14:54:08

I have a need for a callback kind of functionality in Python where I am sending a request to a webservice multiple times, with a change in the parameter each time. I want these requests to happen concurrently instead of sequentially, so I want the function to be called asynchronously.

It looks like asyncore is what I might want to use, but the examples I've seen of how it works all look like overkill, so I'm wondering if there's another path I should be going down. Any suggestions on modules/process? Ideally I'd like to use these in a procedural fashion instead of creating classes but I may not be able to get around that.

Comments (4)

狼性发作 2024-10-23 14:54:08

Starting in Python 3.2, you can use concurrent.futures for launching parallel tasks.

Check out this ThreadPoolExecutor example:

http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example

It spawns threads to retrieve HTML and acts on responses as they are received.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    # Use read(); readall() only existed on old Python 3 releases of HTTPResponse
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

The above example uses threading. There is also a similar ProcessPoolExecutor that uses a pool of processes, rather than threads:

http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example
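The linked page has its own example; as a minimal sketch (adapted from the threaded code above, not taken from the docs), the same URL-fetching pattern with ProcessPoolExecutor could look like this. Note that the worker must be a module-level function so it can be pickled, and the pool should be created under an if __name__ == '__main__' guard so worker processes can start safely:

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Worker runs in a separate process, so it must be defined at module level.
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

if __name__ == '__main__':
    # Each URL is fetched in its own worker process.
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))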

一指流沙 2024-10-23 14:54:08

Do you know about eventlet? It lets you write what appears to be synchronous code, but have it operate asynchronously over the network.

Here's an example of a super minimal crawler:

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):

  return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
  print "got body", len(body)
倾城月光淡如水﹏ 2024-10-23 14:54:08

The Twisted framework is just the ticket for that. But if you don't want to take that on, you might also use pycurl, a wrapper for libcurl, which has its own async event loop and supports callbacks.
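
As a rough sketch of the pycurl route (not from the original answer; the example.com URLs are placeholders), the CurlMulti interface lets one loop drive several transfers at once while libcurl calls a write callback for each chunk received:

# Rough sketch of pycurl's CurlMulti event loop.
from io import BytesIO
import pycurl

urls = ["http://www.example.com/", "http://www.example.org/"]

multi = pycurl.CurlMulti()
handles = []
for url in urls:
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEFUNCTION, buf.write)  # libcurl calls this back per chunk
    c.buf = buf                                # keep the buffer reachable later
    multi.add_handle(c)
    handles.append(c)

# One loop drives all transfers; select() sleeps until any socket is ready.
num_active = len(handles)
while num_active:
    while True:
        ret, num_active = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    if num_active:
        multi.select(1.0)

for c in handles:
    print("%s: %d bytes" % (c.getinfo(pycurl.EFFECTIVE_URL), len(c.buf.getvalue())))
    multi.remove_handle(c)
    c.close()

A real program would also check multi.info_read() to detect per-handle failures.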

绻影浮沉 2024-10-23 14:54:08

(Although this thread is about server-side Python, since the question was asked a while back, others might stumble on it while looking for a similar answer on the client side.)

For a client-side solution, you might want to take a look at the Async.js library, especially the "Control-Flow" section.

https://github.com/caolan/async#control-flow

By combining the "Parallel" with a "Waterfall" you can achieve your desired result.

WaterFall( Parallel(TaskA, TaskB, TaskC) -> PostParallelTask)

If you examine the example under Control-Flow - "Auto", they give you an example of the above:
https://github.com/caolan/async#autotasks-callback
where "write-file" depends on "get_data" and "make_folder", and "email_link" depends on "write-file".

Please note that all of this happens on the client side (unless you're running Node.JS on the server side).

For server-side Python, look at PyCURL @ https://github.com/pycurl/pycurl/blob/master/examples/basicfirst.py

By combining the linked example with pyCurl, you can achieve the non-blocking multi-threaded functionality.
