Asynchronous HTTP Calls in Python

Posted 2024-10-16 14:54:08

I have a need for a callback kind of functionality in Python where I am sending a request to a webservice multiple times, with a change in the parameter each time. I want these requests to happen concurrently instead of sequentially, so I want the function to be called asynchronously.

It looks like asyncore is what I might want to use, but the examples I've seen of how it works all look like overkill, so I'm wondering if there's another path I should be going down. Any suggestions on modules/process? Ideally I'd like to use these in a procedural fashion instead of creating classes but I may not be able to get around that.

Comments (4)

狼性发作 2024-10-23 14:54:08

Starting in Python 3.2, you can use concurrent.futures for launching parallel tasks.

Check out this ThreadPoolExecutor example:

http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example

It spawns threads to retrieve HTML and acts on responses as they are received.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    # Use read(); readall() only existed on old Python 3 releases of HTTPResponse
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

The above example uses threading. There is also a similar ProcessPoolExecutor that uses a pool of processes, rather than threads:

http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example
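The linked page has its own example; as a minimal sketch (adapted from the threaded code above, not taken from the docs), the same URL-fetching pattern with ProcessPoolExecutor could look like this. Note that the worker must be a module-level function so it can be pickled, and the pool should be created under an if __name__ == '__main__' guard so worker processes can start safely:

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Worker runs in a separate process, so it must be defined at module level.
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

if __name__ == '__main__':
    # Each URL is fetched in its own worker process.
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))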

一指流沙 2024-10-23 14:54:08

Do you know about eventlet? It lets you write what appears to be synchronous code, but have it operate asynchronously over the network.

Here's an example of a super minimal crawler:

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):

  return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
  print "got body", len(body)
倾城月光淡如水﹏ 2024-10-23 14:54:08

The Twisted framework is just the ticket for that. But if you don't want to take that on, you might also use pycurl, a wrapper for libcurl, which has its own async event loop and supports callbacks.
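
As a rough sketch of the pycurl route (not from the original answer; the example.com URLs are placeholders), the CurlMulti interface lets one loop drive several transfers at once while libcurl calls a write callback for each chunk received:

# Rough sketch of pycurl's CurlMulti event loop.
from io import BytesIO
import pycurl

urls = ["http://www.example.com/", "http://www.example.org/"]

multi = pycurl.CurlMulti()
handles = []
for url in urls:
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEFUNCTION, buf.write)  # libcurl calls this back per chunk
    c.buf = buf                                # keep the buffer reachable later
    multi.add_handle(c)
    handles.append(c)

# One loop drives all transfers; select() sleeps until any socket is ready.
num_active = len(handles)
while num_active:
    while True:
        ret, num_active = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    if num_active:
        multi.select(1.0)

for c in handles:
    print("%s: %d bytes" % (c.getinfo(pycurl.EFFECTIVE_URL), len(c.buf.getvalue())))
    multi.remove_handle(c)
    c.close()

A real program would also check multi.info_read() to detect per-handle failures.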

绻影浮沉 2024-10-23 14:54:08

(Although this thread is about server-side Python, since the question was asked a while back, others might stumble on it while looking for a similar answer on the client side.)

For a client-side solution, you might want to take a look at the Async.js library, especially the "Control-Flow" section.

https://github.com/caolan/async#control-flow

By combining the "Parallel" with a "Waterfall" you can achieve your desired result.

WaterFall( Parallel(TaskA, TaskB, TaskC) -> PostParallelTask)

If you examine the example under Control-Flow - "Auto", they give you an example of the above:
https://github.com/caolan/async#autotasks-callback
where "write-file" depends on "get_data" and "make_folder", and "email_link" depends on "write-file".

Please note that all of this happens on the client side (unless you're running Node.JS on the server side).

For server-side Python, look at PyCURL @ https://github.com/pycurl/pycurl/blob/master/examples/basicfirst.py

By combining the linked example with pyCurl, you can achieve the non-blocking multi-threaded functionality.
