Python 异步请求 +写入文件混乱

发布于 2025-01-10 08:41:12 字数 1236 浏览 2 评论 0原文

我有一个简单的 python 异步程序，可以从特定 URL 检索 gzip 压缩文件。我已使用 aiohttp 进行异步请求。根据 aiohttp 文档 (https://docs.aiohttp.org/en/stable/ client_quickstart.html），我在我的测试方法中的“流响应内容”下使用了他们的示例来写入数据。

async def main(url):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(test(session, url))


async def test(session, url):
    async with session.get(url=url) as r:
        with open('test.csv.gz', 'wb') as f:
            async for chunk in r.content.iter_chunked(1024):
                f.write(chunk)

但是，我不确定 test() 中的内容是否实际上是异步的。我读过的很多文章都提到了异步协程中需要使用“await”关键字来激活异步性（例如 r = wait session.get(url=url)），但我想知道它是否是“async with”和“async for”模式也能达到同样的效果？

我希望实现的是在执行 session.get() 以及将数据写入本地时的异步功能，这样，如果我传入许多 url，它将 a) 在获取 url 时执行异步切换，b) 执行将数据写入本地时的异步切换。

对于 b)，我需要使用类似下面的东西吗？

async with aiofiles.open('test.csv.gz', 'wb') as f:
    async for chunk in r.content.iter_chunked(1024):
        await f.write()

这让我想到了一个稍微偏离主题的问题，但是 async with session.get(url=url) as r: 和 r = wait session.get(url=网址）？

如果我的理解有缺陷或者我缺少有关异步功能的基本内容，请告诉我！

原文

I have a simple python async program that retrieves a gzipped file from a particular URL. I have used the aiohttp for async requests. As per the aiohttp docs (https://docs.aiohttp.org/en/stable/client_quickstart.html), I have used their example under 'Streaming Response Content' in my test method to write the data.

async def main(url):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(test(session, url))


async def test(session, url):
    async with session.get(url=url) as r:
        with open('test.csv.gz', 'wb') as f:
            async for chunk in r.content.iter_chunked(1024):
                f.write(chunk)

However, I am not sure if the stuff in test() is actually asynchronous or not. Many articles I've read mention the requirement of the 'await' keyword in async coroutines to activate the asynchronicity (e.g. something like r = await session.get(url=url)), but Im wondering if it is the 'async with' and 'async for' patterns that also achieve the same thing?

What I am hoping to achieve is async functionality when doing session.get(), as well as when writing the data to local, such that if I pass in many urls it will a) perform async switching when getting the url and b) perform async switching when writing the data onto local.

For b), would I need to use something like the following?

async with aiofiles.open('test.csv.gz', 'wb') as f:
    async for chunk in r.content.iter_chunked(1024):
        await f.write()

This leads me to a slightly off-topic question, but what is the difference between async with session.get(url=url) as r: and r = await session.get(url=url)?

Please let me know if my understanding is flawed or if there is something fundamental I am missing regarding the async functionality!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

泅人 2025-01-17 08:41:12

好问题。看下面的小程序，它运行两个任务。每个都有一个异步上下文管理器和一个异步迭代器：

import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def nada():
    # await asyncio.sleep(0.0)
    yield
    
async def aloop():
    for _ in range(5):
        # await asyncio.sleep(0.0)
        yield
    
async def atask(name):
    async for _ in aloop():
        async with nada():
            print("Task", name)

async def main():
    asyncio.create_task(atask("1"))
    await asyncio.create_task(atask("2"))

if __name__ == "__main__":
    asyncio.run(main())

输出：

Task 1
Task 1
Task 1
Task 1
Task 1
Task 2
Task 2
Task 2
Task 2
Task 2

没有发生任务切换。

现在取消上下文管理器 (nada) 或迭代器 (aloop) 中 await asyncio.sleep(0.0) 的注释。输出变为：

Task 1
Task 2
Task 1
Task 2
Task 1
Task 2
Task 1
Task 2
Task 1
Task 2

现在确实发生了任务切换。但你的主程序在这两种情况下是完全相同的。

因此，第一个问题的答案是“async with”和“async for”模式不一定会导致任务切换。这取决于他们的实施。异步上下文管理器和异步迭代器都调用特殊的机制；如果该机器执行await表达式，就会发生任务切换。但 Python 不需要异步上下文管理器或异步迭代器来做到这一点。

这是完全合法的 Python：

async def do_nothing:
    pass

实际上，您可能可以信任像 aiohttp 这样广泛部署的库。将 async 关键字放在方法前面并且不在其中执行等待并没有多大价值。我能想到的唯一用例是当 API 需要协程但不需要异步行为时。将此类函数放入通用库中将是糟糕的设计，至少没有良好的文档。

你的第二个问题 - async with session.get(url=url) as r: 和 r = wait session.get(url=url) 之间有什么区别 - 是第一种形式执行两个特殊功能，而第二种则不执行。第一个在某种程度上相当于：

try:
    x = session.get(url=url)
    r = await x.__aenter__()    
    # the indented block of code executes here
finally:
    await x.__aexit__(...)

__aexit__ 方法接受一些与异常处理有关的参数，您可以在文档中阅读相关内容。同步上下文管理器类似，只是特殊方法名为 __enter__ 和 __exit__。

Good question. Look at the following little program, which runs two tasks. Each has an async context manager and an async iterator:

import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def nada():
    # await asyncio.sleep(0.0)
    yield
    
async def aloop():
    for _ in range(5):
        # await asyncio.sleep(0.0)
        yield
    
async def atask(name):
    async for _ in aloop():
        async with nada():
            print("Task", name)

async def main():
    asyncio.create_task(atask("1"))
    await asyncio.create_task(atask("2"))

if __name__ == "__main__":
    asyncio.run(main())

Output:

Task 1
Task 1
Task 1
Task 1
Task 1
Task 2
Task 2
Task 2
Task 2
Task 2

No task switching occurs.

Now uncomment the await asyncio.sleep(0.0) in either the context manager (nada) or the iterator (aloop). Output becomes:

Task 1
Task 2
Task 1
Task 2
Task 1
Task 2
Task 1
Task 2
Task 1
Task 2

Now task switching does occur. But your main program is exactly the same in both cases.

So the answer to your first question is that the 'async with' and 'async for' patterns do not necessarily cause task switching. It depends on their implementation. Both async context managers and async iterators invoke special machinery; if that machinery executes an await expression, task switching will occur. But Python does not require async context managers or async iterators to do that.

This is perfectly legal Python:

async def do_nothing:
    pass

As a practical matter, you are probably OK to trust a widely deployed library like aiohttp. There is not much value in placing the async keyword in front of a method and not performing an await within it. The only use-case I can think of is when an API requires a coroutine but you have no need for asynchronous behavior. It would be poor design to put that sort of function in a general-use library, at any rate not without good documentation.

Your second question - what's the difference between async with session.get(url=url) as r: and r = await session.get(url=url) - is that the first form executes two special functions and the second one doesn't. The first one is somewhat equivalent to:

try:
    x = session.get(url=url)
    r = await x.__aenter__()    
    # the indented block of code executes here
finally:
    await x.__aexit__(...)

The __aexit__ method takes some arguments having to do with exception handling, which you can read about in the docs. Synchronous context managers are similar except the special methods are named __enter__ and __exit__.

回复收藏 0 原文

~没有更多了~