Why is aiofiles even slower than plain file operations?

Posted on 2022-09-13 00:09:08

I'm searching several log files for a particular string and found that aiofiles is very slow. Am I using it incorrectly? Any pointers would be appreciated.

import asyncio
import time

import aiofiles

files = [
    r'C:\log\20210523.log',
    r'C:\log\20210522.log',
    r'C:\log\20210521.log',
    r'C:\log\20210524.log',
    r'C:\log\20210525.log',
    r'C:\log\20210520.log',
    r'C:\log\20210519.log',
]

async def match_content_in_file(filename: str, content: str, encoding: str = "gbk") -> bool:
    async with aiofiles.open(filename, mode="r", encoding=encoding) as f:
        # text = await f.read()
        # return content in text

        async for line in f:
            if content in line:
                return True
        # no match found: falls through and returns None


def match_content_in_file2(filename: str, content: str, encoding: str = "gbk") -> bool:
    with open(filename, mode="r", encoding=encoding) as f:
        # text = f.read()
        # return content in text

        for line in f:
            if content in line:
                return True
        # no match found: falls through and returns None

async def main3():
    start = time.time()
    tasks = [match_content_in_file(f, '808395') for f in files]
    l = await asyncio.gather(*tasks)
    print(l)
    end = time.time()
    print(end - start)

def main2():
    start = time.time()
    l = []
    for f in files:
        l.append(match_content_in_file2(f, '808395'))
    print(l)
    end = time.time()
    print(end - start)

if __name__ == '__main__':
    asyncio.run(main3())   # very slow
    main2()   # very fast

Measured results (each file is about 7.5 MB):

  • Reading the files line by line, the async version takes vastly longer.
[True, True, True, None, None, True, True]
Async: 40.80606389045715
-------------------------------------
[True, True, True, None, None, True, True]
Sync: 0.48870062828063965
  • Reading each file in one shot, async and sync are close, though sync is still slightly faster.
[True, True, True, False, False, True, True]
Async: 0.6835882663726807
-------------------------------------
[True, True, True, False, False, True, True]
Sync: 0.6745946407318115

Environment
Python 3.9.2, Windows 10
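
If the goal is just to cut the number of awaits that have to go through aiofiles, one possible workaround is to read in large chunks instead of line by line. This is only a sketch, not a verified fix: match_content_in_file_chunked and CHUNK_SIZE are my own names, the chunk size is an arbitrary choice, and the tail/overlap handling assumes the search string is short.

import aiofiles

CHUNK_SIZE = 1024 * 1024  # 1 MB per read, so far fewer awaits than one per line

async def match_content_in_file_chunked(filename: str, content: str, encoding: str = "gbk") -> bool:
    # Keep the tail of the previous chunk so a match spanning two chunks is not missed.
    overlap = len(content) - 1
    tail = ""
    async with aiofiles.open(filename, mode="r", encoding=encoding) as f:
        while True:
            chunk = await f.read(CHUNK_SIZE)
            if not chunk:
                return False
            if content in tail + chunk:
                return True
            tail = chunk[-overlap:] if overlap else ""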

Comments (4)

乖乖公主 2022-09-20 00:09:09

asyncio is I/O multiplexing: it can only solve I/O-bound performance problems. Check the CPU first; if a single core is already maxed out, coroutines will not improve performance either.
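
A quick way to check this, as a sketch built on the question's own sync code (match_content_in_file2 and files come from the original post): compare CPU time with wall time. If they are close, the scan is CPU-bound (GBK decoding plus substring search) and coroutines cannot help.

import time

wall0, cpu0 = time.perf_counter(), time.process_time()
results = [match_content_in_file2(f, '808395') for f in files]
wall1, cpu1 = time.perf_counter(), time.process_time()
# If CPU time is close to wall time, the work is CPU-bound, not I/O-bound.
print(f"wall: {wall1 - wall0:.3f}s  cpu: {cpu1 - cpu0:.3f}s")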

差↓一点笑了 2022-09-20 00:09:09

Why do my tests show exactly the opposite? Environment: Python 3.8.2.

import time
import asyncio

files = [
    r'C:\log\20210523.log',
    r'C:\log\20210522.log',
    r'C:\log\20210521.log',
    r'C:\log\20210524.log',
    r'C:\log\20210525.log',
    r'C:\log\20210520.log',
    r'C:\log\20210519.log',
]


def match_content_in_file(f, s):
    time.sleep(1)  # both versions just sleep for 1 s

async def match_content_in_file_asc(f, s):
    await asyncio.sleep(1)  # both versions just sleep for 1 s


async def main3():
    start = time.time()
    tasks = [match_content_in_file_asc(f,'808395') for f in files]
    l = await asyncio.gather(*tasks)
    print(l)
    end = time.time()
    print(end - start)


def main2():
    start = time.time()
    l = []
    for f in files:
        l.append(match_content_in_file(f,'808395'))
    print(l)
    end = time.time()
    print(end-start)


if __name__ == '__main__':
    asyncio.run(main3())   # very fast
    main2()   # very slow

Output:

/bin/python3 test.py
[None, None, None, None, None, None, None]
1.000645637512207
[None, None, None, None, None, None, None]
7.005064249038696
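
The sleep comparison only shows that asyncio.sleep lets waits overlap; a real file read blocks the thread, so the closest async analogue would be to push the original blocking matcher onto worker threads. A rough sketch only (reusing match_content_in_file2 and files from the question; asyncio.to_thread requires Python 3.9+):

import asyncio
import time

async def main_threaded():
    start = time.time()
    # Offload the blocking sync matcher to the default thread pool.
    tasks = [asyncio.to_thread(match_content_in_file2, f, '808395') for f in files]
    print(await asyncio.gather(*tasks))
    print(time.time() - start)

# asyncio.run(main_threaded())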
飘然心甜 2022-09-20 00:09:09

This test is interesting; I ran it myself too.

  1. When none of the files exist, aiofiles is much faster:
~/test ᐅ python3 -V
Python 3.8.1
~/test ᐅ python3 aiotest.py
[None, None, None, None, None, None, None]
1.0032050609588623
[None, None, None, None, None, None, None]
7.023258686065674
~/test ᐅ sw_vers
ProductName:    Mac OS X
ProductVersion:    10.15.7
BuildVersion:    19H2
  2. When the files do exist, aiofiles is much, much slower:

    import asyncio
    import os
    import time
    from random import randint
    from pathlib import Path

    import aiofiles


    BASE_DIR = Path('log')
    files = [
        '20210523.log',
        '20210522.log',
        '20210521.log',
        '20210524.log',
        '20210525.log',
        '20210520.log',
        '20210519.log',
    ]


    def gen_files():
        if not BASE_DIR.exists():
            BASE_DIR.mkdir(parents=True)
        for fname in files:
            if not (p := BASE_DIR / fname).exists():
                nums = [randint(10**6, 10**7 - 1) for _ in range(1024 * 1024)]
                p.write_text('\n'.join(map(str, nums)))
                print(f'{p} created!')
        os.system(f'ls -lh {BASE_DIR}')


    async def match_content_in_file(filename: str, content: str, encoding: str = "gbk") -> bool:
        async with aiofiles.open(filename, mode="r", encoding=encoding) as f:
            # text = await f.read()
            # return content in text

            async for line in f:
                if content in line:
                    return True


    def match_content_in_file2(filename: str, content: str, encoding: str = "gbk") -> bool:
        with open(filename, mode="r", encoding=encoding) as f:
            # text = f.read()
            # return content in text

            for line in f:
                if content in line:
                    return True


    async def main3():
        print('Start async process...')
        start = time.time()
        tasks = [match_content_in_file(BASE_DIR / f, '808395') for f in files]
        l = await asyncio.gather(*tasks)
        print(l)
        end = time.time()
        print(end - start)


    def main2():
        print('Start sync process...')
        start = time.time()
        l = []
        for f in files:
            l.append(match_content_in_file2(BASE_DIR / f, '808395'))
        print(l)
        end = time.time()
        print(end - start)


    if __name__ == '__main__':
        gen_files()  # generate the test files
        asyncio.run(main3())   # very slow
        main2()   # very fast

    Results:

    total 114688
    -rw-r--r--  1 lian  staff   8.0M  6 12 00:24 20210519.log
    -rw-r--r--  1 lian  staff   8.0M  6 12 00:24 20210520.log
    -rw-r--r--  1 lian  staff   8.0M  6 12 00:23 20210521.log
    -rw-r--r--  1 lian  staff   8.0M  6 12 00:23 20210522.log
    -rw-r--r--  1 lian  staff   8.0M  6 12 00:23 20210523.log
    -rw-r--r--  1 lian  staff   8.0M  6 12 00:24 20210524.log
    -rw-r--r--  1 lian  staff   8.0M  6 12 00:24 20210525.log
    Start async process...
    [True, True, True, True, True, True, True]
    283.923513174057
    Start sync process...
    [True, True, True, True, True, True, True]
    0.46163487434387207
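
A plausible reason for the 280+ seconds (a sketch of the overhead, not a figure from this thread): aiofiles delegates file operations to a thread pool, so iterating line by line pays one executor round trip per line, and each generated file here has roughly a million lines. The bare round-trip cost can be measured like this (measure_executor_overhead is a hypothetical helper of my own):

    import asyncio
    import time

    async def measure_executor_overhead(calls: int = 100_000) -> float:
        # Time `calls` no-op dispatches through the default executor and
        # return the average cost of a single round trip.
        loop = asyncio.get_running_loop()
        start = time.time()
        for _ in range(calls):
            await loop.run_in_executor(None, lambda: None)
        return (time.time() - start) / calls

    # per_call = asyncio.run(measure_executor_overhead())
    # print(f'{per_call * 1e6:.1f} µs per executor call')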
海未深 2022-09-20 00:09:08

Reading one file at a time is what a hard disk does fastest; reading several files at once forces it to jump back and forth between disk blocks, which actually makes things slower.

Reading files is not like network communication: a network request has to wait after it is sent, and coroutines can use that waiting time to increase concurrency. A hard disk gives you no such wait to exploit.
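
One way to probe this claim (a sketch only, reusing match_content_in_file2 and files from the question) is to run the same blocking reads concurrently on threads and compare against the sequential loop. On a spinning disk the concurrent version may well be slower; on an SSD, or once the OS page cache is warm, the gap may disappear.

import time
from concurrent.futures import ThreadPoolExecutor

start = time.time()
# The same blocking reads as the sync loop, but issued concurrently from a thread pool.
with ThreadPoolExecutor(max_workers=len(files)) as pool:
    results = list(pool.map(lambda f: match_content_in_file2(f, '808395'), files))
print(results, time.time() - start)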
