Python - S3: downloading a file with download_fileobj

Posted on 2025-01-14 01:39:22 | 843 characters | 4 views | 0 comments

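The following function downloads a file from S3 and reports progress through a callback. The code works, but once the progress bar reaches 100% the function takes a long time to return, and the delay grows with the size of the file. I suspect that f.seek(0) is the cause, but I don't know how to fix it. Removing the call causes an error in some cases.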

import io
import sys

# BUCKET is assumed to be a module-level constant holding the S3 bucket name.


def download(s3_client, s3_object_key):
    # Look up the object size so progress can be shown as a percentage.
    meta_data = s3_client.head_object(Bucket=BUCKET, Key=s3_object_key)
    total_length = int(meta_data.get('ContentLength', 0))
    downloaded = 0

    def progress(chunk):
        # boto3 calls this with the number of bytes transferred since the last call.
        nonlocal downloaded
        downloaded += chunk
        done = int(50 * downloaded / total_length)
        sys.stdout.write("\r[%s%s] %s%% " % ('=' * done, ' ' * (50 - done), round(downloaded / total_length * 100, 2)))
        sys.stdout.flush()

    print(f'Downloading {s3_object_key}')

    f = io.BytesIO()
    s3_client.download_fileobj(BUCKET, s3_object_key, f, Callback=progress)
    f.seek(0) # <---- Could be the cause
    print('Done.')
    return f
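
One way to test the suspicion about f.seek(0) is to time that call in isolation. The following is a minimal sketch, assuming that an in-memory buffer filled with zero bytes is a fair stand-in for the downloaded object; time_seek is a hypothetical helper introduced here for illustration and is not part of boto3.

```python
import io
import time


def time_seek(size_bytes: int) -> float:
    """Return the seconds spent in seek(0) on a BytesIO holding size_bytes bytes."""
    buf = io.BytesIO()
    buf.write(b"\0" * size_bytes)  # leaves the position at the end, as after a download
    start = time.perf_counter()
    buf.seek(0)
    return time.perf_counter() - start


# Compare a small and a larger buffer (sizes chosen arbitrarily for illustration).
for size in (1_000_000, 20_000_000):
    print(f"{size:>11,} bytes: seek(0) took {time_seek(size):.6f} s")
```

If the measured time is negligible at both sizes, the delay probably lies elsewhere, and timing the download_fileobj call on its own in the same way would be the next thing to check.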

