How to download files from AWS S3 asynchronously using aioboto3 - Python

Published 2025-02-11 19:53:51

I have a synchronous script that is running and working well, but I see that some file downloads take time, so I thought of using an async approach here.

import json
import os
import io
import time
import gzip
import re
import logging
from logging.handlers import RotatingFileHandler
import boto3

AWS_KEY = "**"
AWS_SECRET = "**"
QUEUE_URL = "***"
OUTPUT_PATH = "./test"
VISIBILITY_TIMEOUT = 10
SLEEP_TIME = 5  # seconds to pause between SQS polls (assumed value)
REGION_NAME = "region"

sqs = boto3.resource('sqs', region_name=REGION_NAME, aws_access_key_id=AWS_KEY, aws_secret_access_key=AWS_SECRET)
s3 = boto3.client('s3', region_name=REGION_NAME, aws_access_key_id=AWS_KEY, aws_secret_access_key=AWS_SECRET)

queue = sqs.Queue(url=QUEUE_URL)

def handle_response(msg, path):
    """Logic goes here"""
    print('message: %s' % msg)

def download_message_files(msg):
    for s3_file in msg['files']:
        s3_path = s3_file['path']
        with io.BytesIO() as f:
            s3.download_fileobj(msg['bucket'], s3_path, f)
            f.seek(0)
            for line in gzip.GzipFile(fileobj=f):
                handle_response(line.decode('UTF-8'), s3_path)

def consume():
    while True:
        for msg in queue.receive_messages(VisibilityTimeout=VISIBILITY_TIMEOUT):
            body = json.loads(msg.body)  # grab the actual message body
            download_message_files(body)
            msg.delete()
            time.sleep(SLEEP_TIME)

if __name__ == '__main__':
    # Setup our root logger
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(levelname)s %(message)s")
    # Create our FDR logger
    logger = logging.getLogger("Consumer")
    # Rotate log file handler
    RFH = RotatingFileHandler("test.log", maxBytes=20971520, backupCount=5)
    # Log file output format
    F_FORMAT = logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s')
    # Set the log file output level to INFO
    RFH.setLevel(logging.INFO)
    # Add our log file formatter to the log file handler
    RFH.setFormatter(F_FORMAT)
    # Add our log file handler to our logger
    logger.addHandler(RFH)
    consume()

I have tried converting this using aioboto3 and got stuck on the queue approach.

    session = aioboto3.Session()
    sqs = session.resource('sqs', region_name=REGION_NAME, aws_access_key_id=AWS_KEY, aws_secret_access_key=AWS_SECRET)
    s3 = session.client('s3', region_name=REGION_NAME, aws_access_key_id=AWS_KEY, aws_secret_access_key=AWS_SECRET)
    queue = sqs.Queue(url=QUEUE_URL)  # <---- this gives error: 'ResourceCreatorContext' object has no attribute 'Queue'

As far as I can tell from this, the attribute really does not exist, but could anyone guide me on making this work asynchronously?



Comments (1)

苍景流年 2025-02-18 19:53:51

You can use asyncio and aioboto3 together.

Instead of creating a resource, you can use a client. The difference between an aioboto3 client and an aioboto3 resource can be found in this answer.

Here is a simple working example:

import aioboto3

async def consume():
    session = aioboto3.Session()
    async with session.client(service_name='sqs', region_name=REGION_NAME,
                              aws_access_key_id=AWS_KEY,
                              aws_secret_access_key=AWS_SECRET) as client:
        # The low-level SQS client exposes receive_message (singular);
        # it needs the queue URL and returns a dict with a 'Messages' key.
        response = await client.receive_message(QueueUrl=QUEUE_URL,
                                                VisibilityTimeout=VISIBILITY_TIMEOUT)
        for message in response.get('Messages', []):
            ...  # Do something
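
If you would rather keep the resource-based Queue API from the original script, the error disappears once the resource is used as an async context manager: before entering the context, session.resource() only returns a ResourceCreatorContext, which is why .Queue is missing. A minimal sketch, assuming the same constants as in the question (the consume_with_resource name is just for illustration):

import asyncio
import aioboto3

async def consume_with_resource():
    session = aioboto3.Session()
    # The resource is usable only inside 'async with'; outside of it,
    # it is a ResourceCreatorContext without the .Queue accessor.
    async with session.resource('sqs', region_name=REGION_NAME,
                                aws_access_key_id=AWS_KEY,
                                aws_secret_access_key=AWS_SECRET) as sqs:
        queue = await sqs.Queue(QUEUE_URL)  # sub-resources are created with await
        for message in await queue.receive_messages(VisibilityTimeout=VISIBILITY_TIMEOUT):
            ...  # Do something

asyncio.run(consume_with_resource())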

Either approach should resolve the error you are facing. The solution can also be extended to S3 as per your requirements; a sketch follows.
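
For the S3 side, here is a minimal sketch of the download step under the same pattern. It reuses the constants and handle_response from the question; process_bodies is an illustrative driver, and the asyncio.gather call is just one way to overlap the downloads:

import asyncio
import gzip
import io

import aioboto3

async def download_message_files(s3_client, msg):
    # Stream each gzipped S3 object into memory and hand its lines
    # to handle_response, mirroring the synchronous version.
    for s3_file in msg['files']:
        s3_path = s3_file['path']
        with io.BytesIO() as f:
            # aioboto3 exposes the S3 transfer helpers as coroutines
            await s3_client.download_fileobj(msg['bucket'], s3_path, f)
            f.seek(0)
            for line in gzip.GzipFile(fileobj=f):
                handle_response(line.decode('UTF-8'), s3_path)

async def process_bodies(bodies):
    # bodies: message bodies already parsed with json.loads in the consume loop
    session = aioboto3.Session()
    async with session.client('s3', region_name=REGION_NAME,
                              aws_access_key_id=AWS_KEY,
                              aws_secret_access_key=AWS_SECRET) as s3_client:
        # Overlap the per-message downloads instead of running them one by one
        await asyncio.gather(*(download_message_files(s3_client, b) for b in bodies))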

