Python: processing 1 file at a time with boto3

Posted on 2025-01-10 13:28:48

I have a Python script that needs to read and process 100 files from S3. How can I make the script process only 1 file at a time? I expect print(len(session_results)) to report 2 separate lengths.

def fetch_endpoint(endpoint: str, date: str) -> str:
    """Read files from s3 bucket and aggregate them via generator function"""
    client = aws_s3_client_auth()
    resource = aws_s3_resource_auth()
    print(f"{endpoint.upper()}/{date}/")
    paginator = client.get_paginator('list_objects_v2')
    operation_parameters: dict[str] = {
        "Bucket": RAW_BUCKET,
        "Prefix": f"{endpoint.upper()}/{date}/"
    }
    keys_to_process: list[str] = []
    page_iterator = paginator.paginate(**operation_parameters, PaginationConfig={'MaxItems': 100})
    for page in page_iterator:
        if page.get("KeyCount") == 0:
            return print(f'No data was found for {endpoint} on {date}')
        else:
            for content in page.get("Contents"):
                # print(content)
                keys_to_process.append(content.get("Key"))
            print(keys_to_process)
            index = 0
            while index < len(keys_to_process):
                print(keys_to_process[index])
                client_events = resource.Object(RAW_BUCKET, key=keys_to_process[index])
                file_content = client_events.get()['Body'].read().decode('utf-8')
                for line in file_content.splitlines():
                    data = json.loads(line)
                    index += 1
                    yield data


def session(endpoint: str, date: str) -> None:
    """Make a request to s3 to read session in the RAW_BUCKET"""
    session_results = [line for line in fetch_endpoint(endpoint, date)]
    print(len(session_results))
    return None

Comments (1)

鯉魚旗 2025-01-17 13:28:48

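session_results is a list in which each element is something that was yielded by fetch_endpoint().

When you call print(len(session_results)), you are asking Python how many elements are in session_results.

I'm not really sure what you mean by "should have 2 separate lengths." It is a list, so it can only have 1 length at a time.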
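
If the goal is one length per file rather than one length for everything, a possible approach is to have the generator yield one (key, records) pair per S3 object instead of one parsed line at a time. The sketch below is only an illustration under assumptions: the helper names iter_keys, read_json_lines, and fetch_by_file are hypothetical, the client argument is assumed to be a boto3 S3 client (such as whatever aws_s3_client_auth() returns in the question), and each file is assumed to be newline-delimited JSON. It also advances once per file, whereas the posted loop increments index once per parsed line, which can skip files.

```python
import json
from typing import Any, Iterator


def iter_keys(client: Any, bucket: str, prefix: str, max_items: int = 100) -> Iterator[str]:
    """Yield object keys under `prefix`, one at a time, across all pages."""
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket,
        Prefix=prefix,
        # MaxItems caps the total number of keys returned, as in the question.
        PaginationConfig={"MaxItems": max_items},
    )
    for page in pages:
        # "Contents" is absent when a page holds no objects.
        for content in page.get("Contents", []):
            yield content["Key"]


def read_json_lines(client: Any, bucket: str, key: str) -> list[dict]:
    """Download a single object and parse it as newline-delimited JSON."""
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    # Blank lines are skipped (an assumption; the question parses every line).
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def fetch_by_file(client: Any, bucket: str, prefix: str) -> Iterator[tuple[str, list[dict]]]:
    """Yield (key, records) pairs so that each file is handled on its own."""
    for key in iter_keys(client, bucket, prefix):
        yield key, read_json_lines(client, bucket, key)


def session(client: Any, bucket: str, endpoint: str, date: str) -> dict[str, int]:
    """Print and return the number of records found in each file."""
    prefix = f"{endpoint.upper()}/{date}/"
    lengths: dict[str, int] = {}
    for key, records in fetch_by_file(client, bucket, prefix):
        # Only one file's records are held in memory at this point.
        print(key, len(records))
        lengths[key] = len(records)
    if not lengths:
        print(f"No data was found for {endpoint} on {date}")
    return lengths
```

With a real client this might be called as session(aws_s3_client_auth(), RAW_BUCKET, endpoint, date). Because fetch_by_file is a generator, the next object is downloaded only after the caller has finished with the current one, so two files produce two separate printed lengths.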

