How to select files from S3 by wildcard matching

Posted on 2025-02-12 20:21:34

How can I identify and select a specific file in S3 by matching its filename against a wildcard?

I want to select a file on S3 by matching its filename against the wildcard pattern DWH_CUST_P665_*.xml, and return the full filename of the matched file. For example:

  • The file "DWH_CUST_P665_20220515_170922.xml" should be selected based
    on the wildcard above
  • But the file "DWH_PROD_P223_20220607_102314.xml" should be ignored.
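
One way to read the rule above is as shell-style (glob) matching, where `*` stands for any run of characters. A minimal sketch using Python's standard-library `fnmatch` module follows. The helper name `matches_wildcard` is hypothetical, and case-sensitive matching is an assumption about what is wanted here.

```python
from fnmatch import fnmatchcase

PATTERN = "DWH_CUST_P665_*.xml"


def matches_wildcard(filename: str, pattern: str = PATTERN) -> bool:
    """Return True if filename matches the shell-style pattern (case-sensitive)."""
    return fnmatchcase(filename, pattern)


# The two example filenames from the list above
for name in ("DWH_CUST_P665_20220515_170922.xml",
             "DWH_PROD_P223_20220607_102314.xml"):
    print(name, matches_wildcard(name))
```

`fnmatchcase` is used rather than `fnmatch` because the latter normalizes case on some platforms, while S3 keys are case-sensitive.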

I need to return the name of the file that matches the wildcard pattern above. I came up with the following code snippet, but I am struggling to make it work. Can anyone help me get the pattern matching right?

import boto3
import os
import re
import xml.etree.ElementTree as ET

class S3Pull:
    def __init__(self, bucket):
        self.bucket = bucket
        self.client = boto3.client('s3')

    def iterate_bucket(self, wildcard=".*"):
        # List every object in the bucket, one page at a time
        paginator = self.client.get_paginator('list_objects_v2')
        page_iterator = paginator.paginate(Bucket=self.bucket)

        # The wildcard string is compiled as a regular expression
        regex = re.compile(wildcard)
        for page in page_iterator:
            if page['KeyCount'] > 0:
                for item in page['Contents']:
                    if regex.match(item["Key"]):
                        print(item['Key'])

        return str(item['Key'])

if __name__ == "__main__":
    TYPE = "CUST"
    ID = "P665"
    bucket = "dwh-landing-bucket"
    s3_pull = S3Pull(bucket)
    s3_object = s3_pull.iterate_bucket(f"DWH_{TYPE}_{ID}_*.xml")

    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=bucket, Key=s3_object)

    tree = ET.parse(obj['Body'])
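
For comparison, here is a hedged sketch of how the listing and filtering could be arranged. It assumes glob semantics via `fnmatch.fnmatchcase`, narrows the listing with the `Prefix` parameter of `list_objects_v2` (using the literal part of the pattern before the first wildcard), and matches against the basename of each key in case the objects sit under a folder-like prefix. The names `literal_prefix`, `filter_keys`, `find_matching_keys`, and `key_prefix` are hypothetical and do not come from the original snippet.

```python
import posixpath
from fnmatch import fnmatchcase


def literal_prefix(pattern):
    """Return the part of a glob pattern before its first wildcard character."""
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern


def filter_keys(pages, pattern):
    """Yield keys from list_objects_v2 pages whose basename matches the glob."""
    for page in pages:
        # 'Contents' is absent from a page that holds no objects
        for item in page.get("Contents", []):
            key = item["Key"]
            if fnmatchcase(posixpath.basename(key), pattern):
                yield key


def find_matching_keys(bucket, pattern, key_prefix=""):
    """List a bucket and return every key whose filename matches pattern."""
    import boto3  # imported here so the helpers above work without AWS access

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket,
        Prefix=key_prefix + literal_prefix(pattern),
    )
    return list(filter_keys(pages, pattern))


# Example call (needs AWS credentials; bucket name taken from the question):
# keys = find_matching_keys("dwh-landing-bucket", "DWH_CUST_P665_*.xml")
```

Returning a list rather than a single key leaves it to the caller to decide what to do when several files match, for example picking the lexicographically largest key if the timestamp in the name should decide.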

I can't seem to get the wildcard pattern match to work correctly and return the matching filename.
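
A likely cause, offered as a reading of the snippet rather than a confirmed diagnosis: `re.compile` treats the string as a regular expression, not as a shell wildcard. In regex syntax `_*` means "zero or more underscores" and `.` means "any single character", so `DWH_CUST_P665_*.xml` compiled as a regex does not match the example filename. The sketch below shows the difference and converts the glob with the standard-library `fnmatch.translate`.

```python
import re
from fnmatch import translate

name = "DWH_CUST_P665_20220515_170922.xml"
glob_pattern = "DWH_CUST_P665_*.xml"

# Compiled directly, the glob is read as a regex: "DWH_CUST_P665", then
# zero or more underscores, then exactly one character followed by "xml".
as_regex = re.compile(glob_pattern)
direct_match = as_regex.match(name) is not None

# fnmatch.translate turns the glob into an equivalent regular expression.
translated = re.compile(translate(glob_pattern))
translated_match = translated.match(name) is not None

print(direct_match, translated_match)  # prints: False True
```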

Any help is appreciated.

Happy to provide more info.
