How to select files from S3 by wildcard matching
How can I identify/select a specific file on S3 by matching its filename against a wildcard?
I want to select a file stored on S3 by matching its filename against the wildcard pattern DWH_CUST_P665_*.xml and return the full filename of the matched file.
For example:
- The file "DWH_CUST_P665_20220515_170922.xml" should be selected based on the wildcard above.
- But the file "DWH_PROD_P223_20220607_102314.xml" should be ignored.
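Expressed in code, the expected behaviour looks roughly like the sketch below. It uses `fnmatchcase` from Python's standard `fnmatch` module, which is an assumption on my part (the question itself only uses `re`), and the helper name `is_selected` is hypothetical.

```python
from fnmatch import fnmatchcase

# Glob-style wildcard from the question: "*" stands for any run of characters.
PATTERN = "DWH_CUST_P665_*.xml"


def is_selected(filename, pattern=PATTERN):
    """Return True when `filename` matches the glob `pattern` (case-sensitive)."""
    return fnmatchcase(filename, pattern)


print(is_selected("DWH_CUST_P665_20220515_170922.xml"))  # True
print(is_selected("DWH_PROD_P223_20220607_102314.xml"))  # False
```

`fnmatchcase` is used rather than `fnmatch` so the comparison stays case-sensitive on every platform.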
I need to return the name of the file that matches the wildcard pattern specified above. I have come up with the following code snippet, but I am struggling to make it work. Can anyone please help me do the pattern matching correctly?

    import boto3
    import re
    import xml.etree.ElementTree as ET


    class S3Pull:
        def __init__(self, bucket):
            self.bucket = bucket
            self.client = boto3.client('s3')

        def iterate_bucket(self, wildcard=".*"):
            """Return the first key in the bucket that matches `wildcard`, else None."""
            paginator = self.client.get_paginator('list_objects_v2')
            page_iterator = paginator.paginate(Bucket=self.bucket)
            # Note: the wildcard string is compiled as a regular expression as-is.
            regex = re.compile(wildcard)
            for page in page_iterator:
                if page['KeyCount'] > 0:
                    for item in page['Contents']:
                        if regex.match(item["Key"]):
                            print(item['Key'])
                            return str(item['Key'])
            return None


    if __name__ == "__main__":
        TYPE = "CUST"
        ID = "P665"
        bucket = "dwh-landing-bucket"
        s3_pull = S3Pull(bucket)
        s3_object = s3_pull.iterate_bucket(f"DWH_{TYPE}_{ID}_*.xml")
        s3 = boto3.client('s3')
        obj = s3.get_object(Bucket=bucket, Key=s3_object)
        tree = ET.parse(obj['Body'])

I can't seem to get the wildcard pattern match to work correctly and return the matching filename.
Any help is appreciated.
Happy to provide more info.
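One likely cause is that `re.compile` reads `DWH_CUST_P665_*.xml` as a regular expression, where `_*` means "zero or more underscores" and `.` means "any single character", so the example filename does not match. A minimal sketch of one possible fix follows: it converts the glob to a regex with `fnmatch.translate` and asks S3 to pre-filter on the literal prefix. The function names `glob_prefix` and `find_matching_keys` are hypothetical, the client is assumed to be a `boto3.client("s3")` instance passed in by the caller, and the pattern is assumed to apply to the full object key (files at the bucket root).

```python
import fnmatch
import re


def glob_prefix(wildcard):
    """Return the literal text that precedes the first glob metacharacter."""
    match = re.search(r"[*?\[]", wildcard)
    return wildcard if match is None else wildcard[:match.start()]


def find_matching_keys(client, bucket, wildcard):
    """Yield every key in `bucket` that matches the glob-style `wildcard`.

    `client` is assumed to be an S3 client such as boto3.client("s3").
    """
    # fnmatch.translate turns a shell-style glob into an anchored regex,
    # so "*" means "any run of characters" instead of "repeat the previous one".
    regex = re.compile(fnmatch.translate(wildcard))
    paginator = client.get_paginator("list_objects_v2")
    # Prefix lets S3 narrow the listing to keys starting with the literal part.
    pages = paginator.paginate(Bucket=bucket, Prefix=glob_prefix(wildcard))
    for page in pages:
        # "Contents" is missing from a page that holds no objects.
        for item in page.get("Contents", []):
            if regex.match(item["Key"]):
                yield item["Key"]


# Hypothetical usage, with the bucket name taken from the question:
#   import boto3
#   keys = find_matching_keys(boto3.client("s3"), "dwh-landing-bucket",
#                             "DWH_CUST_P665_*.xml")
#   first_key = next(keys, None)  # None when nothing matches
```

Yielding keys instead of returning the first hit leaves it to the caller to decide what to do when several files match or none do.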