Reading a subset of CSV files from an S3 bucket using Lambda and boto3
In my S3 bucket I have around 30 CSV files, classified into 3 categories. In my Lambda I want to pick only the 8 of them that belong to category 1. I used the answer from this question: Reading multiple csv files from S3 bucket with boto3, and came up with the following code:
def read_prefix_to_df(prefix, s3_resource, bucket_name):
    bucket = s3_resource.Bucket(bucket_name)
    prefix_objs = bucket.objects.filter(Prefix=prefix)
    prefix_df = []
    for obj in prefix_objs:
        key = obj.key
        body = obj.get()['Body'].read()
        df = pd.DataFrame(body)
        prefix_df.append(df)
    return prefix_df
Where:
bucket_name='my_bucket'
prefix='folder/data_overview_*.csv'
All 8 files have almost the same name except for the date at the end, which is why I used the * to pick all files related to data_overview_. Unfortunately, the returned dataframe was empty. Should I change the prefix?
1 Answer
Prefixes cannot contain wildcard characters.
You should use:
prefix='folder/data_overview_'
If you need to further limit the results to only CSV files, then you will need to do that with an if statement within your Python code.
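Putting both points together, a minimal sketch of a corrected function might look like this. Note that it also replaces the original `pd.DataFrame(body)` call with `pd.read_csv` on a `BytesIO` wrapper, since raw CSV bytes would not be parsed into columns otherwise:

```python
import io

import pandas as pd


def read_prefix_to_df(prefix, s3_resource, bucket_name):
    """Read every CSV object under `prefix` into a list of DataFrames."""
    bucket = s3_resource.Bucket(bucket_name)
    prefix_dfs = []
    # Prefix matching is a plain string comparison -- no wildcards --
    # so pass 'folder/data_overview_' rather than 'folder/data_overview_*.csv'.
    for obj in bucket.objects.filter(Prefix=prefix):
        # Limit the results to CSV files with an explicit check.
        if obj.key.endswith('.csv'):
            body = obj.get()['Body'].read()
            # pd.read_csv expects a file-like object, so wrap the raw bytes.
            prefix_dfs.append(pd.read_csv(io.BytesIO(body)))
    return prefix_dfs
```

Called as `read_prefix_to_df('folder/data_overview_', boto3.resource('s3'), 'my_bucket')`, this returns one DataFrame per matching file.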