Download the latest file from a folder in an S3 bucket with Python, not from the folders inside it

Posted on 2025-01-11 17:35:20

I want to download only the latest file from a folder in an S3 bucket. The folder contains several subfolders as well as files, but I need only the file with the latest date, which I then upload into a single folder, picking it from among multiple folders. I am using code adapted from Stack Overflow.

Here is the structure of the S3 bucket:

  S3-Bucket : --folder_1
                  --abc2022.01.29.csv
                  --bsv2022.02.18.csv
                  --test2022.03.04.csv
                  --Folder_12
                  --Folder_13
                  --folder_14

In short, I want to download the latest file directly inside folder_1, not the latest file from its subfolders (Folder_12, Folder_13, Folder_14).
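S3 has no real folders. Keys are flat strings, and Delimiter="/" only groups them when listing. The sketch below is a simplified local model of how list_objects_v2 splits keys into "Contents" and "CommonPrefixes". It is an assumption, not the real API: it ignores pagination, ordering, and MaxKeys. The function name list_one_level and the keys inside the subfolders are hypothetical.

```python
def list_one_level(keys, prefix, delimiter="/"):
    """Simplified model of list_objects_v2 with Prefix and Delimiter.

    Keys whose remainder (after the prefix) contains the delimiter are rolled
    up into a common prefix; all other matching keys are direct contents.
    """
    contents, common_prefixes = [], []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested prefix
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            contents.append(key)  # no further "/": the object sits at this level
        else:
            group = prefix + rest[: cut + 1]  # e.g. "folder_1/Folder_12/"
            if group not in common_prefixes:
                common_prefixes.append(group)
    return contents, common_prefixes


# Hypothetical keys modelled on the bucket layout in the question.
keys = [
    "folder_1/abc2022.01.29.csv",
    "folder_1/bsv2022.02.18.csv",
    "folder_1/test2022.03.04.csv",
    "folder_1/Folder_12/a.csv",
    "folder_1/Folder_13/b.csv",
    "folder_1/folder_14/c.csv",
]

print(list_one_level(keys, "folder_1/"))  # three CSVs plus three subfolder prefixes
print(list_one_level(keys, "folder_1"))   # everything collapses into "folder_1/"
```

In this model, prefix "folder_1/" returns only the three CSV files as contents. Prefix "folder_1", with no trailing slash, returns no contents at all and a single common prefix "folder_1/". That would explain why a loop that only looks at "Contents" finds nothing.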

I am getting the following error:

TypeError: 'NoneType' object is not subscriptable
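This TypeError is raised by the final print(latest['Key']) when latest is still None. That happens when no page of the listing has a "Contents" entry. Below is a minimal sketch of a guarded version of the selection loop. It works on plain dicts shaped like list_objects_v2 pages. The helper name newest_object and the sample pages are hypothetical.

```python
from datetime import datetime, timezone


def newest_object(pages):
    """Return the newest object dict across list_objects_v2-style pages, or None."""
    latest = None
    for page in pages:
        # A page that only lists CommonPrefixes has no "Contents" key at all.
        for obj in page.get("Contents", []):
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
    return latest


# Hypothetical page for Prefix="folder_1", Delimiter="/": only a common prefix.
no_files = [{"CommonPrefixes": [{"Prefix": "folder_1/"}]}]

# Hypothetical pages for Prefix="folder_1/": objects spread over two pages.
with_files = [
    {"Contents": [{"Key": "folder_1/abc2022.01.29.csv",
                   "LastModified": datetime(2022, 1, 29, tzinfo=timezone.utc)}]},
    {"Contents": [{"Key": "folder_1/test2022.03.04.csv",
                   "LastModified": datetime(2022, 3, 4, tzinfo=timezone.utc)}]},
]

for pages in (no_files, with_files):
    latest = newest_object(pages)
    if latest is None:
        # Checked explicitly instead of crashing on latest['Key'].
        print("No objects directly under the prefix")
    else:
        print(latest["Key"])
```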

Here is the code snippet I am using to download the latest file:

  def get_most_recent_s3_object(bucket_name, prefix):
      s3 = session.client('s3')
      paginator = s3.get_paginator("list_objects_v2")
      page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter="/")
      latest = None
      for page in page_iterator:
          if "Contents" in page:
              latest2 = max(page['Contents'], key=lambda x: x['LastModified'])
              if latest is None or latest2['LastModified'] > latest['LastModified']:
                  latest = latest2
                  with open(latest, 'wb') as f:
                      s3.download_fileobj(bucket_name, latest, 'C:\\Users\\xxxx\\')
      return latest


  latest = get_most_recent_s3_object(bucket_name='bucket_name_1', prefix='folder_1')
  print(latest['Key'])

But I'm not able to download the file into my local path, and the code picks up the latest file from the subfolders instead of from folder_1 itself.
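For the local-path part, boto3's S3 client offers download_file(Bucket, Key, Filename), which writes to a local path. download_fileobj(Bucket, Key, Fileobj) instead expects a file already opened in binary mode. Both need the key string, not the whole object dict. Below is a minimal sketch, assuming s3_client is a boto3 S3 client and obj is one entry of a list_objects_v2 "Contents" list. The helper name download_latest and the dest_dir parameter are hypothetical.

```python
import os


def download_latest(s3_client, bucket_name, obj, dest_dir):
    """Save one S3 object (a dict from list_objects_v2 "Contents") into dest_dir.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    key = obj["Key"]  # a string such as "folder_1/test2022.03.04.csv"
    # Keep only the file-name part, so the file lands at dest_dir/test2022.03.04.csv.
    local_path = os.path.join(dest_dir, os.path.basename(key))
    # download_file takes a local path; download_fileobj would need an open "wb" file.
    s3_client.download_file(bucket_name, key, local_path)
    return local_path
```

On Windows, a raw string such as r'C:\Users\xxxx' for dest_dir avoids backslash-escape surprises in the path.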

Comments (1)

沦落红尘 2025-01-18 17:35:20

I modified the code as follows to download the latest file directly inside the S3 bucket folder, and it works. Note the trailing slash in prefix='folder_1/'. Here is the working code snippet:

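import boto3

session = boto3.Session()  # default credentials/profile; adjust as needed


def get_most_recent_s3_object(bucket_name, prefix):
    s3 = session.client('s3')
    paginator = s3.get_paginator("list_objects_v2")
    # With a prefix ending in "/" and Delimiter="/", only objects directly
    # under folder_1/ appear in "Contents"; subfolders go to "CommonPrefixes".
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter="/")
    latest = None
    for page in page_iterator:
        if "Contents" in page:
            latest2 = max(page['Contents'], key=lambda x: x['LastModified'])
            if latest is None or latest2['LastModified'] > latest['LastModified']:
                latest = latest2  # keep the whole dict so LastModified stays comparable
    if latest is None:
        print('No files found directly under', prefix)
        return None
    # Download once, after the newest object across all pages is known.
    with open('C:\\Users\\xxxx\\dummy.csv', 'wb') as f:
        s3.download_fileobj(bucket_name, latest['Key'], f)
    print('Latest file downloaded successfully....!!!')
    return latest['Key']


latest = get_most_recent_s3_object(bucket_name='bucket_name_1', prefix='folder_1/')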
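The question asks for the file "of latest date", and the file names themselves carry a date (abc2022.01.29.csv). If that name date should decide, rather than S3's LastModified upload time, one option is to compare parsed dates instead. The sketch below assumes every relevant file name ends in YYYY.MM.DD.csv. It also skips keys ending in "/", such as the zero-byte folder markers the S3 console typically creates. The helper name latest_by_name_date and the sample listing are hypothetical.

```python
import re
from datetime import date

# Assumed naming pattern: ...YYYY.MM.DD.csv, e.g. "abc2022.01.29.csv".
DATE_IN_NAME = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})\.csv$")


def latest_by_name_date(contents, prefix):
    """Pick the key directly under prefix whose file name has the latest date.

    contents is a list of object dicts as found in list_objects_v2 "Contents".
    """
    best_key, best_date = None, None
    for obj in contents:
        key = obj["Key"]
        rest = key[len(prefix):] if key.startswith(prefix) else None
        if not rest or "/" in rest:
            continue  # the prefix/folder marker itself, a nested key, or outside prefix
        match = DATE_IN_NAME.search(rest)
        if match is None:
            continue  # file name without a recognisable date
        d = date(*map(int, match.groups()))
        if best_date is None or d > best_date:
            best_key, best_date = key, d
    return best_key


# Hypothetical listing, including a folder marker and a nested file.
contents = [
    {"Key": "folder_1/"},
    {"Key": "folder_1/abc2022.01.29.csv"},
    {"Key": "folder_1/bsv2022.02.18.csv"},
    {"Key": "folder_1/test2022.03.04.csv"},
    {"Key": "folder_1/Folder_12/new2023.01.01.csv"},
]
print(latest_by_name_date(contents, "folder_1/"))
```

The nested file is ignored even though its date is newer, because only keys with no further "/" after the prefix are considered.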