Listing directories at a given level in Amazon S3

Posted on 2024-07-30 12:39:11

I am storing two million files in an Amazon S3 bucket. There is a given root (l1), a list of directories under l1, and each directory contains files. So my bucket looks something like the following:

l1/a1/file1-1.jpg
l1/a1/file1-2.jpg
l1/a1/... another 500 files
l1/a2/file2-1.jpg
l1/a2/file2-2.jpg
l1/a2/... another 500 files
....

l1/a5000/file5000-1.jpg

I would like to list the second-level entries as fast as possible, i.e. get a1, a2, ..., a5000. I do not want to list all the keys, because that would take much longer.

I am open to using the AWS API directly, but so far I have been experimenting with the right_aws gem in Ruby: http://rdoc.info/projects/rightscale/right_aws

There are at least two APIs in that gem: I tried bucket.keys() in the S3 module and incrementally_list_bucket() in the S3Interface module. I can set the prefix and delimiter to list all of l1/a1/*, for example, but I cannot figure out how to list just the first level under l1. There is a :common_prefixes entry in the hash returned by incrementally_list_bucket(), but in my test sample it is not filled in.
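For reference, here is a small in-memory model of how a listing request treats prefix and delimiter: each key under the prefix that contains the delimiter after the prefix is rolled up into one common prefix (the prefix plus everything up to and including that first delimiter), and per the S3 documentation CommonPrefixes is only returned when a delimiter is specified. `simulate_list` is a hypothetical helper written for illustration; it does not talk to S3 and ignores paging.

```ruby
# Hypothetical, in-memory model of S3 prefix/delimiter grouping.
# It does not call S3 and ignores the 1,000-entry page limit.
def simulate_list(keys, prefix: "", delimiter: nil)
  contents = []
  common_prefixes = []
  keys.sort.each do |key|
    next unless key.start_with?(prefix)
    rest = key[prefix.length..]
    idx = delimiter && rest.index(delimiter)
    if idx
      # Roll the key up into prefix + text up to and including the delimiter.
      cp = prefix + rest[0, idx + delimiter.length]
      # Keys sharing a common prefix are adjacent once sorted, so this dedupes.
      common_prefixes << cp unless common_prefixes.last == cp
    else
      contents << key
    end
  end
  { contents: contents, common_prefixes: common_prefixes }
end

SAMPLE_KEYS = %w[
  l1/a1/file1-1.jpg l1/a1/file1-2.jpg
  l1/a2/file2-1.jpg l1/a5000/file5000-1.jpg
].freeze

p simulate_list(SAMPLE_KEYS, prefix: "l1/", delimiter: "/")[:common_prefixes]
# => ["l1/a1/", "l1/a2/", "l1/a5000/"]
p simulate_list(SAMPLE_KEYS, prefix: "l1/")[:common_prefixes]
# => []
```

With prefix `l1/` and delimiter `/`, only one entry per second-level directory comes back instead of every file key; without a delimiter nothing is rolled up.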

Is this operation possible with the S3 API?

Thanks!

网名女生简单气质 2024-08-06 12:39:11

right_aws lets you do this through its underlying S3Interface class, but you can create your own method for easier (and nicer) use. Put this at the top of your code:

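module RightAws
  class S3
    class Bucket
      # Returns every common prefix ("subdirectory") under +prefix+.
      # incrementally_list_bucket yields one hash per listing request,
      # so the prefixes from all pages are accumulated here.
      def common_prefixes(prefix, delimiter = '/')
        prefixes = []
        @s3.interface.incrementally_list_bucket(@name, { 'prefix' => prefix, 'delimiter' => delimiter }) do |thislist|
          # Array() guards against a page with no :common_prefixes entry.
          prefixes.concat(Array(thislist[:common_prefixes]))
        end
        prefixes
      end
    end
  end
end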

This adds the common_prefixes method to the RightAws::S3::Bucket class. Now, instead of calling mybucket.keys to fetch the list of keys in your bucket, you can use mybucket.common_prefixes to get an array of common prefixes. In your case:

mybucket.common_prefixes("l1/")
# => ["l1/a1/", "l1/a2/", ..., "l1/a5000/"]
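S3 returns each common prefix with the requested prefix in front and the delimiter at the end (for example `l1/a1/`), so getting the bare names a1, a2, ..., a5000 takes one more step. A minimal sketch; `prefix_to_name` is a hypothetical helper, not part of right_aws:

```ruby
# Hypothetical helper: turn an S3 common prefix such as "l1/a1/" into the
# bare directory name "a1" by removing the leading request prefix and the
# trailing delimiter.
def prefix_to_name(common_prefix, prefix, delimiter = "/")
  name = common_prefix.start_with?(prefix) ? common_prefix[prefix.length..] : common_prefix
  name.chomp(delimiter)
end

prefixes = ["l1/a1/", "l1/a2/", "l1/a5000/"]
p prefixes.map { |cp| prefix_to_name(cp, "l1/") }
# => ["a1", "a2", "a5000"]
```

In practice you would map over the array returned by `mybucket.common_prefixes("l1/")`.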

I must say I tested it only with a small number of common prefixes; you should check that this works with more than 1000 common prefixes.
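On the 1000 limit: a single listing response holds at most 1,000 entries, and each common prefix counts as one entry. When more remain, the response has IsTruncated set to true and, when a delimiter is given, a NextMarker to pass as the marker of the next request. The loop below sketches that pattern against `fake_list_page`, a hypothetical in-memory stand-in for one request, so it runs without S3; whether right_aws follows exactly this pattern internally is worth confirming against its source.

```ruby
# Hypothetical stand-in for one delimited listing request: returns up to
# max_keys common prefixes that sort after +marker+, plus truncation info.
def fake_list_page(all_prefixes, marker: nil, max_keys: 1000)
  remaining = marker ? all_prefixes.select { |cp| cp > marker } : all_prefixes
  page = remaining.first(max_keys)
  truncated = remaining.length > max_keys
  { common_prefixes: page, is_truncated: truncated, next_marker: truncated ? page.last : nil }
end

# Follow NextMarker until IsTruncated is false, accumulating every page.
def collect_all_prefixes(all_prefixes, max_keys: 1000)
  results = []
  marker = nil
  loop do
    page = fake_list_page(all_prefixes, marker: marker, max_keys: max_keys)
    results.concat(page[:common_prefixes])
    break unless page[:is_truncated]
    marker = page[:next_marker]
  end
  results
end

# 5,000 directories, sorted the way S3 returns them (lexicographically).
DIRS = (1..5000).map { |i| "l1/a#{i}/" }.sort

p collect_all_prefixes(DIRS).length
# => 5000
```

Stopping after the first response would return only 1,000 of the 5,000 prefixes, which is exactly the case worth testing.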

我要还你自由 2024-08-06 12:39:11

This thread is quite old, but I ran into this issue recently and wanted to add my 2 cents...

It is (it seems) a hassle and a half to cleanly list the folders under a given path in an S3 bucket. Most of the current gem wrappers around the S3 API (the official AWS-SDK, S3) don't correctly parse the returned object (specifically the CommonPrefixes), so it is difficult to get back a list of folders (delimiter nightmares).
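To see why CommonPrefixes trips up parsers, note that a delimited listing contains two kinds of `Prefix` element: the echoed request prefix at the top level and one inside each `CommonPrefixes`. The sketch below parses an abridged, hand-written response body (namespace and most elements dropped) with REXML, which the snippet below also uses via `rexml_document`; `common_prefixes_from` is a hypothetical helper.

```ruby
require "rexml/document"

# Abridged ListBucketResult for prefix "l1/" and delimiter "/".
# The xmlns attribute and most elements are omitted for brevity.
SAMPLE_XML = <<~XML
  <ListBucketResult>
    <Prefix>l1/</Prefix>
    <Delimiter>/</Delimiter>
    <IsTruncated>false</IsTruncated>
    <CommonPrefixes><Prefix>l1/a1/</Prefix></CommonPrefixes>
    <CommonPrefixes><Prefix>l1/a2/</Prefix></CommonPrefixes>
  </ListBucketResult>
XML

# Only descend through CommonPrefixes, so the top-level <Prefix> is skipped.
def common_prefixes_from(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "ListBucketResult/CommonPrefixes/Prefix").map(&:text)
end

p common_prefixes_from(SAMPLE_XML)
# => ["l1/a1/", "l1/a2/"]

# A naive search for every Prefix element also picks up the request prefix.
p REXML::XPath.match(REXML::Document.new(SAMPLE_XML), "//Prefix").map(&:text)
# => ["l1/", "l1/a1/", "l1/a2/"]
```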

Here is a quick fix for those using the S3 gem... Sorry it isn't one-size-fits-all, but it's the best I wanted to do.

https://github.com/qoobaa/s3/issues/61

Code snippet:

module S3
  class Bucket
    # Lists the "directories" (CommonPrefixes) under options[:prefix].
    # If the response from S3 is truncated (IsTruncated == 'true'), this
    # method recurses with :marker set to NextMarker, then parses the
    # combined XML bodies for CommonPrefixes/Prefix elements.
    def directory_list(options = {}, responses = [])
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)

      if is_truncated?(response.body)
        directory_list(options.merge(:marker => next_marker(response.body)), responses << response.body)
      else
        parse_xml_array(responses + [response.body], options)
      end
    end

    private

    # Collects CommonPrefixes/Prefix values. With clean_path, the request
    # prefix is removed from the start and the delimiter from the end,
    # e.g. "l1/a1/" -> "a1" for prefix "l1/".
    def parse_xml_array(xml_array, options = {}, clean_path = true)
      prefix = options[:prefix].to_s
      delimiter = options[:delimiter].to_s
      names = []
      xml_array.each do |xml|
        rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") do |e|
          name = e.text
          if clean_path
            # Strip only a leading prefix (gsub would remove every occurrence).
            name = name.sub(/\A#{Regexp.escape(prefix)}/, '')
            name = name.chomp(delimiter) unless delimiter.empty?
          end
          names << name
        end
      end
      names
    end

    # S3 includes NextMarker in truncated responses when a delimiter is set.
    def next_marker(xml)
      marker = nil
      rexml_document(xml).elements.each("ListBucketResult/NextMarker") { |e| marker ||= e.text }
      raise StandardError, "truncated listing has no NextMarker" if marker.nil?
      marker
    end

    def is_truncated?(xml)
      is_truncated = nil
      rexml_document(xml).elements.each("ListBucketResult/IsTruncated") { |e| is_truncated ||= e.text }
      is_truncated == 'true'
    end
  end
end
