Listing directories at a given level in Amazon S3
I am storing two million files in an Amazon S3 bucket. There is a given root (l1), a list of directories under l1, and each directory contains files. So my bucket looks something like the following:
l1/a1/file1-1.jpg
l1/a1/file1-2.jpg
l1/a1/... another 500 files
l1/a2/file2-1.jpg
l1/a2/file2-2.jpg
l1/a2/... another 500 files
....
l1/a5000/file5000-1.jpg
I would like to list the second-level entries as fast as possible, so I would like to get a1, a2, ... a5000. I do not want to list all the keys; that would take a lot longer.
I am open to using the AWS API directly; however, so far I have played with the right_aws gem in Ruby: http://rdoc.info/projects/rightscale/right_aws
There are at least two APIs in that gem. I tried using bucket.keys() in the S3 module and incrementally_list_bucket() in the S3Interface module. I can set the prefix and delimiter to list all of l1/a1/*, for example, but I cannot figure out how to list just the first level under l1. There is a :common_prefixes entry in the hash returned by incrementally_list_bucket(), but in my test sample it is not filled in.
Is this operation possible with the S3 API?
Thanks!
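For reference, an S3 list request that carries both a prefix and a delimiter groups every key sharing the same segment after the prefix into a single common prefix instead of returning the keys one by one. The sketch below only simulates that grouping locally on an array of key names, so the expected result for the bucket layout above can be seen without any network call. The function name rollup is hypothetical and is not part of right_aws or of the S3 API.

```ruby
# Simulates how an S3 list request groups keys for a given prefix and delimiter.
# Runs on a plain array of key names; it does not contact S3.
# The name `rollup` is hypothetical, chosen for this illustration only.
def rollup(keys, prefix, delimiter = '/')
  contents = []
  common_prefixes = []
  keys.each do |key|
    next unless key.start_with?(prefix)
    rest = key[prefix.length..-1]
    cut = rest.index(delimiter)
    if cut
      # Everything up to and including the first delimiter after the prefix
      # is reported once, as a common prefix.
      common_prefixes << prefix + rest[0, cut + delimiter.length]
    else
      # No delimiter after the prefix: the key is returned as a normal entry.
      contents << key
    end
  end
  { contents: contents, common_prefixes: common_prefixes.uniq }
end

keys = %w[
  l1/a1/file1-1.jpg
  l1/a1/file1-2.jpg
  l1/a2/file2-1.jpg
  l1/a5000/file5000-1.jpg
]
puts rollup(keys, 'l1/').inspect
```

Note the trailing slash in the prefix: with 'l1/' the second-level entries come back as common prefixes, whereas with 'l1' everything rolls up into the single common prefix l1/.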
Comments (2)
right_aws allows you to do this as part of its underlying S3Interface class, but you can create your own method for easier (and nicer) use. A small patch placed at the top of your code adds a common_prefixes method to the RightAws::S3::Bucket class. Then, instead of calling mybucket.keys to fetch the list of keys in your bucket, you can use mybucket.common_prefixes to get an array of common prefixes. In your case, that gives you the entries directly under l1. I must say I tested it only with a small number of common prefixes; you should check that this works with more than 1000 common prefixes.
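The patch itself is not shown above. A minimal sketch of the idea, under stated assumptions, could look like the following. It assumes the interface object responds to incrementally_list_bucket(bucket, options), accepts string option keys 'prefix' and 'delimiter', and yields one hash per page containing a :common_prefixes array, as the question describes. The helper name common_prefixes_under and the FakeInterface class are hypothetical; the fake only exists so the sketch runs without the gem or a network connection.

```ruby
# Collects the common prefixes ("folders") under a prefix, page by page.
# Assumption: `interface` responds to incrementally_list_bucket(bucket, options)
# and yields a hash with a :common_prefixes array for each page of results.
def common_prefixes_under(interface, bucket_name, prefix, delimiter = '/')
  found = []
  options = { 'prefix' => prefix, 'delimiter' => delimiter }
  interface.incrementally_list_bucket(bucket_name, options) do |page|
    # A page may carry no :common_prefixes entry; treat that as empty.
    found.concat(page[:common_prefixes] || [])
  end
  found.uniq
end

# Hypothetical stand-in for the real interface, replaying canned pages.
class FakeInterface
  def initialize(pages)
    @pages = pages
  end

  def incrementally_list_bucket(_bucket, _options)
    @pages.each { |page| yield page }
  end
end

fake = FakeInterface.new([
  { common_prefixes: ['l1/a1/', 'l1/a2/'] },
  { common_prefixes: ['l1/a5000/'] }
])
puts common_prefixes_under(fake, 'my-bucket', 'l1/').inspect
```

Taking the interface as an argument rather than reopening RightAws::S3::Bucket keeps the sketch testable on its own. With the real gem, the same body could be wrapped in a method on the bucket class, and the caveat about more than 1000 common prefixes still applies.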
This thread is quite old, but I ran into this issue recently and wanted to add my 2 cents...
It is a hassle and a half (it seems) to cleanly list the folders under a given path in an S3 bucket. Most of the current gem wrappers around the S3 API (the official AWS-SDK, S3) don't correctly parse the returned object (specifically the CommonPrefixes), so it is difficult to get back a list of folders (delimiter nightmares).
Here is a quick fix for those using the S3 gem... Sorry it isn't one size fits all, but it's the best I wanted to do.
https://github.com/qoobaa/s3/issues/61
Code snippet:
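What follows is not the fix from the linked issue, only a minimal sketch of the underlying idea: when a wrapper does not expose CommonPrefixes, the values can be read straight from the XML body of the list response. It assumes you can get at the raw response body as a string; the function name folder_prefixes and the sample XML are made up for illustration, and the sketch uses Ruby's bundled REXML library rather than anything from the S3 gem.

```ruby
require 'rexml/document'

# Extracts the CommonPrefixes/Prefix values from a ListBucketResult XML body.
# `folder_prefixes` is a hypothetical helper, not part of any S3 gem.
def folder_prefixes(xml_body)
  root = REXML::Document.new(xml_body).root
  prefixes = []
  root.each_element do |child|
    # Compare local element names so the default XML namespace does not matter.
    next unless child.name == 'CommonPrefixes'
    child.each_element do |inner|
      prefixes << inner.text if inner.name == 'Prefix'
    end
  end
  prefixes
end

# Hand-written sample shaped like a list response for prefix "l1/", delimiter "/".
sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Name>my-bucket</Name>
    <Prefix>l1/</Prefix>
    <Delimiter>/</Delimiter>
    <IsTruncated>false</IsTruncated>
    <CommonPrefixes><Prefix>l1/a1/</Prefix></CommonPrefixes>
    <CommonPrefixes><Prefix>l1/a2/</Prefix></CommonPrefixes>
  </ListBucketResult>
XML

puts folder_prefixes(sample).inspect
```

The top-level Prefix element (the request's own prefix) is skipped on purpose; only Prefix elements nested inside CommonPrefixes are collected.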