Understanding continuation tokens in AWS S3
I'd like to understand better how continuation tokens work in list_objects_v2(). Here is a piece of code that iterates through a large S3 bucket, storing the continuation tokens provided:
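```python
import boto3

S3C = boto3.client("s3")  # some S3 client
BUCKET_NAME = "my-bucket"  # placeholder bucket name


def transformer():
    response = S3C.list_objects_v2(Bucket=BUCKET_NAME)
    tokens = []
    # NextContinuationToken is only present while there are more pages to fetch
    while "NextContinuationToken" in response:
        token = response["NextContinuationToken"]
        tokens.append(token)
        response = S3C.list_objects_v2(Bucket=BUCKET_NAME, ContinuationToken=token)
    print(tokens)
```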
What is the structure of these tokens under the hood? I noticed that if I rerun the function, they are regenerated (they are not the same). Also, how would I grab the token indicating the starting point for the first API call?
My motivation for understanding this is parallel computation: I want to see whether I can grab these tokens, ship them out somewhere as indices for computation, and get a robust result. I'm a bit of a noob, so thanks for being patient :)
Answer:
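Unfortunately, this is not possible. A single S3 list operation is strictly sequential: each continuation token is only returned in the response to the previous page, so you cannot parallelize the pagination itself.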
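A minimal sketch of the same loop the question uses, written so the dependency between pages is explicit. `iter_list_pages` is a hypothetical helper name, and `client` is assumed to be a boto3 S3 client (or any object with the same `list_objects_v2` method). It also touches the "first token" question: the first request simply omits `ContinuationToken`, so there is no token for the starting point. AWS describes the token as obfuscated (not a real key), so it is safest to treat its structure as opaque.

```python
from typing import Any, Dict, Iterator, Optional


def iter_list_pages(client: Any, bucket: str, **kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield successive list_objects_v2 responses for a single listing.

    Extra keyword arguments (for example Prefix="logs/") are passed through.
    """
    token: Optional[str] = None  # the very first request carries no token
    while True:
        params: Dict[str, Any] = {"Bucket": bucket, **kwargs}
        if token is not None:
            params["ContinuationToken"] = token
        response = client.list_objects_v2(**params)
        yield response
        # The next token only exists once this response has arrived, so page N+1
        # cannot be requested before page N: the listing is inherently sequential.
        token = response.get("NextContinuationToken")
        if token is None:  # absent on the last page (IsTruncated is False)
            break
```

With boto3 this would be used as `for page in iter_list_pages(boto3.client("s3"), "my-bucket"): ...`, with each page's objects in `page.get("Contents", [])`. boto3's built-in `client.get_paginator("list_objects_v2")` does the same token bookkeeping for you.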
That said, there is a workaround if your objects live in a deep directory tree (that is, keys organized into "folders" with a / delimiter). List only one or two (or any number of) levels deep, and use each prefix you get back as the base of another list request.
For example, a first list request one level deep (with Delimiter="/") might give you two prefixes, f1/ and f2/.
You can then list each of them independently to process the objects in parallel.
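A minimal sketch of this prefix fan-out. `_pages`, `list_prefixes`, `list_keys_under` and `parallel_list` are hypothetical helper names, and `client` is assumed to be a boto3 S3 client or any object with the same `list_objects_v2` method. The "depth" in the answer is mapped here to repeated listings with `Delimiter="/"`, which return sub-prefixes in `CommonPrefixes`. Two caveats under these assumptions: objects stored above the chosen depth (for example a file directly at the bucket root) are not picked up, and each per-prefix listing is still paged sequentially.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterator, List


def _pages(client: Any, **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield list_objects_v2 pages for ONE listing, following NextContinuationToken."""
    while True:
        resp = client.list_objects_v2(**params)
        yield resp
        token = resp.get("NextContinuationToken")
        if token is None:
            return
        params["ContinuationToken"] = token  # sequential: needs the previous page


def list_prefixes(client: Any, bucket: str, depth: int = 1) -> List[str]:
    """Return the "directory" prefixes exactly `depth` levels below the bucket root.

    Delimiter="/" makes S3 roll keys up into CommonPrefixes such as "f1/".
    Objects that sit above that depth (e.g. "top.txt") are NOT included.
    """
    level = [""]
    for _ in range(depth):
        level = [
            cp["Prefix"]
            for p in level
            for page in _pages(client, Bucket=bucket, Prefix=p, Delimiter="/")
            for cp in page.get("CommonPrefixes", [])
        ]
    return level


def list_keys_under(client: Any, bucket: str, prefix: str) -> List[str]:
    """Page through every key under one prefix (sequential within the prefix)."""
    return [
        obj["Key"]
        for page in _pages(client, Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    ]


def parallel_list(client: Any, bucket: str, depth: int = 1, workers: int = 8) -> Dict[str, List[str]]:
    """Run one independent listing per prefix on a thread pool; map prefix -> keys."""
    prefixes = list_prefixes(client, bucket, depth)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_prefix = list(pool.map(lambda p: list_keys_under(client, bucket, p), prefixes))
    return dict(zip(prefixes, per_prefix))
```

For example, `parallel_list(boto3.client("s3"), "my-bucket", depth=1)` would return a dict keyed by prefixes such as `"f1/"` and `"f2/"`. Threads fit here because the work is I/O-bound; boto3 describes low-level clients as generally thread-safe, but creating one client per worker is a cautious alternative.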
Hope this helps!