Streaming copy from one S3 endpoint to another in Python
I have a variation on the theme of copying objects from one S3 endpoint to another.
- Since some of the objects are quite large (larger than available memory), it needs to be a streaming copy (so put_object is mostly out, as documented)
- I'm copying from one endpoint (provider) to another, so I can't use a single-provider tool
- I can't grant permissions across the two providers to use CopyObject
- I also need to rename each object on the fly using a static mapping (which I have)
- I'd prefer not to download every file to disk and re-upload it, but will if that's the only option
- there's no direct path between the source and destination endpoints, so this script runs on an intermediate Linux host to do the copy/rename
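The rename requirement can be resolved before any bytes move: list the source keys and look each one up in the static mapping. A minimal sketch, assuming the mapping is a plain Python dict of source key to destination key (the function name `plan_copies` and the dict format are hypothetical); it uses boto3's `list_objects_v2` paginator so buckets with more than 1000 keys are fully listed.

```python
from typing import Dict, List, Tuple


def plan_copies(
    src_client, bucket: str, mapping: Dict[str, str], prefix: str = ""
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (pairs, unmapped) for every key under `prefix` in `bucket`.

    `pairs` holds (source_key, destination_key) tuples looked up in the
    hypothetical static `mapping`; `unmapped` lists keys with no entry.
    """
    pairs: List[Tuple[str, str]] = []
    unmapped: List[str] = []
    paginator = src_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key in mapping:
                pairs.append((key, mapping[key]))
            else:
                unmapped.append(key)
    return pairs, unmapped
```

Collecting unmapped keys instead of raising lets the whole listing be reviewed before anything is copied.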
I've tried a bunch of different things, including a variation that streams from a web/REST download into an upload, but I keep ending up with 0-byte files on the destination. Here's the closest attempt, which seems like it should work but doesn't.
src_client and dst_client are S3 client sessions with appropriate, working credentials at the two providers:
import io
import boto3
data = io.BytesIO()
src_client.download_fileobj(args.SRCBUCKET, args.SRCKEY, data)
# print(data.getvalue()) shows correct data here
dst_client.upload_fileobj(data, args.DSTBUCKET, args.DSTKEY)
# at this point <data> object is closed descriptor
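A likely cause of the 0-byte result is the buffer's read position, not either endpoint. `download_fileobj` writes into the `BytesIO` and leaves its position past the data it wrote rather than back at 0; `upload_fileobj` then reads from that position, gets no bytes, and uploads an empty object. `getvalue()` ignores the position, which is why the debugging `print` looked correct. A minimal sketch of the buffered copy with the rewind added (the name `copy_via_buffer` is hypothetical; it still holds the whole object in memory, so it only suits objects that fit in RAM):

```python
import io


def copy_via_buffer(src_client, dst_client, src_bucket, src_key, dst_bucket, dst_key):
    """Copy one object through an in-memory buffer.

    The whole object is held in RAM, so this only suits objects that fit.
    """
    with io.BytesIO() as buf:
        src_client.download_fileobj(src_bucket, src_key, buf)
        # download_fileobj does not rewind the buffer when it finishes;
        # without this seek, upload_fileobj starts where the writes ended
        buf.seek(0)
        dst_client.upload_fileobj(buf, dst_bucket, dst_key)
```

Closing an already-closed `BytesIO` is harmless, so the `with` block is safe even if the upload closes the buffer itself.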
I have also tried this:
data = io.BytesIO()
src_client.download_fileobj(args.SRCBUCKET, args.SRCKEY, data)
# confirmed that data.getvalue() here has the right stuff
with data as chunk:
    dst_client.upload_fileobj(chunk, args.DSTBUCKET, args.DSTKEY)
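Here too the buffer is never rewound. `BytesIO.__enter__` returns the same object, so `chunk` is `data` with its position still past the written bytes, and leaving the `with` block only closes it. The standard-library snippet below, with a plain `write` standing in for the download, shows the position problem and where a `seek(0)` would go:

```python
import io

data = io.BytesIO()
data.write(b"contents of Methods.dat")  # stand-in for download_fileobj

with data as chunk:
    print(chunk is data)   # True: the with statement does not create a new stream
    print(chunk.read())    # b'': reading starts after the written data
    chunk.seek(0)          # rewind; an upload placed after this would see the bytes
    print(chunk.read())    # b'contents of Methods.dat'

print(data.closed)         # True: leaving the with block closed the buffer
```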
In both cases, the resulting object looks like this:
2022-06-19 09:40:13 0 Methods.dat
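For objects larger than memory, the buffer can be skipped entirely. `get_object` returns a `StreamingBody` under `"Body"`, which implements `read()`, and `upload_fileobj` accepts any file-like object whose `read` returns bytes. For such a non-seekable stream it reads chunk by chunk and, above the multipart threshold, uploads in parts, so memory use should be governed by the transfer configuration rather than the object size. A sketch under those assumptions, with the rename applied from the static mapping (the helper names and the dict format are hypothetical):

```python
from typing import Dict


def stream_copy(src_client, dst_client, src_bucket, src_key,
                dst_bucket, dst_key, transfer_config=None):
    """Stream one object from the source to the destination without using disk."""
    body = src_client.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    extra = {}
    if transfer_config is not None:
        # e.g. a boto3.s3.transfer.TransferConfig to tune part size and threads
        extra["Config"] = transfer_config
    try:
        # upload_fileobj only calls read(); the fresh stream already starts at 0
        dst_client.upload_fileobj(body, dst_bucket, dst_key, **extra)
    finally:
        body.close()  # release the source HTTP connection


def stream_copy_renamed(src_client, dst_client, src_bucket, dst_bucket,
                        mapping: Dict[str, str], transfer_config=None) -> int:
    """Copy every key in the hypothetical static mapping, renaming on the fly."""
    copied = 0
    for src_key, dst_key in mapping.items():
        stream_copy(src_client, dst_client, src_bucket, src_key,
                    dst_bucket, dst_key, transfer_config)
        copied += 1
    return copied
```

Each client would be created with `boto3.client("s3", endpoint_url=...)` plus that provider's credentials, and a `TransferConfig(multipart_chunksize=..., max_concurrency=...)` can be passed as `transfer_config` to trade memory for throughput. The bytes still pass through the intermediate Linux host, but never touch its disk.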