Streaming copy in Python from one S3 endpoint to another

Published 2025-02-08 17:48:28


I have a variation on the theme of copying objects from one S3 endpoint to another:

  • Since some of the objects are quite large (larger than memory), it needs to be a streaming copy (so put_object is mostly out, as documented).
  • I'm copying from one endpoint (provider) to another, so I can't use a single-provider tool.
  • I can't grant permissions from one provider to the other to use CopyObject.
  • I also need to rename each object on the fly using a static mapping (which I have).
  • I'd prefer not to download each and every file to disk just to re-upload it, but will if that's the only option.
  • There's no direct path between the src endpoint and the dst endpoint, so this script runs on an intermediate Linux host to do the copy/rename.
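For reference, one pattern that satisfies all of these constraints at once: `get_object` returns a file-like `StreamingBody`, which can be handed directly to the destination client's `upload_fileobj`, which reads it in chunks (multipart for large objects) so memory use stays bounded. This is a sketch, not a tested solution for either provider; the clients are assumed to be ordinary boto3 S3 clients as described below, and all bucket/key names are placeholders.

```python
def stream_copy(src_client, dst_client, src_bucket, src_key, dst_bucket, dst_key):
    """Stream one object between two S3 endpoints without touching disk.

    get_object returns a StreamingBody (file-like, supports read()), and
    upload_fileobj reads from it in chunks, so the whole object is never
    held in memory at once.
    """
    resp = src_client.get_object(Bucket=src_bucket, Key=src_key)
    dst_client.upload_fileobj(resp["Body"], Bucket=dst_bucket, Key=dst_key)
```

The rename-on-the-fly requirement then reduces to looking up `dst_key` in the static mapping before each call.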

I've tried a bunch of different things, including a variation that streams from a web/REST download into an upload, but I keep ending up with 0-byte files on the destination. Here's the attempt that seems closest to working, but doesn't:

src_client and dst_client are S3 clients with appropriate working credentials at the two providers:

import io
import boto3
data = io.BytesIO()
src_client.download_fileobj(args.SRCBUCKET, args.SRCKEY, data)
# print(data.getvalue()) shows correct data here
dst_client.upload_fileobj(data, args.DSTBUCKET, args.DSTKEY)
# at this point <data> is a closed file object
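The 0-byte result is consistent with the stream position: `download_fileobj` writes into `data` and leaves the cursor at the end of the buffer, and `upload_fileobj` reads from the current position onward, so it finds nothing to read. A minimal illustration with `io.BytesIO` alone, no S3 involved:

```python
import io

buf = io.BytesIO()
buf.write(b"payload")   # simulates download_fileobj filling the buffer
print(buf.tell())       # cursor sits at the end: 7
print(len(buf.read()))  # reading from here yields 0 bytes
buf.seek(0)             # rewind before handing the buffer to an uploader
print(len(buf.read()))  # now all 7 bytes are readable
```

So inserting `data.seek(0)` between the download and the upload should make the snippet above produce a non-empty object, though note this approach still buffers the whole object in memory, which conflicts with the large-object constraint.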

I have also tried this:

data = io.BytesIO()
src_client.download_fileobj(args.SRCBUCKET, args.SRCKEY, data)
# confirmed that data.getvalue() here has the right stuff
with data as chunk:
    dst_client.upload_fileobj(chunk, args.DSTBUCKET, args.DSTKEY)

In both cases, the resulting object looks like this:

2022-06-19 09:40:13 0 Methods.dat
