Can I upload a file to S3 without a Content-Length header?
I'm working on a machine with limited memory, and I'd like to upload a dynamically generated (not-from-disk) file in a streaming manner to S3. In other words, I don't know the file size when I start the upload, but I'll know it by the end. Normally a PUT request has a Content-Length header, but perhaps there is a way around this, such as using multipart or chunked content-type.
S3 can support streaming uploads. For example, see here:
http://blog.odonnell.nu/posts/streaming-uploads-s3-python-and-poster/
My question is, can I accomplish the same thing without having to specify the file length at the start of the upload?
6 Answers
You have to upload your file in 5MiB+ chunks via S3's multipart API. Each of those chunks requires a Content-Length, but you can avoid loading huge amounts of data (100MiB+) into memory.
S3 allows up to 10,000 parts. So by choosing a part-size of 5MiB you will be able to upload dynamic files of up to 50GiB. Should be enough for most use-cases.
However, if you need more, you have to increase your part-size, either by using a higher part-size (10MiB for example) or by increasing it during the upload.
This will allow you to upload files of up to 1TB (S3's limit for a single file is 5TB right now) without wasting memory unnecessarily.
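For illustration only (not from the original answer), here is a minimal sketch of this approach with boto3; the bucket name, the key, and the generate_data() source are placeholders for your dynamically generated content:

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "generated-file"   # placeholder names
PART_SIZE = 5 * 1024 * 1024                   # 5 MiB: the minimum size for every part except the last

def generate_data():
    # Stand-in for your dynamic (not-from-disk) data source.
    for _ in range(4):
        yield b"x" * (3 * 1024 * 1024)

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu["UploadId"]
parts, part_number, buffer = [], 1, b""

try:
    for chunk in generate_data():
        buffer += chunk
        while len(buffer) >= PART_SIZE:
            # Each part gets its own Content-Length (boto3 sets it from the Body size).
            resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                  PartNumber=part_number, Body=buffer[:PART_SIZE])
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
            buffer = buffer[PART_SIZE:]
    if buffer:                                # the final part may be smaller than 5 MiB
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=part_number, Body=buffer)
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
except Exception:
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise

Only about one part's worth of data is ever buffered in memory, which is what keeps the footprint small regardless of the total upload size.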
A note on your link to Sean O'Donnell's blog:
His problem is different from yours - he knows and uses the Content-Length before the upload. He wants to improve on this situation: Many libraries handle uploads by loading all data from a file into memory. In pseudo-code that would be something like this:
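(Sketched below in Python with boto3 rather than the answer's original pseudo-code; the file name, bucket and key are illustrative placeholders.)

import boto3

s3 = boto3.client("s3")

# Read the whole file into memory, then issue a single PUT.
# The Content-Length header is derived from the size of the bytes object,
# so the entire payload has to sit in RAM first.
with open("some-file.bin", "rb") as f:
    data = f.read()

s3.put_object(Bucket="my-bucket", Key="some-file.bin", Body=data)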
His solution does it by getting the Content-Length via the filesystem API. He then streams the data from disk into the request-stream. In pseudo-code:
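(Again a Python/boto3 sketch standing in for the pseudo-code; the path, bucket and key are placeholders. Passing an open file object together with a ContentLength is one way to get the same streaming behaviour with boto3.)

import os
import boto3

s3 = boto3.client("s3")
path = "some-file.bin"                 # placeholder
size = os.path.getsize(path)           # Content-Length is known up front via the filesystem

# Handing put_object an open file object lets it stream from disk in small
# reads instead of buffering the whole file; the known size fills Content-Length.
with open(path, "rb") as f:
    s3.put_object(Bucket="my-bucket", Key="some-file.bin", Body=f, ContentLength=size)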
Putting this answer here for others in case it helps:
If you don't know the length of the data you are streaming up to S3, you can use S3FileInfo and its OpenWrite() method to write arbitrary data into S3.
You can use the gof3r command-line tool to stream Linux pipes:
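For example, something along these lines (the bucket and key names are placeholders; gof3r's put subcommand can read from stdin):

# pipe dynamically generated data straight into S3, no Content-Length needed up front
tar -czf - my_dir/ | gof3r put -b my-bucket -k my_dir.tar.gz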
If you are using Node.js you can use a plugin like s3-streaming-upload to accomplish this quite easily.
See HTTP multipart entity requests for more details. You can send a file as chunks of data to the target.
Reference: https://github.com/aws/aws-cli/pull/903
Here is a synopsis:
For uploading a stream from stdin to s3, use:
aws s3 cp - s3://my-bucket/stream
For downloading an s3 object as a stdout stream, use:
aws s3 cp s3://my-bucket/stream -
So for example, if I had the object s3://my-bucket/stream, I could run this command:
aws s3 cp s3://my-bucket/stream - | aws s3 cp - s3://my-bucket/new-stream
My command:
echo "ccc" | aws --endpoint-url=http://172.22.222.245:80 --no-verify-ssl s3 cp - s3://test-bucket/ccc