s3cmd failed too many times
I used to be a happy s3cmd user. However recently when I try to transfer a large zip file (~7Gig) to Amazon S3, I am getting this error:
> s3cmd put thefile.tgz s3://thebucket/thefile.tgz
....
20480 of 7563176329 0% in 1s 14.97 kB/s failed
WARNING: Upload failed: /thefile.tgz ([Errno 32] Broken pipe)
WARNING: Retrying on lower speed (throttle=1.25)
WARNING: Waiting 15 sec...
thefile.tgz -> s3://thebucket/thefile.tgz [1 of 1]
8192 of 7563176329 0% in 1s 5.57 kB/s failed
ERROR: Upload of 'thefile.tgz' failed too many times. Skipping that file.
I am using the latest s3cmd on Ubuntu.
Why is it so, and how can I solve it? If it is unresolvable, what alternative tool can I use?
And now in 2014, the aws cli has the ability to upload big files in lieu of s3cmd.
http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html has install / configure instructions; installing and configuring the CLI, followed by copying the file with it (sketched below), will get you satisfactory results.
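The exact commands were stripped from this page; assuming a pip-based install and the bucket/file names from the question, they would look roughly like this:
# install and configure the AWS CLI (prompts for credentials and default region)
pip install awscli
aws configure
# copy the file up; the CLI handles multipart uploads of large files on its own
aws s3 cp thefile.tgz s3://thebucket/thefile.tgz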
I've just come across this problem myself. I've got a 24GB .tar.gz file to put into S3.
Uploading smaller pieces will help.
There is also a ~5GB file size limit, so I'm splitting the file into pieces that can be re-assembled when they are downloaded later.
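The split command itself didn't survive on this page; reconstructed from the description that follows (the block size and the prefix are taken from the part names quoted below), it would be something like:
split -b100m input-24GB-file.tar.gz input-24GB-file.tar.gz-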
The last part of that line is a 'prefix'. Split will append 'aa', 'ab', 'ac', etc. to it. The -b100m means 100MB chunks. A 24GB file will end up with about 240 100MB parts, called 'input-24GB-file.tar.gz-aa' to 'input-24GB-file.tar.gz-jf'.
To combine them later, download them all into a directory and:
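(The command was stripped from this page; with the filenames above it would presumably be:)
cat input-24GB-file.tar.gz-* > input-24GB-file.tar.gz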
Taking md5sums of the original and the split files and storing them in the S3 bucket, or better, if it's not too big, using a system like parchive to be able to check and even fix some download problems, could also be valuable.
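A sketch of that checksum step (the filenames are taken from the example above and the checksums.md5 name is an assumption, not part of the original answer):
# record checksums before uploading, verify after downloading and reassembling
md5sum input-24GB-file.tar.gz input-24GB-file.tar.gz-* > checksums.md5
md5sum -c checksums.md5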
I tried all of the other answers but none worked. It looks like s3cmd is fairly sensitive.
In my case the s3 bucket was in the EU. Small files would upload but when it got to ~60k it always failed.
When I changed ~/.s3cfg it worked.
Here are the changes I made:
host_base = s3-eu-west-1.amazonaws.com
host_bucket = %(bucket)s.s3-eu-west-1.amazonaws.com
I had the same problem with s3cmd on Ubuntu.
The solution was to update s3cmd with the instructions from s3tools.org:
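The commands were stripped from this page. s3tools.org documented adding their apt repository on Ubuntu; an equivalent route (an assumption here, not necessarily the original answer's exact steps) is to install a newer release from PyPI:
# replace the distro-packaged s3cmd with a newer release (not the original answer's exact commands)
sudo pip install --upgrade s3cmd
s3cmd --version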
This error occurs when Amazon returns an error: they seem to then disconnect the socket to keep you from uploading gigabytes of request just to get back "no, that failed" in response. This is why some people are getting it due to clock skew, some are getting it due to policy errors, and others are running into size limitations requiring the use of the multi-part upload API. It isn't that everyone is wrong, or even looking at different problems: these are all different symptoms of the same underlying behavior in s3cmd.
As most error conditions are going to be deterministic, s3cmd's behavior of throwing away the error message and retrying slower is kind of crazy unfortunate :(. To get the actual error message, you can go into /usr/share/s3cmd/S3/S3.py (remembering to delete the corresponding .pyc so the changes are used) and add a print e in the send_file function's except Exception, e: block.

In my case, I was trying to set the Content-Type of the uploaded file to "application/x-debian-package". Apparently, s3cmd's S3.object_put 1) does not honor a Content-Type passed via --add-header and yet 2) fails to overwrite the Content-Type added via --add-header, as it stores headers in a dictionary with case-sensitive keys. The result is that it does a signature calculation using its value of "content-type" and then ends up (at least with many requests; this might be based on some kind of hash ordering somewhere) sending "Content-Type" to Amazon, leading to the signature error.
In my specific case today, it seems like -M would cause s3cmd to guess the right Content-Type, but it seems to do that based on filename alone... I would have hoped that it would use the mimemagic database based on the contents of the file. Honestly, though: s3cmd doesn't even manage to return a failed shell exit status when it fails to upload the file, so combined with all of these other issues it is probably better to just write your own one-off tool to do the one thing you need... it is almost certain that in the end it will save you time when you get bitten by some corner-case of this tool :(.
s3cmd 1.0.0 does not support multi-part yet. I tried 1.1.0-beta and it works just fine. You can read about the new features here: http://s3tools.org/s3cmd-110b2-released
In my case the reason for the failure was the server's time being ahead of the S3 time: I was using GMT+4 on my server (located in US East) while using Amazon's US East storage facility.
After adjusting my server to US East time, the problem was gone.
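The answer doesn't say how the clock was adjusted; on a systemd-based Ubuntu, one way (an assumption, not from the original answer) is:
# set the timezone to match the setup described above, then keep the clock synced
sudo timedatectl set-timezone America/New_York
sudo timedatectl set-ntp true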
I experienced the same issue; it turned out to be a bad bucket_location value in ~/.s3cfg. This blog post led me to the answer.
After inspecting my ~/.s3cfg I saw that the bucket_location it had did not match the bucket's actual location.
Correcting this value to use the proper name(s) solved the issue.
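The before/after snippets were lost from this page; as a purely hypothetical illustration (both region values here are assumptions, not the original answer's), the mismatch in ~/.s3cfg looks like:
# hypothetical: bucket actually lives in eu-west-1, but the config still had the default
bucket_location = US
# corrected:
bucket_location = eu-west-1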
For me, the following worked:
In .s3cfg, I changed the host_bucket value.
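The value itself was stripped from this page; assuming a region-specific endpoint like the EU example earlier in this thread (the region is an assumption), it would look something like:
host_bucket = %(bucket)s.s3-eu-west-1.amazonaws.com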
s3cmd version 1.1.0-beta3 or better will automatically use multipart uploads to allow sending up arbitrarily large files (source). You can control the chunk size it uses, too. e.g.
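The example command was stripped from this page; using s3cmd's --multipart-chunk-size-mb option with the file and bucket names from the question, it would be along the lines of:
s3cmd put --multipart-chunk-size-mb=1024 thefile.tgz s3://thebucket/thefile.tgz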
This will do the upload in 1 GB chunks.
I encountered the same broken pipe error because the security group policy was set wrongly. I blame the S3 documentation.
I wrote about how to set the policy correctly in my blog, which is:
In my case, I fixed this just by adding the right permissions.
I encountered a similar error which eventually turned out to be caused by a time drift on the machine. Correctly setting the time fixed the issue for me.
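The answer doesn't show the fix itself; one common way to correct a drifting clock on Ubuntu at the time (an assumption, not from the original answer) was:
sudo ntpdate pool.ntp.org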
Search for the .s3cfg file, generally in your Home Folder. If you have it, you got the villain. Changing the following two parameters should help you.
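The two parameters were stripped from this page; a pair commonly tweaked in ~/.s3cfg for this failure (an assumption, not necessarily the original answer's values) is the socket timeout and the multipart chunk size:
# assumptions: give slow connections more time, and keep chunks small enough to finish
socket_timeout = 300
multipart_chunk_size_mb = 15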
I addressed this by simply not using s3cmd. Instead, I've had great success with the Python project S3-Multipart on GitHub. It does uploading and downloading, along with using as many threads as desired.