gsutil takes 3 hours to transfer 5000 files from one bucket to another
I am trying to transfer 7000 files from one bucket to another using the command below, but it takes approximately 3 hours to complete. How can I optimize this so it finishes within 5 minutes?
for ELEMENT in $list; do
  gsutil -m mv -r gs://${FILE_PATH}/$ELEMENT gs://${GCS_STORAGE_PATH}/
done
1 Answer
I think you're not actually parallelizing the gsutil mv; well, you are, but only for one file (not many). You're sending each ${ELEMENT} to gsutil -m mv in turn, which means the command only uploads a single file and blocks while doing so. Once each file is mv'd, the next file is processed.

To use gsutil -m correctly, I think you need to pass the command a wildcard that represents the set of files to e.g. mv in parallel.
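For example, a single wildcarded invocation lets gsutil itself fan the moves out across its parallel workers. This is only a minimal sketch reusing the question's FILE_PATH and GCS_STORAGE_PATH variables; the prefix in the second form is purely illustrative:

# One gsutil invocation; -m parallelizes across all matched objects
gsutil -m mv "gs://${FILE_PATH}/*" "gs://${GCS_STORAGE_PATH}/"

# Or match recursively under the source path (note: ** flattens any
# "directory" structure into the destination)
gsutil -m mv "gs://${FILE_PATH}/**" "gs://${GCS_STORAGE_PATH}/"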
If you're unable to provide gsutil with a pattern that matches the set of files (and thus have it parallelize the command), another option, though you'll want to limit the number of tasks and it will be trickier to capture the stdout|stderr from gsutil, is to background (&) each gsutil command.
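A rough sketch of that fallback, assuming the same $list, FILE_PATH, and GCS_STORAGE_PATH variables from the question; the cap of 20 concurrent jobs is arbitrary, and the per-element log files are just one way to keep the interleaved stdout|stderr readable:

MAX_JOBS=20                      # arbitrary cap on concurrent gsutil processes
for ELEMENT in $list; do
  # throttle: wait for a free slot before starting another background job
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
    sleep 1
  done
  # each move runs in the background; -m is dropped because each call handles one object
  # (log name assumes $ELEMENT contains no slashes)
  gsutil mv -r "gs://${FILE_PATH}/${ELEMENT}" "gs://${GCS_STORAGE_PATH}/" \
    > "mv_${ELEMENT}.log" 2>&1 &
done
wait                             # block until every backgrounded mv has finished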