Locking the output file for a shell script invoked multiple times in parallel
I have close to a million files over which I want to run a shell script and append the results to a single file.
For example, suppose I just want to run wc on the files.
To make it run fast, I can parallelize it with xargs. But I do not want the scripts to step over each other when writing the output. It is probably better to write to a few separate files rather than one and then cat them later. But I still want the number of such temporary output files to be significantly smaller than the number of input files. Is there a way to get the kind of locking I want, or is it always ensured by default?
Is there any utility that will recursively cat two files in parallel?
I can write a script to do that, but I would have to deal with the temporaries and clean up. So I was wondering whether there is a utility that does that.
Comments (1)
GNU parallel claims that it:
If that's the case, then I presume it should be safe to simply pipe the output to your file and let parallel handle the intermediate data.
Use the -k option to maintain the order of the output.
Update: (non-Perl solution)
Another alternative would be prll, which is implemented as shell functions with some C extensions. It is less feature-rich than GNU parallel but should do the job for basic use cases.
The feature listing claims:
so it should meet your needs as long as the order of output is not important.
However, note the following statement on this page:
Disclaimer: I've tried neither of the tools and am merely quoting from their respective docs.