Combining sorted files with fifos
I have some sorted, gzipped files in a directory. How do I combine some of them into another sorted, gzipped file? Right now I'm using explicit fifos. Is there a way to do it in bash without them? I'm a bit of a bash noob, so please excuse my lack of style.
#!/bin/bash
# Invocation ./merge [files ... ]
# Turns an arbitrary set of sorted, gzipped files into a single sorted, gzipped file,
# printed to stdout. Redirect this script's output!
for f in $@
do
    mkfifo $f.raw
    gzcat $f > $f.raw &
    # sort -C $f.raw
done
sort -mu *.raw | gzip -c # prints to stdout.
rm -f *.raw
I'm looking to convert this into something like...
sort -mu <(gzcat $1) <(gzcat $2) <(gzcat $3) ... | gzip -9c # prints to stdout.
...but don't know how. Do I need a loop that builds the parameters into a string? Is there some sort of magic shortcut for this? Maybe map gzcat $@?
NOTE: Each of the files is in excess of 10GB (and 100GB unzipped). I have a 2TB drive, so this isn't really a problem. Also, this program MUST run in O(n) or it becomes infeasible.
Answers (3)
You can combine eval and 'process substitution' with Bash. Assuming the basic file names don't contain spaces (which, given that you use $@ instead of "$@", is probably the case), you can build the command in a string, appending one <(...) process substitution per file in a loop, and then run it with eval $cmd.
You can also use bash -c "$cmd" instead of eval $cmd on the last line. If there are spaces in the file names, you have to work a bit harder: wrapping each file name in single quotes inside the command string works if the names don't contain single quotes. With single quotes in the file names too, you have to work a lot harder.
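The snippet for this answer is not shown on this page, so the following is only a minimal sketch of the approach described above, not the answerer's exact code. The function name merge_sorted_gz and the file names in the usage comment are hypothetical, and gzip -cd is used in place of gzcat on the assumption that it is more widely available.

```shell
#!/bin/bash
# Sketch: build "sort -mu <(gzip -cd 'f1') <(gzip -cd 'f2') ..." as a string,
# then let eval parse the process substitutions.
# merge_sorted_gz is a hypothetical helper name, not from the original answer.
merge_sorted_gz() {
    local cmd="sort -mu"
    local file
    for file in "$@"; do
        # The single quotes protect spaces in file names;
        # this still breaks on names that contain a single quote.
        cmd="$cmd <(gzip -cd '$file')"
    done
    # eval re-parses the string so that each <(...) becomes a real
    # process substitution; the merged stream is recompressed on stdout.
    eval "$cmd" | gzip -c9
}

# Usage (hypothetical file names):
#   merge_sorted_gz a.gz b.gz c.gz > merged.gz
```

Because sort -m only merges inputs that are already sorted, each file is streamed once and nothing is written to disk apart from the final output.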
For me, your question is a little unclear, but if I understand your need, try decompressing the files with gunzip -c, piping the result through sort, and piping that into gzip -9 with its output redirected to the final file.
If you want to do all files of a certain type in one directory, you can use file*.type as the input list to gunzip; otherwise you'll need to list each file explicitly.
The -c option means 'send output to stdout', which is read by the pipe and sent to sort, which sends its output to stdout, through the next pipe, and into gzip, with its stdout redirected into the final file. The -9 is the highest compression level, which gives you the smallest file (for gzip) but takes longer. You can give an explicit number between -1 and -9 to tune the trade-off between compressed size and compression time for your needs. I hope this helps.
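The command for this answer is also missing from the page; the pipeline it describes could look roughly like the sketch below. The function name sort_gz_files and the file names in the usage comment are hypothetical.

```shell
#!/bin/bash
# Sketch of the described pipeline: decompress everything to one stream,
# sort it, and recompress at the highest level.
# sort_gz_files is a hypothetical helper name, not from the original answer.
sort_gz_files() {
    # gunzip -c writes the decompressed contents of all files, back to back,
    # to stdout; sort reorders the combined stream; gzip -9 recompresses it.
    gunzip -c "$@" | sort | gzip -9
}

# Usage (hypothetical file names):
#   sort_gz_files file1.gz file2.gz > merged.gz
#   sort_gz_files file*.gz > merged.gz
```

Note that this concatenates the inputs before sorting, so sort performs a full sort of the combined data rather than a merge of presorted streams. For inputs of the size mentioned in the question, that is likely to be considerably slower than sort -m.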
Here's a way to escape single quotes within file names (or file paths) that will get eval'ed in variables surrounded by single quotes.
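The code for this answer is missing here as well. One common way to do what it describes, offered as a sketch rather than the answerer's own code, is to replace every single quote with the four characters '\'' (close the quote, emit an escaped quote, reopen the quote). The function name escape_single_quotes and the example file name are hypothetical.

```shell
#!/bin/bash
# Sketch: make a string safe to place between single quotes in text that
# will later be passed to eval.
# escape_single_quotes is a hypothetical helper name.
escape_single_quotes() {
    local sq="'"        # the character to search for
    local esc="'\\''"   # the four characters '\'' used as the replacement
    local out=${1//$sq/$esc}
    printf '%s\n' "$out"
}

# Usage (hypothetical file name):
#   file="it's a file.gz"
#   cmd="sort -mu <(gzip -cd '$(escape_single_quotes "$file")')"
#   eval "$cmd"
```

After escaping, the name can be wrapped in single quotes inside the command string, and eval will see the original name as a single word, spaces and quotes included.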