Merging sorted files with a fifo

Posted on 2024-11-15 00:39:52

I have some sorted, gzipped files in a directory. How do I combine some of them into another sorted, gzipped file? Right now I'm using explicit fifos. Is there a way to do it in bash without them? I'm a bit of a bash noob, so please excuse my lack of style.

#!/bin/bash
# Invocation ./merge [files ... ]
# Turns an arbitrary set of sorted, gzipped files into a single sorted, gzipped file,
# printed to stdout. Redirect this script's output!
for f in $@
do
    mkfifo $f.raw
    gzcat $f > $f.raw &
    # sort -C $f.raw
done
sort -mu *.raw | gzip -c # prints to stdout.
rm -f *.raw

I'm looking to convert this into something like...

sort -mu <(gzcat $1) <(gzcat $2) <(gzcat $3) ... | gzip -9c # prints to stdout.

...but I don't know how. Do I need a loop that builds the argument string? Is there some sort of magic shortcut for this? Maybe something like map gzcat $@?

NOTE: Each of the files is in excess of 10GB (and 100GB unzipped). I have a 2TB drive, so this isn't really a problem. Also, this program MUST run in O(n) or it becomes unfeasible.
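For reference, the explicit-fifo approach can be made somewhat more robust. What follows is a minimal sketch rather than the script above: it assumes bash, `mktemp -d`, and `gzip -cd` in place of `gzcat`, and the function name `merge_with_fifos` is made up for illustration. It keeps the pipes in a private temporary directory so that stray `*.raw` files are never picked up, quotes every file name, and removes the pipes when the merge is done.

```shell
#!/usr/bin/env bash
# Sketch: merge sorted .gz files through named pipes kept in a private temp dir.
# merge_with_fifos is a hypothetical name; the merged, gzipped result goes to stdout.
merge_with_fifos() {
    local dir file pipe status i=0
    local -a pipes=()
    (( $# > 0 )) || { echo "usage: merge_with_fifos FILE.gz..." >&2; return 2; }
    dir=$(mktemp -d) || return 1

    # Create one pipe per input before starting any writer.
    for file in "$@"; do
        pipe="$dir/$i.raw"
        i=$((i + 1))
        mkfifo "$pipe" || { rm -rf "$dir"; return 1; }
        pipes+=("$pipe")
    done

    # Start one decompressor per input; each blocks until sort opens its pipe.
    i=0
    for file in "$@"; do
        gzip -cd -- "$file" > "${pipes[i]}" &
        i=$((i + 1))
    done

    sort -mu "${pipes[@]}" | gzip -c
    status=${PIPESTATUS[0]}   # exit status of sort, not of gzip
    wait                      # reap the background decompressors
    rm -rf "$dir"
    return "$status"
}
```

A call would look like `merge_with_fifos a.gz b.gz > merged.gz`. Because `sort -m` only merges, each input is still read once from start to end.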

dawn曙光 2024-11-22 00:39:52

You can combine eval and 'process substitution' with Bash. Assuming the basic file names don't contain spaces (which, given that you use $@ instead of "$@", is probably the case), something like this should work:

cmd="sort -mu"
for file in "$@"
do cmd="$cmd <(gzip -cd $file)"
done
# Quote $cmd so that eval sees the string exactly as it was built.
eval "$cmd" | gzip -c9 > outputfile.gz

You can also use bash -c "$cmd" instead of eval "$cmd" on the last line. If there are spaces in the file names, you have to work a bit harder. This works if the names don't contain single quotes:

cmd="sort -mu"
for file in "$@"
do cmd="$cmd <(gzip -cd '$file')"
done
# The double quotes matter here: an unquoted $cmd is word-split before eval
# runs, which collapses runs of spaces inside the quoted names.
eval "$cmd" | gzip -c9 > outputfile.gz

With single quotes in the file names too, you have to work a lot harder.
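One way to cope with arbitrary file names, single quotes included, is to let bash do the quoting with `printf '%q'`, which prints its argument in a form that can be reused as shell input. The following is a sketch under that assumption; `build_merge_cmd` and `merge_sorted_gz` are hypothetical names, and the merged output is left uncompressed so the caller can pipe it to `gzip`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a "sort -mu <(gzip -cd -- FILE) ..." command line,
# quoting each file name with printf %q so that eval re-reads it as one word.
build_merge_cmd() {
    local cmd="sort -mu" file
    for file in "$@"; do
        cmd+=" <(gzip -cd -- $(printf '%q' "$file"))"
    done
    printf '%s\n' "$cmd"
}

# Hypothetical wrapper: merge the given sorted .gz files to stdout (uncompressed).
# With no arguments this degenerates to "sort -mu" reading stdin.
merge_sorted_gz() {
    eval "$(build_merge_cmd "$@")"
}
```

It could then be used as `merge_sorted_gz "$@" | gzip -c9 > outputfile.gz`. The eval is still there, but the string it sees contains only names that bash itself has quoted.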

凯凯我们等你回来 2024-11-22 00:39:52

For me, your question is a little unclear, but if I understand your need, try this:

gunzip -c file1 file2 .... | sort | gzip -9 > mergedFile.gz

If you want to process all files of a certain type in one directory, you can use file*.type as the input list to gunzip; otherwise, as in my example, you'll need to list each file explicitly.

The -c option means 'send output to stdout'. That output is read by the pipe and sent to sort, which sends its own output to stdout, through the next pipe, and into gzip, whose stdout is redirected into the final file. The -9 is the highest compression level, which gives you the smallest file (for gzip) but takes longer. You can give an explicit number between -1 and -9 to tune the trade-off between compressed size and compression time for your needs.

I hope this helps.
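The pipeline above can be wrapped in a small function so the compression level is easy to vary. This is only a sketch: `concat_sort_gz` is a made-up name and the level is passed as the first argument. Note that a plain `sort` re-sorts the whole concatenated stream instead of merging streams that are already sorted, so it does more work than `sort -m` and, for inputs larger than memory, spills to temporary files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress all inputs into one stream, sort it, recompress.
# Usage: concat_sort_gz LEVEL FILE.gz...   (LEVEL is 1..9, passed to gzip as -LEVEL)
concat_sort_gz() {
    local level=$1
    shift
    # gunzip -c concatenates the decompressed inputs on stdout;
    # sort (without -m) orders the combined stream from scratch.
    gunzip -c -- "$@" | sort | gzip -c "-$level"
}
```

For example, `concat_sort_gz 1 file1.gz file2.gz > mergedFile.gz` favours speed, while level 9 favours size.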

春庭雪 2024-11-22 00:39:52

With single quotes in the file names too, you have to work a lot harder.

Here's a way to escape single quotes within file names (or file paths) so that they can be placed inside a single-quoted string that is later passed to eval.

(
esc="'\''"
file="/Applications/iWork '09/Pages.app"
file="${file//\'/${esc}}"
#echo "'${file}'"; ls -bdl "'${file}'"
evalstr="echo '${file}'; ls -bdl '${file}'"
#set -xv
eval "${evalstr}"
)
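The same escaping can be packaged as a reusable function. This is a sketch, with `sq_quote` as a hypothetical name: it replaces every `'` with `'\''` and wraps the result in single quotes, so the output can be spliced into a string that is handed to eval.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print $1 wrapped in single quotes, with each embedded
# single quote rewritten as '\'' (close quote, escaped quote, reopen quote).
sq_quote() {
    local q="'" esc="'\''"
    printf "'%s'" "${1//$q/$esc}"
}

file="/Applications/iWork '09/Pages.app"
# eval re-parses the quoted form back into the original single word.
eval "printf '%s\n' $(sq_quote "$file")"   # prints the original path unchanged
```

In a loop that builds a command string, `$(sq_quote "$file")` would take the place of a hand-written `'$file'`.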
