Degraded performance when running `tar` through `xargs`
Please consider this snippet:
tar -Oxvf archive.tgz | grep something
or this:
tar tf archive.tgz > /tmp/x && tar -Oxvf archive.tgz -T /tmp/x | grep something
versus this:
tar tf archive.tgz | xargs -I{} tar -Oxvf archive.tgz {} | grep something
The first two snippets are very fast and perform similarly, while the third is about 40 times slower (I guess the exact factor depends on the archive contents). Why is that?
Comments (2)
The key here is your use of `-I{}` in `xargs`. The man page says that `-I` implies `-x` and `-L 1`.

The implied `-L 1` makes `xargs` run your `tar -Oxvf archive.tgz {}` once per file in the archive, rather than running `tar` once to extract all the files listed on `xargs`' stdin. Each of those runs has to decompress and scan `archive.tgz` again, so the total cost grows with the number of members.

Simplified example of the difference:
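A minimal sketch of that difference, with `echo` standing in for `tar`. The helper `list` and the names `a`, `b`, `c` are made up for illustration and simply play the role of the `tar tf archive.tgz` output:

```shell
# Three made-up names stand in for the member list of an archive.
list() { printf '%s\n' a b c; }

# With -I{}, xargs starts one command per input line: three echo processes,
# so the output is three lines (a, b, c).
per_line=$(list | xargs -I{} echo {})
printf '%s\n' "$per_line"

# Without -I{}, xargs packs all names into one command line: a single echo
# process, so the output is the single line "a b c".
batched=$(list | xargs echo)
printf '%s\n' "$batched"
```

Replace `echo` with an expensive command such as `tar` on a compressed archive and the first form pays the startup and scan cost once per name.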
Fixed:
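A sketch of the corrected pipeline, which is the question's third command with `-I{}` and the trailing `{}` dropped. It runs against a throwaway archive so that it is self-contained; the `workdir` variable and the file names `one.txt` and `two.txt` are assumptions for illustration, not part of the original question:

```shell
# Build a small throwaway archive so the pipeline can run end to end.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf 'something here\n' > one.txt
printf 'nothing to see\n' > two.txt
tar -czf archive.tgz one.txt two.txt

# Without -I{}, xargs appends all member names to one tar command line,
# so the archive is decompressed and scanned once instead of once per member.
matches=$(tar -tf archive.tgz | xargs tar -Oxvf archive.tgz | grep something)
printf '%s\n' "$matches"
```

Two caveats apply to this form: `xargs` splits its input on blanks by default, so member names containing whitespace or quotes need extra care, and a very long list may still be split across several `tar` invocations when it exceeds the command-line length limit.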
Note, however, that the output of a single `tar` run (without `-I{}`) will not be the same as what you get using `xargs -I{}` if the file names given to `xargs` are not in tar file order (i.e. the order in which `tar t` lists them). The `xargs -I{}` version will output the files in the order you provided to `xargs`, whereas the single-run version will output them in tar file order.
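The ordering difference can be reproduced with a two-member archive. This is a sketch under assumed names (`workdir`, `wanted`, `a.txt`, `b.txt` are all hypothetical), where the member names are deliberately requested in reverse order:

```shell
# Throwaway archive whose member order is a.txt, then b.txt.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf 'A\n' > a.txt
printf 'B\n' > b.txt
tar -cf archive.tar a.txt b.txt

# Request the members in reverse order.
wanted() { printf '%s\n' b.txt a.txt; }

# One tar per name: output follows the order of the requested names (B, A).
per_name=$(wanted | xargs -I{} tar -Oxf archive.tar {})
printf '%s\n' "$per_name"

# One tar for all names: output follows the order inside the archive (A, B).
single=$(wanted | xargs tar -Oxf archive.tar)
printf '%s\n' "$single"
```

If the downstream consumer depends on the requested order, the single-run form needs the list sorted into archive order first, or the consumer has to tolerate archive order.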
I'm somewhat uncertain what you want to achieve with your examples. I don't understand what the first pipe in the first example is supposed to achieve, since the output that gets piped to the second tar isn't used. A `&&` would seem a better way to join both commands (execute the second only if the first was successful). Apart from that, if you're using the complete list of files for extraction (and only for that task), as in your examples, it wouldn't be necessary to spend a separate tar run on creating it, since tar by default will extract all files unless told otherwise.

As far as speed is concerned, the tar at the receiving end of the pipe has no special means of telling whether its input stems from another tar, so it cannot optimize for that. What does make a difference, though, is that in the case of two tar commands the first will start its output immediately, so the second tar can start running, while xargs will gather all data first, and then start its output and feed the tar arranged to run after it.

If you're looking for a fast way to extract only a subset of files from a tar archive, and want to select by filename, I'd recommend using `star`, which has a built-in find command.