cat/xargs/command VS for/bash/command
Page 38 of the book Linux 101 Hacks suggests:
cat url-list.txt | xargs wget -c
I usually do:
for i in `cat url-list.txt`
do
wget -c $i
done
Is there anything, other than length, that makes the xargs technique superior to the good old for-loop technique in bash?
Added
The C source code seems to have only one fork. In contrast, how many forks does the bash combo have? Please elaborate on the issue.
From the Rationale section of a UNIX manpage for xargs. (Interestingly, this section doesn't appear in the OS X BSD version of xargs, nor in the GNU version.)
In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but easy to answer the general question. How many lines are there in your url-list.txt file?
There are some other considerations. xargs requires extra care for filenames with spaces or other no-no characters, and find's -exec has an option (+) that groups processing into batches. So, not everyone prefers xargs, and perhaps it's not best for all situations.
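To make the point about spaces concrete, here is a minimal sketch. It assumes an xargs that supports -0 (GNU and BSD versions do, but it is not in POSIX), and the file names are made up for illustration.

```shell
# Sample list: two entries, one of which contains a space.
list=$(mktemp)
printf '%s\n' 'my file.txt' 'other.txt' > "$list"

# Default xargs splits on any whitespace, so 'my file.txt' becomes two arguments.
naive_count=$(xargs -n 1 echo < "$list" | wc -l | tr -d ' ')

# Turning newlines into NUL bytes and using -0 keeps each line as one argument.
safe_count=$(tr '\n' '\0' < "$list" | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "naive: $naive_count arguments, NUL-delimited: $safe_count arguments"
rm -f "$list"
```

The naive run sees three arguments where the list only has two entries, which is the kind of surprise the extra care is meant to avoid.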
Also consider:
but wget provides an even better means for the same:
With respect to the xargs versus loop consideration, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.
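As a rough sketch of the wget route mentioned in this answer: wget can read URLs from a file itself via its -i (--input-file) option, so neither xargs nor a loop is needed. The function name fetch_list and the DRY_RUN switch are hypothetical, added only so the example can be tried without downloading anything.

```shell
# fetch_list and DRY_RUN are hypothetical names for this sketch.
# With DRY_RUN set, the command is printed instead of executed.
fetch_list() {
    # -c resumes partial downloads; -i reads the URLs from the named file.
    ${DRY_RUN:+echo} wget -c -i "$1"
}

# Print what would run, without touching the network.
DRY_RUN=1 fetch_list url-list.txt
```

Dropping DRY_RUN=1 would run a single wget process for the whole list.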
xargs will also allow you to have a huge list, which is not possible with the "for" version because the shell uses command lines limited in length.
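To illustrate how xargs copes with a long list: it splits the input into as many command invocations as needed to stay under the system's argument-size limit. The sketch below forces tiny batches with -n 2 so the splitting is visible; the URLs are placeholders and echo stands in for actually running wget.

```shell
# Five placeholder URLs (example.com is only an illustration).
list=$(mktemp)
printf 'http://example.com/file%s\n' 1 2 3 4 5 > "$list"

# "echo" stands in for a real run: each output line is one wget invocation.
# -n 2 caps every invocation at two arguments to make the batching visible.
batches=$(xargs -n 2 echo wget -c < "$list")
printf '%s\n' "$batches"
rm -f "$list"
```

Without -n, the batch size is whatever fits within the limit, so a short list normally ends up in a single invocation.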
xargs is designed to process multiple inputs for each process it forks. A shell script with a for loop over its inputs must fork a new process for each input. Avoiding that per-process overhead can give an xargs solution a significant performance enhancement.
Instead of GNU Parallel, I prefer using xargs' built-in parallel processing. Add -P to indicate how many forks to perform in parallel. As in...
would use 3 forks on up to 3 different cores for computation. This is supported by modern GNU xargs. You will have to verify for yourself if using BSD or Solaris.
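A small sketch of the -P idea, assuming an xargs that supports -P as discussed above. The six numbers are dummy work items and echo stands in for the real per-item command; the output is sorted because parallel jobs can finish in any order.

```shell
# Six dummy work items; "echo" stands in for the real per-item command.
# -n 1 gives each process one item, -P 3 allows up to three processes at once.
results=$(printf '%s\n' 1 2 3 4 5 6 | xargs -n 1 -P 3 echo | sort -n)
printf '%s\n' "$results"
```

For the question's case, the same shape would be something like xargs -n 1 -P 3 wget -c fed from url-list.txt.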
One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.
I'm not really a bash expert though, so there could be other reasons it's better (or worse).
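One way to see the difference in process starts is to count how many times the command would be launched. This is only a sketch: the URLs are placeholders and echo stands in for wget, with each output line representing one command that would be started.

```shell
list=$(mktemp)
printf 'http://example.com/file%s\n' 1 2 3 4 > "$list"

# Loop version, as in the question: one command line per URL.
loop_runs=$(for i in $(cat "$list"); do echo wget -c "$i"; done | wc -l | tr -d ' ')

# xargs version: all four URLs fit into a single invocation of the command.
xargs_runs=$(xargs echo wget -c < "$list" | wc -l | tr -d ' ')

echo "loop: $loop_runs command(s), xargs: $xargs_runs command(s)"
rm -f "$list"
```

With four URLs the loop launches four commands and xargs launches one, so the gap grows with the length of the list.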
Depending on your internet connection you may want to use GNU Parallel http://www.gnu.org/software/parallel/ to run it in parallel.