When should xargs be preferred over while-read loops?
xargs is widely used in shell scripting; it is usually easy to recast these uses in bash using while read -r; do ... done or while read -ar; do ... done loops.

When should xargs be preferred, and when should while-read loops be preferred?
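For instance, one typical recasting looks roughly like this (the find/rm pair is only an illustration):

    find . -name '*.bak' | xargs rm               # the xargs form
    find . -name '*.bak' | while read -r f; do    # the same job as a while-read loop
        rm "$f"
    done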
6 Answers
The thing with while loops is that they tend to process one item at a time, often when it's unnecessary. This is where xargs has an advantage: it can batch up the arguments to allow one command to process lots of items.

For example, a while loop:
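A minimal sketch of such a loop, assuming the numbers 1 to 5 are piped in:

    printf '%s\n' 1 2 3 4 5 | while read -r line; do
        echo "$line"    # the command runs once for every input line
    done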
and the corresponding xargs:
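The same input batched through xargs (again just a sketch):

    printf '%s\n' 1 2 3 4 5 | xargs echo    # all five lines handed to a single echo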
Here you can see that the lines are processed one by one with the while loop and all together with xargs. In other words, the former is equivalent to echo 1 ; echo 2 ; echo 3 ; echo 4 ; echo 5 while the latter is equivalent to echo 1 2 3 4 5 (five processes as opposed to one). This really makes a difference when processing thousands or tens of thousands of lines, since process creation takes time.

It's mostly advantageous when using commands that can accept multiple arguments, since it reduces the number of individual processes started, making things much faster.
When I'm processing small files or the commands to run on each item are complicated (where I'm too lazy to write a separate script to give to xargs), I will use the while variant. Where I'm interested in performance (large files), I will use xargs, even if I have to write a separate script.
Some implementations of xargs also understand a -P MAX-PROCS argument, which lets xargs run multiple jobs in parallel. This would be quite difficult to simulate with a while read loop.
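A quick sketch of what that looks like (the log files and the choice of gzip are illustrative; -P support depends on your xargs implementation):

    # compress up to 4 files at a time, one file per gzip invocation
    find . -name '*.log' -print0 | xargs -0 -n 1 -P 4 gzip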
GNU Parallel http://www.gnu.org/software/parallel/ has the advantages from xargs (using -m) and the advantage of while-read with newline as separator, and some new features (e.g. grouping of output, parallel running of jobs on remote computers, and context replace).

If you have GNU Parallel installed I cannot see a single situation in which you would use xargs. And the only situation in which I would use read-while would be if the block to execute is so big it becomes unreadable to put in a single line (e.g. if it contains if-statements or similar) and you refuse to make a bash function.

For all the small scripts I actually find it more readable to use GNU Parallel. paxdiablo's example:
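A sketch of the same one-item-at-a-time versus batched run, assuming GNU Parallel is installed:

    printf '%s\n' 1 2 3 4 5 | parallel echo       # one echo job per argument
    printf '%s\n' 1 2 3 4 5 | parallel -m echo    # arguments batched, xargs-style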
Converting WAV files to MP3 using GNU Parallel:
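A minimal sketch, assuming lame is installed and the WAV files sit in the current directory ({.} is the input filename with its extension removed):

    parallel lame {} {.}.mp3 ::: *.wav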
Watch the intro video for GNU Parallel: http://www.youtube.com/watch?v=OpaiGYxkSuQ
xargs has the option "-n max-args", which I guess will allow calling the command with several arguments at once (useful for "grep", "rm" and many more such programs).

Try the example from the man page:
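Roughly along these lines (a sketch in the style of the xargs man-page examples; assumes a readable /etc/passwd):

    cut -d: -f1 < /etc/passwd | sort | xargs -n 5 echo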
And you'll see that it "echo"-ed 5 users per line.

P.S. And don't forget that "xargs" is a program (like a subshell), so there is no easy way to get information back into your shell script (you'll need to read the output of "xargs" and interpret it somehow to fill up your shell/env variables).
On the opposite, there are cases when you have a list of files, one per line, containing spaces, e.g. coming from a find or a pkgutil or similar. To work with xargs you'll have to wrap the lines in quotes using sed first, but this looks unwieldy. With a while loop the script might look easier to read and write, and quoting of space-contaminated args is trivial. The example below is artificial, but imagine getting the list of files from something other than find...
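A contrived sketch of such a loop (the .txt pattern and the wc call are only placeholders):

    find . -type f -name '*.txt' | while IFS= read -r file; do
        wc -l "$file"    # quoting "$file" handles names that contain spaces
    done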
I don't get it, people keep yammering on about how while MUST be executed in the loop instead of outside of the loop. I know very little on the Linux side, but I know it is fairly simple to use MS-DOS's variables to build up a parameter list, or > file, cmd < file to build up a parameter list if you exceed the line-length limitation.
Or are people saying that linux isn't as good as ms-dos? (Hell, I KNOW you can build chains because many bash scripts obviously are doing it, just not in loops).
At this point, it becomes a matter of kernel limitations / preference. xargs isn't magical; piping does have advantages over string building (well, in MS-DOS you could build the string out of "pointers" and avoid any copying; it's virtual memory after all, so unless you are changing the data you can skip the expense of string concat... but piping is more natively supported). Actually, I don't think I can give it the advantage of parallel processing, because you can easily create several tasked loops to review sliced data (which again, if you avoid copying, is a very fast action).
In the end, xargs is more for inline commands; the speed advantage is negligible (the difference between compiled / interpreted string building) because everything it does, you can do via shell scripts.