When should xargs be preferred over while-read loops?

Posted on 2024-08-27 19:27:10

xargs is widely used in shell scripting; it is usually easy to recast these uses in bash using while read -r; do ... done or while read -ar; do ... done loops.

When should xargs be preferred, and when should while-read loops be preferred?

Comments (6)

小嗲 2024-09-03 19:27:10

The thing with while loops is that they tend to process one item at a time, often when it's unnecessary. This is where xargs has an advantage - it can batch up the arguments to allow one command to process lots of items.

For example, a while loop:

pax> echo '1
2
3 
4
5' | while read -r; do echo "$REPLY"; done
1
2
3
4
5

and the corresponding xargs:

pax> echo '1
2
3 
4
5' | xargs echo
1 2 3 4 5

Here you can see that the lines are processed one by one with the while loop and all together with xargs. In other words, the former is equivalent to echo 1 ; echo 2 ; echo 3 ; echo 4 ; echo 5 while the latter is equivalent to echo 1 2 3 4 5 (five invocations as opposed to one). With an external command each invocation is a new process, so this really makes a difference when processing thousands or tens of thousands of lines, since process creation takes time.

xargs is most advantageous with commands that can accept multiple arguments, since it reduces the number of individual processes started, which can make things much faster.
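
A rough way to see the batching is to count how many times the command is started. This is only a sketch: it assumes seq and wc are available and that 1000 short arguments fit on a single command line, which holds for common xargs defaults. Each echo invocation prints exactly one line, so the line count equals the number of invocations.

```shell
# One echo per input line: 1000 invocations, so 1000 output lines.
per_line=$(seq 1 1000 | while IFS= read -r n; do echo "$n"; done | wc -l)

# xargs packs the arguments into as few command lines as it can;
# every echo it starts prints one line, so this counts invocations.
batched=$(seq 1 1000 | xargs echo | wc -l)

# $(( ... )) strips the padding that some wc implementations add.
echo "while-read: $(( per_line )) invocations"
echo "xargs:      $(( batched )) invocations"
```

Note that echo inside the while loop is a shell builtin, so no process is forked there; swap in an external command to feel the cost of process creation.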

When I'm processing small files or the commands to run on each item are complicated (where I'm too lazy to write a separate script to give to xargs), I will use the while variant.

Where I'm interested in performance (large files), I will use xargs, even if I have to write a separate script.

百善笑为先 2024-09-03 19:27:10

Some implementations of xargs also understand a -P MAX-PROCS argument which lets xargs run multiple jobs in parallel. This would be quite difficult to simulate with a while read loop.
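
As a minimal sketch of that option, assuming an xargs that supports -P (an extension found in GNU and BSD xargs), the following runs four one-second sleeps at the same time. The timing variables are only there for illustration.

```shell
# Four one-second jobs, one argument per job (-n 1), up to four at once (-P 4).
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep
elapsed=$(( $(date +%s) - start ))
echo "parallel run took about ${elapsed}s"
```

Run one after another, the same jobs would take about four seconds.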

初懵 2024-09-03 19:27:10

GNU Parallel (http://www.gnu.org/software/parallel/) combines the advantages of xargs (using -m) with the advantage of while-read, namely newline as the separator, and adds some new features (e.g. grouping of output, running jobs in parallel on remote computers, and context replace).

If you have GNU Parallel installed I cannot see a single situation in which you would use xargs. And the only situation in which I would use while-read would be if the block to execute is so big that it becomes unreadable on a single line (e.g. if it contains if-statements or similar) and you refuse to make a bash function.

For all the small scripts I actually find it more readable to use GNU Parallel. paxdiablo's example:

echo '1
2
3 
4
5' | parallel -m echo

Converting WAV files to MP3 using GNU Parallel:

find sounddir -type f -name '*.wav' | parallel -j+0 lame {} -o {.}.mp3

Watch the intro video for GNU Parallel: http://www.youtube.com/watch?v=OpaiGYxkSuQ

涙—继续流 2024-09-03 19:27:10

"xargs" has the option "-n max-args", which caps the number of arguments passed to each invocation, so the command is still called with several arguments at once (useful for "grep", "rm" and many similar programs).
Try the example from the man page:

cut -d: -f1 < /etc/passwd | sort | xargs -n 5 echo

You'll see that it echoes 5 users per line.
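
The same grouping can be seen with fixed input, which does not depend on the contents of /etc/passwd. This is just an illustrative sketch; the letters and the variable name grouped are made up for the example.

```shell
# Seven arguments, at most three per echo invocation.
grouped=$(printf '%s\n' a b c d e f g | xargs -n 3 echo)
printf '%s\n' "$grouped"
# a b c
# d e f
# g
```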

P.S. Don't forget that "xargs" is a separate program, so the commands it runs execute in child processes (like a subshell). There is no easy way to get information back into your shell script: you have to read the output of "xargs" and interpret it to fill in your shell or environment variables.
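
A small sketch of that difference, using made-up variable names: a while-read loop fed by a here-document runs in the current shell and can update a variable directly, whereas with xargs the result has to come back through its output.

```shell
# Loop fed by a here-document: it runs in the current shell, so total survives.
total=0
while IFS= read -r n; do
  total=$(( total + n ))
done <<EOF
1
2
3
EOF
echo "while-read total: $total"

# xargs runs sh as a child process; the only way back is its output.
xargs_total=$(printf '%s\n' 1 2 3 |
  xargs sh -c 't=0; for n in "$@"; do t=$(( t + n )); done; echo "$t"' _)
echo "xargs total: $xargs_total"
```

Be aware that in bash a while loop on the receiving end of a pipe also runs in a subshell by default, which is why the loop here reads from a redirection instead.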

满身野味 2024-09-03 19:27:10

Conversely, there are cases where you have a list of files, one per line, whose names contain spaces, e.g. coming from find, pkgutil or similar. To work with xargs in its default mode, which splits on any whitespace, you would have to wrap the lines in quotes using sed first, which looks unwieldy.

With a while loop the script can be easier to read and write, and quoting arguments that contain spaces is trivial. The example below is artificial, but imagine getting the list of files from something other than find...

function process {
  # IFS= and -r keep surrounding whitespace and backslashes in each line intact
  while IFS= read -r line; do
    test -d "$line" && echo "$line"
  done
}

find . -name "*foo*" | process
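
When the list does come from find, one alternative to quoting with sed is NUL-delimited output. The sketch below assumes a find with -print0 and an xargs with -0 (both available in the GNU and BSD tools); the scratch directory and file names are invented for the demonstration.

```shell
# Scratch directory with one directory and one file whose names contain a space.
tmp=$(mktemp -d)
mkdir "$tmp/foo dir"
touch "$tmp/foo file"

# -print0 / -0 delimit names with NUL bytes, so spaces need no quoting.
dirs=$(cd "$tmp" && find . -name '*foo*' -print0 |
  xargs -0 sh -c 'for p in "$@"; do if [ -d "$p" ]; then echo "$p"; fi; done' _)
printf '%s\n' "$dirs"

rm -rf "$tmp"
```

This does not help with tools such as pkgutil that can only emit one name per line, which is the case the while loop above handles.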

若有似无的小暗淡 2024-09-03 19:27:10

I don't get it; people keep yammering on about how the command in a while loop must be executed inside the loop instead of once outside it. I know very little about the Linux side, but I know it is fairly simple in MS-DOS to use variables to build up a parameter list, or to use > file and cmd < file to build one up if you exceed the line length limit.

Or are people saying that Linux isn't as good as MS-DOS? (Hell, I know you can build chains, because many bash scripts obviously do it, just not in loops.)

At this point, it becomes a matter of kernel limitations and preference. xargs isn't magical; piping does have advantages over string building (well, in MS-DOS you could build the string out of "pointers" and avoid any copying; it's virtual memory after all, so unless you are changing the data you can skip the expense of string concatenation... but piping has more native support). Actually, I don't think I can give it the advantage of parallel processing either, because you can easily create several task loops to work through slices of the data (which again, if you avoid copying, is very fast).

In the end, xargs is more for inline commands; the speed advantage is negligible (the difference between compiled and interpreted string building), because everything it does you can do via shell scripts.
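
For what it's worth, building the argument list by hand in a shell script might look like the sketch below, which collects the lines into the positional parameters and then runs the command once. Unlike xargs, this version does nothing to split the list if it grows past the system's argument length limit.

```shell
# Collect every input line as one positional parameter, quoting intact.
set --
while IFS= read -r line; do
  set -- "$@" "$line"
done <<EOF
1
2
3
4
5
EOF

# A single invocation with all five arguments, like xargs echo.
joined=$(echo "$@")
echo "$joined"
```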
