`find` and `ls` with GNU parallel
I'm trying to use GNU parallel to post a lot of files to a web server. In my directory, I have some files:
file1.xml
file2.xml
and I have a shell script that looks like this:
#! /usr/bin/env bash
CMD="curl -X POST -d@$1 http://server/path"
eval $CMD
There's some other stuff in the script, but this was the simplest example. I tried to execute the following command:
ls | parallel -j2 script.sh {}
That is what the GNU parallel pages show as the "normal" way to operate on files in a directory. This seems to pass the name of the file into my script, but curl complains that it can't load the data file passed in. However, if I do:
find . -name '*.xml' | parallel -j2 script.sh {}
it works fine. Is there a difference between how `ls` and `find` are passing arguments to my script? Or do I need to do something additional in that script?
GNU `parallel` is a variant of `xargs`. They both have very similar interfaces, and if you're looking for help on `parallel`, you may have more luck looking up information about `xargs`.
That being said, the way they both operate is fairly simple. With their default behavior, both programs read input from STDIN, then break the input up into tokens based on whitespace. Each of these tokens is then passed to a provided program as an argument. The default for `xargs` is to pass as many tokens as possible to the program, and then start a new process when the limit is hit. I'm not sure how the default for `parallel` works.
Here is an example:
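A minimal sketch of that default behavior, assuming a GNU or BSD `xargs` and two made-up file names (`my file.xml` is hypothetical, chosen because it contains a space). `printf` repeats its format once per argument, which makes each token visible:

```bash
#!/usr/bin/env bash
# Two hypothetical names, one per line; the first contains a space.
input=$'my file.xml\nfile2.xml\n'

# 1) Tokenizing: xargs splits on blanks AND newlines, so "my file.xml"
#    turns into the two tokens "my" and "file.xml".
printf '%s' "$input" | xargs printf '<%s>\n'
# <my>
# <file.xml>
# <file2.xml>

# 2) Batching: every token fits on one command line, so echo runs only once.
printf '%s' "$input" | xargs echo
# my file.xml file2.xml

# 3) -n 1 caps each invocation at one token, so echo runs three times.
printf '%s' "$input" | xargs -n 1 echo
# my
# file.xml
# file2.xml
```

GNU `parallel` differs here: it splits its input only on newlines and, by default, starts one job per input line, running as many jobs at once as there are CPU cores.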
There are some problems with the default behavior, so it is common to see several variations.
The first issue is that because whitespace is used to tokenize, any file whose name contains whitespace will cause `parallel` and `xargs` to break. One solution is to tokenize around the NULL character instead.
`find` even provides an option to make this easy to do: the `-print0` option tells `find` to separate file names with the NULL character instead of a newline, and the `-0` option tells `xargs` to use the NULL character to tokenize each argument.
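As an illustration of the `-print0`/`-0` pairing, here is a sketch that builds a throwaway directory (via `mktemp -d`) holding two hypothetical files, one with a space in its name, and hands the names to `basename` one at a time. Both flags exist in GNU and BSD `find`/`xargs`; GNU `parallel` accepts the same `-0` (`--null`) option.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)                      # scratch directory for the demo
touch "$dir/my file.xml" "$dir/file2.xml"

# Default, whitespace-delimited hand-off: "my file.xml" is split in two,
# so basename runs three times (on ".../my", "file.xml" and ".../file2.xml").
find "$dir" -name '*.xml' | xargs -n 1 basename

# NUL-delimited hand-off: each name arrives intact, so basename runs twice
# and prints "my file.xml" and "file2.xml" (find's order is not guaranteed).
find "$dir" -name '*.xml' -print0 | xargs -0 -n 1 basename

rm -r "$dir"                          # clean up
```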
GNU `parallel` is a little better than `xargs` in that its default behavior is to tokenize around only newlines, so there is less of a need to change the default behavior.
Another common issue is that you may want to control how the arguments are passed to `xargs` or `parallel`. If you need to have a specific placement of the arguments passed to the program, you can use `{}` to specify where the argument is to be placed (`parallel` recognizes `{}` by default, while `xargs` needs `-I {}`). A typical example is a command that moves all files in the current directory and its subdirectories into the new_dir directory; such a command breaks down into the following parts:
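A command of that kind might look like the following reconstruction sketch, assuming GNU or BSD `find`/`xargs` and run inside a scratch directory with hypothetical files. The comments break the pipeline into its parts; `new_dir` is pruned from the search so `find` does not revisit files it has already moved there. With GNU `parallel`, the `xargs` stage would be `parallel -0 mv {} new_dir`.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)                       # scratch directory for the demo
cd "$work" || exit 1
mkdir -p sub new_dir
touch a.xml 'b c.xml' sub/d.xml         # hypothetical files, one name with a space

# find . -path ./new_dir -prune   skip the destination directory entirely
#   -o -type f -print0            otherwise print regular files, NUL-terminated
# xargs -0                        read NUL-terminated names
#   -I {}                         run the command once per name, putting it where {} is
# mv {} new_dir                   move that single file into new_dir
find . -path ./new_dir -prune -o -type f -print0 |
  xargs -0 -I {} mv {} new_dir

ls new_dir                              # the three files now live here
```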
So, taking into consideration how `xargs` and `parallel` work, you should hopefully be able to see the issue with your command. `find . -name '*.xml'` will generate a list of XML files to be passed to the `script.sh` program. However, `ls | parallel -j2 script.sh {}` will generate a list of ALL files in the current directory, XML or not, to be passed to the `script.sh` program.
A more correct variant on the `ls` version would be as follows:
ls *.xml | parallel -j2 script.sh {}
However, an important difference between this and the `find` version is that `find` will search through all subdirectories for files, while `ls` will only search the current directory. The equivalent `find` version of the above `ls` command would be as follows:
find . -maxdepth 1 -name '*.xml' | parallel -j2 script.sh {}
This will only search the current directory.
Since it works with `find`, you probably want to see what command GNU Parallel is running (using `-v` or `--dryrun`) and then try to run the failing commands manually.
I have not used `parallel`, but there is a difference between `ls` and `find . -name '*.xml'`. `ls` will list all the files and directories, whereas `find . -name '*.xml'` will list only the files (and directories) whose names end in .xml. As suggested by Paul Rubel, just print the value of `$1` in your script to check this. Additionally, you may want to consider filtering the input to files only in `find` with the `-type f` option. Hope this helps!
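Following the suggestion to print `$1`, here is a sketch of a check that could sit at the top of `script.sh`, before the `curl` call. `check_arg` is a hypothetical helper name; bash's `printf %q` quotes the argument so that otherwise invisible characters, such as spaces or stray escape sequences, show up in the log:

```bash
#!/usr/bin/env bash
# Hypothetical helper: log the received argument and verify it names a readable file.
check_arg() {
  # %q prints the value shell-quoted, so spaces and control characters are visible.
  printf 'script.sh got: %q (cwd: %s)\n' "$1" "$PWD" >&2
  if [[ ! -f $1 || ! -r $1 ]]; then
    echo "not a readable regular file: $1" >&2
    return 1
  fi
}

# Usage inside script.sh (sketch): stop early instead of letting curl fail.
# check_arg "$1" || exit 1
```

Combined with `find . -type f -name '*.xml'`, directories never reach the script in the first place.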
Neat.
I had never used parallel before. It appears, though, that there are two of them.
One is GNU Parallel, and the one that was installed on my system (from the moreutils package) has Tollef Fog Heen listed as the author in the man pages.
As Paul mentioned, you should use `set -x`.
Also, the paradigm that you mentioned above doesn't seem to work on my parallel; rather, I have to do the following:
`find` does provide a different input: it prepends the relative path to the name.
Maybe that is what is messing up your script?
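To illustrate what `set -x` can reveal here, the sketch below traces the question's `eval $CMD` approach next to a quoted call without `eval`. Assumptions: `curl` is replaced by a stub function that only prints its arguments (nothing is posted), `post_eval`/`post_quoted` are hypothetical names, and `my file.xml` is a made-up name chosen to contain a space.

```bash
#!/usr/bin/env bash
# Hypothetical stub standing in for curl: prints one argument per line.
curl() { printf '<%s>\n' "$@"; }

# The question's approach: build a string, then eval it.
# A file name containing a space is split into two words by eval.
post_eval() {
  CMD="curl -X POST -d@$1 http://server/path"
  eval $CMD
}

# Quoted alternative: no eval, "$1" stays a single argument.
post_quoted() {
  curl -X POST -d "@$1" http://server/path
}

set -x          # trace: each command is echoed to stderr with a leading +
post_eval 'my file.xml'
post_quoted 'my file.xml'
set +x
```

In the trace, `eval` re-splits the string, so the stub receives `-d@my` and `file.xml` as separate arguments, while the quoted version passes `@my file.xml` as one argument.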