Strange behavior of a bash script running parallel subprocesses in bash
The following script is used for running parallel subprocesses in bash; it is slightly changed from Running a limited number of child processes in parallel in bash?
#!/bin/bash
set -o monitor # means: run background processes in a separate processes...
N=1000
todo_array=($(seq 0 $((N-1))))
max_jobs=5
trap add_next_job CHLD
index=0

function add_next_job {
    if [[ $index -lt ${#todo_array[@]} ]]
    then
        do_job $index &
        index=$(($index+1))
    fi
}

function do_job {
    echo $1 start
    time=$(echo "scale=0;x=$RANDOM % 10;scale=5;x/20+0.05" |bc);sleep $time;echo $time
    echo $1 done
}

while [[ $index -lt $max_jobs ]] && [[ $index -lt ${#todo_array[@]} ]]
do
    add_next_job
done

wait
The job is choosing a random number in 0.05:0.05:0.50 and sleeping that many seconds.
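To see where that range comes from (a quick check, not part of the original post): $RANDOM % 10 yields an integer x in 0..9, so x/20+0.05 runs from 0.05 to 0.50 in steps of 0.05.

for x in 0 9; do
    echo "scale=5; $x/20+0.05" | bc   # prints .05000 and .50000, the two extremes
done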
For example, with N=10, a sample output is:
1 start
4 start
3 start
2 start
0 start
.25000
2 done
5 start
.30000
3 done
6 start
.35000
0 done
7 start
.40000
1 done
8 start
.40000
4 done
9 start
.05000
7 done
.20000
5 done
.25000
9 done
.45000
6 done
.50000
8 done
which has 30 lines in total.
But for big N such as 1000, the result can be strange. One run gave 2996 lines of output, with 998 start lines, 999 done lines, and 999 lines with a float number. 644 and 652 are missing among the start lines, and 644 is missing among the done lines.
These tests were run on Arch Linux with bash 4.2.10(2). Similar results can be reproduced on Debian stable with bash 4.1.5(1).
EDIT: I tried parallel from moreutils and GNU parallel for this test. parallel from moreutils has the same problem, but GNU parallel works perfectly.
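For reference, a rough GNU parallel equivalent of the loop above might look like this (a sketch, not from the original post, using a fixed sleep instead of the random one):

seq 0 999 | parallel -j5 'echo {} start; sleep 0.1; echo {} done'   # at most 5 jobs at a time; {} is each number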
2 Answers
I think this is just due to all of the subprocesses inheriting the same file descriptor and trying to append to it in parallel. Very rarely, two of the processes race, both start appending at the same location, and one overwrites the other. This is essentially the reverse of what one of the comments suggests.

You could easily check this by redirecting through a pipe, such as with

your_script | tee file

because pipes have rules about the atomicity of data delivered by single write() calls that are smaller than a particular size (PIPE_BUF). There's another question on SO that's similar to this (I think it just involved two threads both quickly writing numbers) where this is also explained, but I can't find it.
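A concrete way to run that check (a sketch; jobs.sh is a hypothetical name for the script above):

./jobs.sh > direct.log                   # children share the file descriptor and may race on the offset
./jobs.sh | tee piped.log > /dev/null    # each echo's write() is smaller than PIPE_BUF, so lines stay intact
wc -l direct.log piped.log               # the piped run should consistently show all 3*N lines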
The only thing I can imagine is that you're running out of resources; check "ulimit -a" and look for "max user processes". If that's less than the number of processes you want to spawn, you will end up with errors.
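For example (a quick check; ulimit -u prints just that single value):

ulimit -a | grep -i 'max user processes'   # the relevant line of the limits listing
ulimit -u                                  # just the number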
Try to set the limits for your user (if you're not running as root) to a higher value. On Redhatish systems you can do this by:
Adding this line to /etc/pam.d/login:
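session    required     pam_limits.so   # the usual pam_limits line; reconstructed, as the original snippet is missing here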
Adding the following content to /etc/security/limits.conf:
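# reconstructed from the description below: soft limit 1000, hard limit 1024
myuser    soft    nproc    1000
myuser    hard    nproc    1024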
where "myuser" is the username who is granted the right, 1000 the default value of "max user processes" and 1024 the maximum number of userprocesses. Soft- and hard-limit shouldn't be too much apart. It only says what the user is allowed to set himself using the "ulimit" command in his shell.
So myuser will start with a total of 1000 processes (including the shell and all other spawned processes), but may raise it to 1024 using ulimit:
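ulimit -u 1024   # raise the soft limit up to the hard limit for this shell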
A reboot is not required; it works instantly.
Good luck!
Alex.