How to write a process pool in a bash shell script
I have more than 10 tasks to execute, and the system restricts that at most 4 tasks can run at the same time.
My tasks can be started like:
myprog taskname
How can I write a bash shell script to run these tasks? The most important thing is that when one task finishes, the script can start another immediately, keeping the number of running tasks at 4 all the time.
Use xargs. Details here.
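For the question's setup that could be as simple as this (a sketch; it assumes the task names sit one per line in a file such as tasks.txt, which is not part of the original answer):
# run at most 4 copies of myprog at once, one task name per invocation
xargs -n 1 -P 4 myprog < tasks.txt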
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following function is what the worker processes run when forked.
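A minimal sketch of that idea, not the author's actual Github code; the fifo path, lock file, task names, and quit sentinel are assumptions for illustration:
#!/bin/bash
fifo=/tmp/jobqueue.$$        # the fifo used as the job queue
lock=/tmp/jobqueue.$$.lock   # lock file so two workers never split one queue line

worker() {
    local job
    exec 4>>"$lock"              # each worker opens its own lock fd
    while true; do
        flock 4                  # serialize queue reads between the pre-forked workers
        IFS= read -r job <&3     # block until a job line arrives
        flock -u 4
        [ "$job" = quit ] && break   # sentinel line: shut this worker down
        myprog "$job"
    done
}

mkfifo "$fifo"
exec 3<>"$fifo"                              # open read-write so nothing blocks on open
for i in 1 2 3 4; do worker & done           # pre-fork 4 workers
for t in task1 task2 task3 task4 task5; do   # placeholder task names
    echo "$t" >&3
done
for i in 1 2 3 4; do echo quit >&3; done     # one sentinel per worker
wait
rm -f "$fifo" "$lock"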
You can get a copy of my solution at Github. Here's a sample program using my implementation.
Hope this helps!
Using GNU Parallel you can do:
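For the question's workload that could look like the following (a sketch; the task-list file name is an assumption):
# one task name per line in tasklist.txt; -j4 caps the number of parallel jobs at 4
parallel -j4 myprog < tasklist.txt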
If you have 4 cores, you can even just do:
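Since GNU Parallel defaults to one job per CPU core, the job count can then be dropped (same assumed task list):
parallel myprog < tasklist.txt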
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
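That is the usual autotools sequence (reconstructed here rather than quoted from the README):
./configure && make && sudo make install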
Personal installation
If you are not root you can add ~/bin to your path and install in
~/bin and ~/share:
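For a non-root install under $HOME, something like:
./configure --prefix=$HOME && make && make install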
Or if your system lacks 'make' you can simply copy src/parallel
src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the
system is old or Microsoft Windows):
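A reconstruction of that minimal install (the raw-file URL is inferred from the cgit link above, so double-check it):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem ~/bin/   # or any other directory in your PATH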
Test the installation
After this you should be able to do:
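For example (the three hostnames are only examples):
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org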
This will send 3 ping packets to 3 different hosts in parallel and print
the output when they complete.
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
I found the best solution proposed in the A Foo Walks into a Bar... blog, using the built-in functionality of the well-known xargs tool.
First create a file commands.txt with the list of commands you want to execute,
and then pipe it to xargs like this to execute them in a pool of 4 processes:
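One possible form of that pipeline (a reconstruction; the blog's exact invocation may differ):
# each line of commands.txt is a complete command, e.g. "myprog task1";
# -P 4 keeps up to four of them running at the same time
cat commands.txt | xargs -I {} -P 4 sh -c "{}"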
You can modify the number of processes.
I would suggest writing four scripts, each one of which executes a certain number of tasks in series. Then write another script that starts the four scripts in parallel. For instance, if you have scripts, script1.sh, script2.sh, script3.sh, and script4.sh, you could have a script called headscript.sh like so.
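A sketch of headscript.sh along those lines:
#!/bin/bash
# start the four batch scripts in parallel and wait for all of them to finish
./script1.sh &
./script2.sh &
./script3.sh &
./script4.sh &
wait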
Following @Parag Sardas' answer and the documentation linked, here's a quick script (sketched below) you might want to add to your .bash_aliases. Relinking the doc link because it's worth a read.
I.e.
./xargs-parallel.sh jobs.txt 4
runs a maximum of 4 processes reading from jobs.txt.
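A sketch of what such a wrapper could look like (a reconstruction, not necessarily the original script):
#!/bin/bash
# xargs-parallel.sh <file-with-commands> <max-processes>
# runs each line of the file as a command, keeping up to <max-processes> running at once
xargs -P "$2" -I {} sh -c "{}" < "$1"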
You could probably do something clever with signals.
Note this is only to illustrate the concept, and thus not thoroughly tested.
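A conceptual sketch of that idea (like the original, illustration only and not thoroughly tested; the task names are placeholders):
#!/bin/bash
tasks=(task1 task2 task3 task4 task5 task6 task7 task8 task9 task10)
next=0
max_jobs=4

start_next() {
    # fill every free slot with the next queued task
    while (( next < ${#tasks[@]} && $(jobs -rp | wc -l) < max_jobs )); do
        myprog "${tasks[next]}" &
        next=$(( next + 1 ))
    done
}

trap start_next CHLD      # whenever a child exits, top the pool back up

start_next                # prime the pool with the first 4 tasks
# wait is interrupted each time the CHLD trap fires, so loop until everything is done
while (( next < ${#tasks[@]} )) || [[ -n "$(jobs -rp)" ]]; do
    start_next            # also refill here in case signals were coalesced
    wait
done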
This tested script runs 5 jobs at a time and will restart a new job as soon as one finishes (due to the kill of the sleep 10.9 when we get a SIGCHLD). A simpler version of this could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).
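A sketch of the described mechanism (not the original tested script; task names are placeholders):
#!/bin/bash
max_jobs=5
trap 'kill "$sleep_pid" 2>/dev/null' CHLD   # a finishing job cuts the nap short

for task in task01 task02 task03 task04 task05 task06 task07 task08 task09 task10; do
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        sleep 10.9 &                        # long nap instead of busy polling
        sleep_pid=$!
        wait "$sleep_pid" 2>/dev/null       # returns as soon as the sleep is killed
    done
    myprog "$task" &
done

trap - CHLD                                 # no more refilling needed
wait                                        # let the last jobs finish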
The other answer about 4 shell scripts does not fully satisfy me, as it assumes that all tasks take approximately the same time and because it requires manual setup. But here is how I would improve it.
The main script will create symbolic links to the executables following a certain naming convention. For example:
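A possible layout, invented here for illustration (each task is assumed to have its own executable or wrapper to link to):
mkdir -p ./jobs
# <order>_<name>.<batch>: the leading number sorts the jobs, the trailing suffix picks the batch
ln -s /path/to/task_a ./jobs/01_task_a.01
ln -s /path/to/task_b ./jobs/02_task_b.02
ln -s /path/to/task_c ./jobs/03_task_c.03
ln -s /path/to/task_d ./jobs/04_task_d.04
ln -s /path/to/task_e ./jobs/05_task_e.01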
The first prefix is for sorting and the suffix identifies the batch (01-04).
Now we spawn 4 shell scripts that take the batch number as input and do something like this:
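A sketch of such a per-batch worker (file names and the jobs directory follow the hypothetical layout above):
#!/bin/bash
# batch-runner.sh <batch-number>: run every job of one batch, in sorted order
batch=$(printf '%02d' "$1")
for job in ./jobs/*."${batch}"; do
    "$job"        # each symlink resolves to the task's executable
done
# the four batches would be started in parallel, e.g.:
# for b in 1 2 3 4; do ./batch-runner.sh "$b" & done; wait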
Here is my solution. The idea is quite simple. I create a fifo as a semaphore, where each line stands for an available resource. When reading the queue, the main process blocks if there is nothing left, and we return the resource after a task is done by simply echoing anything back to the queue. The script (sketched below) will run 10 tasks, with 4 running concurrently at any time. You can change the $(seq 1 "${tasks}") sequence to the actual task queue you want to run. I made my modifications based on the methods introduced in Writing a process pool in Bash.
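A sketch of that script as described above (the fifo setup details are assumptions, not necessarily the author's exact code):
#!/bin/bash
pool=4
tasks=10

fifo=$(mktemp -u)            # pick a path for the fifo
mkfifo "$fifo"
exec 3<>"$fifo"              # open read-write so reads and echos never block on open
rm -f "$fifo"                # fd 3 keeps it alive; the name is no longer needed

# seed the semaphore: one line per allowed concurrent task
for ((i = 0; i < pool; i++)); do
    echo >&3
done

for task in $(seq 1 "${tasks}"); do
    read -u 3                # take a slot; blocks while all 4 are busy
    {
        myprog "$task"
        echo >&3             # give the slot back once the task finishes
    } &
done

wait                         # wait for the remaining tasks
exec 3>&-                    # close the semaphore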
xargs with -P and -L options does the job.
You can extract the idea from the example below:
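For instance (a sketch; the task-list file name is assumed):
# tasks.txt holds one task name per line; -L 1 passes one line per command,
# -P 4 keeps four commands running in parallel
xargs -L 1 -P 4 myprog < tasks.txt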
Look at my implementation of a job pool in bash:
For example, to run at most 3 processes of cURL when downloading from a lot of URLs, you can wrap your cURL commands as follows:
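A usage sketch under assumed names (job_pool.sh, job_pool_init, job_pool_run, and job_pool_shutdown are illustrative; check the linked implementation for its real API):
# source the pool implementation, then wrap each download in job_pool_run
source job_pool.sh

job_pool_init 3                   # at most 3 jobs at a time
while read -r url; do
    job_pool_run curl -s -O "$url"
done < urls.txt
job_pool_shutdown                 # wait for all queued downloads to finish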