"Embarrassingly parallel" programming using python and PBS on a cluster
I have a function (a neural network model) which produces figures. I wish to test several parameters, methods and different inputs (meaning hundreds of runs of the function) from python, using PBS on a standard cluster with Torque.

Note: I tried parallelpython, ipython and such and was never completely satisfied, since I want something simpler. The cluster is in a given configuration that I cannot change, and a solution integrating python + qsub would certainly benefit the community.
To simplify things, I have a simple function such as:
    import pylab
    import myModule

    def model(input, a=1., N=100):
        myModule.do_lots_number_crunching(input, a, N)
        pylab.savefig('figure_' + input.name + '_' + str(a) + '_' + str(N) + '.png')
where input is an object representing the input, input.name is a string, and do_lots_number_crunching may last for hours.
My question is: is there a correct way to transform a parameter scan such as

    for a in pylab.linspace(0., 1., 100):
        model(input, a)

into "something" that would launch a PBS script, like the one below, for every call to the model function?
    #PBS -l ncpus=1
    #PBS -l mem=i1000mb
    #PBS -l cput=24:00:00
    #PBS -V
    cd /data/work/
    python experiment_model.py
I was thinking of a function that would include the PBS template and call it from the python script, but could not yet figure it out (a decorator, maybe?).
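For concreteness, here is a minimal sketch of the kind of helper being imagined, with no decorator needed: fill a PBS template per parameter value and hand it to qsub. It assumes qsub is on the PATH and that experiment_model.py reads a from the command line; both are assumptions, not part of the original setup.

    import os
    import subprocess

    import pylab

    # PBS header adapted from the question ('i1000mb' there looks like a
    # typo for '1000mb'), plus a parameterised call to experiment_model.py.
    PBS_TEMPLATE = """#!/bin/sh
    #PBS -l ncpus=1
    #PBS -l mem=1000mb
    #PBS -l cput=24:00:00
    #PBS -V
    cd /data/work/
    python experiment_model.py {a}
    """

    def submit(a):
        # Write a one-off job script for this parameter value and qsub it;
        # qsub reads the script at submission time, so it can be deleted.
        script_name = 'job_a_{a}.pbs'.format(a=a)
        with open(script_name, 'w') as f:
            f.write(PBS_TEMPLATE.format(a=a))
        subprocess.check_call(['qsub', script_name])
        os.remove(script_name)

    for a in pylab.linspace(0., 1., 100):
        submit(a)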
pbs_python [1] could work for this. If experiment_model.py takes 'a' as a command-line argument, you could do something like the following.
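A sketch using pbs_python's Torque bindings (pbs_default, pbs_connect, new_attropl, pbs_submit); the resource values mirror the question's header, and the exact pbs_submit arguments should be checked against the TorqueUsage examples linked below:

    import os

    import pbs  # pbs_python's Torque bindings
    import pylab

    conn = pbs.pbs_connect(pbs.pbs_default())

    # Resource requests mirroring the #PBS -l lines from the question.
    attropl = pbs.new_attropl(3)
    attropl[0].name = pbs.ATTR_l
    attropl[0].resource = 'ncpus'
    attropl[0].value = '1'
    attropl[1].name = pbs.ATTR_l
    attropl[1].resource = 'mem'
    attropl[1].value = '1000mb'
    attropl[2].name = pbs.ATTR_l
    attropl[2].resource = 'cput'
    attropl[2].value = '24:00:00'

    script = '''#!/bin/sh
    cd /data/work/
    python experiment_model.py %f
    '''

    job_ids = []
    for a in pylab.linspace(0., 1., 100):
        script_name = 'experiment_model_%f.pbs' % a
        with open(script_name, 'w') as f:
            f.write(script % a)
        # Submit to the default queue, with no extend options.
        job_ids.append(pbs.pbs_submit(conn, attropl, script_name, None, None))
        os.remove(script_name)

    print(job_ids)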
[1]: https://oss.trac.surfsara.nl/pbs_python/wiki/TorqueUsage
You can do this easily using jug (which I developed for a similar setup).
You'd write a file (e.g., model.py) along the lines of the sketch below, and that's it!
Now you can launch "jug jobs" on your queue:

    jug execute model.py
and this will parallelise automatically. What happens is that each job will, in a loop, do something like the pseudocode below (it's actually more complicated than that, but you get the point).
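In illustrative pseudocode (these names are not jug's actual internals):

    while not all_tasks_done():
        for task in tasks_i_can_run():
            if task.lock_for_me():   # filesystem- or redis-based lock
                task.run()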
It uses the filesystem for locking (if you're on an NFS system) or a redis server if you prefer. It can also handle dependencies between tasks.
This is not exactly what you asked for, but I believe it's a cleaner architecture to separate this from the job queueing system.
It looks like I'm a little late to the party, but I also had the same question of how to map embarrassingly parallel problems onto a cluster in python a few years ago and wrote my own solution. I recently uploaded it to github here: https://github.com/plediii/pbs_util
To write your program with pbs_util, I would first create a pbs_util.ini in the working directory, containing:
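Something like the following, mirroring the question's resource requests (treat the exact section and key names as assumptions and check the project README):

    [PBSUTIL]
    numnodes=1
    numprocs=1
    mem=1000mb
    walltime=24:00:00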
Then a python script like the one sketched below.
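A sketch built on pbs_util's pbs_map interface (a Worker subclass mapped over the parameter values); the number-crunching lines come from the question, and everything else should be checked against the project README:

    import pylab

    import pbs_util.pbs_map as ppm

    import myModule

    class ModelWorker(ppm.Worker):
        # Each worker runs on a compute node and handles one 'a' per call.

        def __init__(self, input, N):
            self.input = input
            self.N = N

        def __call__(self, a):
            myModule.do_lots_number_crunching(self.input, a, self.N)
            pylab.savefig('figure_' + self.input.name + '_' + str(a)
                          + '_' + str(self.N) + '.png')

    # 'main' protection is needed because pbs_map re-imports this file
    # on the compute nodes.
    if __name__ == "__main__":
        input, N = None, 100   # replace None with your picklable input object
        # list() forces the lazy map; num_clients is the number of PBS jobs.
        list(ppm.pbs_map(ModelWorker, pylab.linspace(0., 1., 100),
                         startup_args=(input, N),
                         num_clients=100))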
And that would do it.
I just started working with clusters and EP (embarrassingly parallel) applications. My goal (I'm with the Library) is to learn enough to help other researchers on campus access HPC with EP applications, especially researchers outside of STEM. I'm still very new, but thought it might help this question to point out the use of GNU Parallel in a PBS script to launch basic python scripts with varying arguments. In the .pbs file, there are two lines to point out:
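A sketch of those two lines (the module name is site-specific and parameter_list.txt is a hypothetical file with one argument per line; -j, --joblog and :::: are standard GNU Parallel options):

    module load gnu-parallel    # site-specific; makes 'parallel' available

    parallel -j 8 --joblog parallel.log \
        python experiment_model.py {} :::: parameter_list.txt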
As a newbie to EP supercomputing, even though I don't yet understand all the other options of "parallel", this command allowed me to launch python scripts in parallel with different parameters. It works well if you can generate, ahead of time, a slew of parameter files that parallelize your problem: for example, running simulations across a parameter space, or processing many files with the same code.