Running parallel Python threads that call external MPI programs submitted with PBS
I am pretty new to Python and I'm unsure of the best way to implement multithreaded/multiprocess code on a distributed cluster.
I am trying to write a wrapper script in Python that calls an external MPI program running on a large cluster that uses a PBS queuing system. A (very) simplified version of the type of script I've been working on is given below: the code moves into a specific directory, runs an external MPI program and checks the results to see if there have been any large changes.
#!/local/python-2.7.1/bin/python2.7
import os
import subprocess as sp
import coordinate_functions as coord_funcs
os.chdir('/usr/work/cmurray/SeachTest/')
print os.getcwd()
# Gets nodefile and num procs (NP)
cat_np = sp.Popen('cat $PBS_NODEFILE | wc -l', shell=True, stdout=sp.PIPE)
NP = int(cat_np.communicate()[0])
sp.call('cat $PBS_NODEFILE > nodefile', shell=True)
mynodefile = os.path.abspath('nodefile')  # absolute path, so it still works after chdir
def run_mpi(np, nodefile):
    # Launch the external MPI program on the allocated hosts
    mpi_cmd = 'mpirun -machinefile %s -np %d mpipg > calc.out' % (nodefile, np)
    sp.call(mpi_cmd, shell=True)

def search_loop(calc_dir, t_total, nodefile, num_procs):
    os.chdir(calc_dir)
    # Repeat the MPI calculation until a large change (an "event") is
    # detected or the time limit t_total is reached
    event = False
    t = 0
    while not event and t < t_total:
        run_mpi(num_procs, nodefile)
        num_events = coord_funcs.change_test('OUTFILE', 'INFILE', 0.01)
        if num_events > 0:
            event = True
        else:
            t += 1

search_loop('/usr/work/cmurray/SeachTest/calc_1/', 10, mynodefile, NP)
This is then submitted to the queue using:
qsub -l nodes=4 -N SeachTest ./SearchTest
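(As an aside, the processor count can be read in pure Python rather than by shelling out to cat and wc; a minimal sketch, assuming the script runs inside a PBS job so that PBS_NODEFILE is set:

import os

# Each line of $PBS_NODEFILE names one allocated processor slot
with open(os.environ['PBS_NODEFILE']) as f:
    nodes = f.read().splitlines()
NP = len(nodes)
)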
What I want to do is run multiple versions of the search_loop function in parallel in different directories (containing different starting positions, for example) read from a list. The processes are very IO-heavy, with the MPI calculations taking maybe a few minutes to run each time they are called.
Would the threading module be OK for this purpose, or is the multiprocessing module a better choice? I will probably need to pass simple messages, like the event boolean in the above example, between threads/processes.

Also, how do I make sure that the Python script is not using the processors that I've assigned to the MPI runs?
Answer:
I'd try multithreading first for an I/O-intensive program, assuming that there's enough bandwidth to actually parallelize the I/O.
If you don't use multiprocessing, the script will only use a single CPU due to the Global Interpreter Lock.
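Following that advice, below is a minimal sketch of how a threaded version might look. It is an illustration under several assumptions: the calc_dirs list is hypothetical; search_loop is reworked to return the event flag and to pass cwd=calc_dir to subprocess instead of calling os.chdir (the working directory is shared by every thread in a process); and coord_funcs.change_test is assumed to accept file paths. The PBS_NODEFILE is split into one machinefile per run, so the concurrent mpirun calls never compete for the same processors:

import os
import threading
import Queue  # renamed 'queue' in Python 3
import subprocess as sp
import coordinate_functions as coord_funcs

def search_loop(calc_dir, t_total, nodefile, num_procs):
    # Thread-safe variant of the loop above: no os.chdir, and the
    # event flag is returned instead of being kept in a local variable
    event, t = False, 0
    while not event and t < t_total:
        mpi_cmd = 'mpirun -machinefile %s -np %d mpipg > calc.out' % (nodefile, num_procs)
        sp.call(mpi_cmd, shell=True, cwd=calc_dir)
        num_events = coord_funcs.change_test(os.path.join(calc_dir, 'OUTFILE'),
                                             os.path.join(calc_dir, 'INFILE'), 0.01)
        if num_events > 0:
            event = True
        else:
            t += 1
    return event

# Hypothetical list of starting-position directories
calc_dirs = ['/usr/work/cmurray/SeachTest/calc_%d/' % i for i in range(1, 5)]

# Split the allocated processor slots evenly, one machinefile per run
# (any remainder slots are simply left unused in this sketch)
with open(os.environ['PBS_NODEFILE']) as f:
    slots = f.read().splitlines()
per_run = len(slots) // len(calc_dirs)

results = Queue.Queue()  # thread-safe channel for the event booleans

def worker(calc_dir, my_slots):
    # Write this run's private machinefile, then report its result
    nodefile = os.path.join(calc_dir, 'nodefile')
    with open(nodefile, 'w') as f:
        f.write('\n'.join(my_slots) + '\n')
    results.put((calc_dir, search_loop(calc_dir, 10, nodefile, len(my_slots))))

threads = []
for i, d in enumerate(calc_dirs):
    t = threading.Thread(target=worker, args=(d, slots[i * per_run:(i + 1) * per_run]))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

while not results.empty():
    print results.get()

While a thread is blocked inside sp.call the GIL is released, so the wrapper itself consumes almost no CPU time; and because each mpirun only uses the hosts listed in its own machinefile, the runs do not compete with one another for processors. The qsub request would, of course, need to ask for enough nodes to cover all of the concurrent runs.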