Running parallel Python threads that call an external MPI programme submitted with PBS

I am pretty new to Python and I'm unsure of the best way to implement multithreaded/multiprocess code on a distributed cluster.

I am trying to write a wrapper script in Python that calls an external MPI programme running on a large cluster that uses a PBS queuing system. A (very) simplified version of the type of script I've been working on is given below: the code moves into a specific directory, runs an external MPI programme and checks the results to see if there have been any large changes.

#!/local/python-2.7.1/bin/python2.7

import os
import subprocess as sp
import coordinate_functions as coord_funcs

os.chdir('/usr/work/cmurray/SeachTest/')
print os.getcwd()

# Count the hosts in the PBS nodefile to get the number of
# processors (NP) allocated to this job, and keep a local copy.
cat_np = sp.Popen('cat $PBS_NODEFILE | wc -l', shell=True, stdout=sp.PIPE)
NP = int(cat_np.communicate()[0])
sp.call('cat $PBS_NODEFILE > nodefile', shell=True)


def run_mpi(np, nodefile):
    # Launch the external MPI programme, writing its output to calc.out.
    mpi_cmd = 'mpirun -machinefile %s -np %d mpipg > calc.out' % (nodefile, np)
    sp.call(mpi_cmd, shell=True)


def search_loop(calc_dir, t_total, nodefile, num_procs):
    os.chdir(calc_dir)
    no_events = True
    t = 0
    # Re-run the MPI calculation until a large change is detected
    # or the time budget t_total is used up.
    while no_events and t < t_total:
        run_mpi(num_procs, nodefile)
        num_events = coord_funcs.change_test('OUTFILE', 'INFILE', 0.01)
        if num_events > 0:
            no_events = False
        else:
            t += 1


search_loop('/usr/work/cmurray/SeachTest/calc_1/', 10,
            '/usr/work/cmurray/SeachTest/nodefile', NP)

This is then submitted to the queue using:

qsub -l nodes=4 -N SeachTest ./SearchTest

What I want to do is run multiple versions of the search_loop function in parallel in different directories (containing different starting positions, for example) read from a list. The processes are very IO-heavy, with the MPI calculations taking maybe a few minutes to run each time they are called.
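
For concreteness, this is roughly the driver I have in mind (the calc_2/calc_3 directory names are made up for illustration); as written it can only run the searches one after another:

# Hypothetical list of calculation directories, each holding its own
# starting positions; the names are illustrative only.
calc_dirs = ['/usr/work/cmurray/SeachTest/calc_1/',
             '/usr/work/cmurray/SeachTest/calc_2/',
             '/usr/work/cmurray/SeachTest/calc_3/']

for calc_dir in calc_dirs:
    # Serial version: each search only starts once the previous one
    # has finished. I want these running at the same time instead.
    search_loop(calc_dir, 10, '/usr/work/cmurray/SeachTest/nodefile', NP)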

Would the threading module be ok for this purpose or is the multiprocessing module a better choice? I will probably need to pass simple messages like the event boolean in the above example between threads/processes.

Also, how do I make sure that the python script is not using processors that I've assigned to the MPI runs?


Answer from 清音悠歌 (2024-12-22 12:55:28):

What I want to do is run multiple versions of the search_loop function in parallel in different directories (containing different starting positions, for example) read from a list. The processes are very IO-heavy, with the MPI calculations taking maybe a few minutes to run each time they are called.

Would the threading module be ok for this purpose or is the multiprocessing module a better choice? I will probably need to pass simple messages like the event boolean in the above example between threads/processes.

I'd try multithreading first for an I/O-intensive program, assuming that there's enough bandwidth to actually parallelize the I/O.
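
As a sketch of what that could look like (untested, and the nodefile handling is an assumption: each worker really needs its own slice of $PBS_NODEFILE, otherwise the concurrent mpirun calls will oversubscribe the nodes; NP and calc_dirs are taken as defined in the question):

import os
import threading
import subprocess as sp
import coordinate_functions as coord_funcs

NODEFILE = '/usr/work/cmurray/SeachTest/nodefile'

def run_mpi_in(calc_dir, np, nodefile):
    # cwd= keeps each run in its own directory; os.chdir would be
    # shared by every thread in the process, so it must be avoided here.
    mpi_cmd = 'mpirun -machinefile %s -np %d mpipg > calc.out' % (nodefile, np)
    sp.call(mpi_cmd, shell=True, cwd=calc_dir)

def worker(calc_dir, t_total, nodefile, num_procs, event):
    t = 0
    while not event.is_set() and t < t_total:
        run_mpi_in(calc_dir, num_procs, nodefile)
        # Assumes change_test accepts full paths rather than filenames
        # relative to the current working directory.
        num_events = coord_funcs.change_test(os.path.join(calc_dir, 'OUTFILE'),
                                             os.path.join(calc_dir, 'INFILE'),
                                             0.01)
        if num_events > 0:
            event.set()  # one-bit message telling the other searches to stop
        else:
            t += 1

event = threading.Event()
threads = [threading.Thread(target=worker,
                            args=(calc_dir, 10, NODEFILE, NP, event))
           for calc_dir in calc_dirs]
for th in threads:
    th.start()
for th in threads:
    th.join()

While a thread is blocked inside sp.call the GIL is released, so the wrapper threads spend almost all of their time waiting on the MPI runs rather than executing Python.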

Also, how do I make sure that the python script is not using processors that I've assigned to the MPI runs?

If you don't use multiprocessing, the script will only ever use a single CPU because of the Global Interpreter Lock, no matter how many threads you start, so it won't take more than one core away from the MPI runs.
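
If you do want separate processes anyway (for example so that each search is fully isolated), a minimal multiprocessing sketch, reusing the question's search_loop and assuming it is importable at module level along with NP and calc_dirs:

import multiprocessing as mp

def run_search(calc_dir):
    # Each search runs in its own process, so the os.chdir inside
    # search_loop no longer clashes between concurrent searches.
    search_loop(calc_dir, 10, '/usr/work/cmurray/SeachTest/nodefile', NP)

if __name__ == '__main__':
    pool = mp.Pool(processes=len(calc_dirs))  # one worker per directory
    pool.map(run_search, calc_dirs)
    pool.close()
    pool.join()

Either way, the wrapper spends nearly all of its time blocked on mpirun, so it costs essentially no CPU.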
