mpi4py with processes and threads

Posted on 2024-10-31 05:55:34

Hi, this is a pretty specific question, so I hope StackOverflow is meant for all programming languages and not just JavaScript/HTML.

I am writing a parallel program on MPICH2 (a popular Message Passing Interface implementation). My program is written in Python, so I use the mpi4py Python bindings. MPI is best suited to situations with no shared memory; it is therefore not ideal for multicore programming. To use all 4 cores on each node of my 5-node cluster, I am additionally using threads. However, I have noticed that using threads actually slows my simulation down. My program is several tens of thousands of lines of code, so I cannot post it all, but here is the snippet that is causing problems:

from threading import Thread
...
threadIndeces=[[0,10],[11,20],[21,30],[31,40]] #subset for each thread
for indeces in threadIndeces:
  t=Thread(target=foo,args=(indeces,))
  t.start()

Also, I make sure to join the threads later. If I run it with no threads, and just call foo with all the indeces, it is about 10-15x faster. When I time the multithreaded version, creating the threads with t=Thread(target=foo,args=(indeces,)) takes around 0.05 seconds, the join similarly takes 0.05 seconds, but the t.start() call takes a whopping 0.2 seconds.
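
A minimal sketch of how those per-call timings might be measured (foo here is just a placeholder for the real per-thread work, and the numbers in the comments are the ones reported above):

import time
from threading import Thread

def foo(indeces):
    pass  # placeholder for the real per-thread work

threadIndeces=[[0,10],[11,20],[21,30],[31,40]]
threads=[]

for indeces in threadIndeces:
    t0=time.time()
    t=Thread(target=foo,args=(indeces,))
    print("create: %.3f s" % (time.time()-t0))   # ~0.05 s reported above
    t0=time.time()
    t.start()
    print("start:  %.3f s" % (time.time()-t0))   # ~0.2 s reported above
    threads.append(t)

t0=time.time()
for t in threads:
    t.join()
print("join:   %.3f s" % (time.time()-t0))       # ~0.05 s reported above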

Is start() an expensive call? Should I be changing my approach? I thought about keeping a pool of threads rather than creating new ones every iteration, but it does not seem like t=Thread(target=foo,args=(indeces,)) is what's causing the slowdown.
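
For what it's worth, such a reusable pool could look roughly like the sketch below, using the standard-library concurrent.futures module; foo, threadIndeces and num_steps are placeholders for the actual simulation:

from concurrent.futures import ThreadPoolExecutor

def foo(indeces):
    pass  # placeholder for the real per-chunk work

threadIndeces=[[0,10],[11,20],[21,30],[31,40]]
num_steps=1000  # illustrative number of simulation iterations

# create the pool once, outside the main loop, and reuse it every iteration
with ThreadPoolExecutor(max_workers=4) as pool:
    for step in range(num_steps):
        futures=[pool.submit(foo,indeces) for indeces in threadIndeces]
        for f in futures:
            f.result()  # wait for every chunk, analogous to joining the threads

Note, though, that as the answer below points out, threads in CPython only help if foo spends most of its time in C extensions or I/O that release the GIL.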

Also, in case anyone wants to know the complexity of foo, here is one of the functions that gets called i times for the indeces every iteration (non-discrete time):

def HD_training_firing_rate(HD_cell):
    """During training, the firing rate is governed by the difference between the 
       current heading direction and the preferred heading direction. This is made
       to resemble a Gaussian distribution
    """
    # fabs, exp and direction are module-level globals (fabs and exp are
    # presumably math.fabs and math.exp bound at import time)
    global fabs
    global exp
    global direction

    #loop over twice due to concurrent CW and CCW HD training
    for c in [0,1]:
        d=direction[c]
        dp=HD_cell.dp  #directional preference
        s_d=20.0  #standard deviation
        s_i=min(fabs(dp-d),360-fabs(dp-d)) #circular deviation from preferred dir.

        HD_cell.r[c]=exp(-s_i*s_i/(2*s_d*s_d))  #normal distribution

Comments (1)

虚拟世界 2024-11-07 05:55:34

If you need threads, Python may not be your best option, due to the Global Interpreter Lock, which prevents true concurrency. See also Dave Beazley's disturbing talk.
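
A quick way to see the effect of the GIL on pure-Python, CPU-bound work is a toy comparison like the one below (illustrative only; the exact numbers depend on the machine):

import time
from threading import Thread

def burn(n):
    s=0
    for i in range(n):
        s+=i*i  # pure-Python arithmetic never releases the GIL
    return s

N=5000000

t0=time.time()
burn(N); burn(N)
print("serial   %.2f s" % (time.time()-t0))

t0=time.time()
threads=[Thread(target=burn,args=(N,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print("threaded %.2f s" % (time.time()-t0))  # typically no faster than serial on CPython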

You might be better off just running 20 processes, to keep all 4 cores on each of your 5 nodes busy, and using MPI for all communication.
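
As a rough illustration of that suggestion, an all-MPI version might distribute the index chunks across ranks instead of threads, along the lines of the sketch below (launched with something like mpiexec -n 20 python sim.py; foo again stands in for the real work):

from mpi4py import MPI

def foo(indeces):
    pass  # placeholder for the per-chunk work from the question

comm=MPI.COMM_WORLD
rank=comm.Get_rank()
size=comm.Get_size()                 # e.g. 20 ranks = 5 nodes x 4 cores

all_indeces=list(range(41))
my_indeces=all_indeces[rank::size]   # each rank takes a strided share of the work

result=foo(my_indeces)

# exchange results with messages instead of shared memory
results=comm.gather(result, root=0)
if rank==0:
    print("gathered results from", size, "ranks")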

Python incurs a lot of overhead on big iron; you may want to think about C or C++ (or dare I say Fortran?) if you're really committed to a joint threads/message-passing approach.
