Python threading: local variable problem

Posted on 2024-12-22 08:24:17

I am having trouble using threading together with the scipy.stats.randint module. When several threads are launched, a local array (bootIndexs in the code below) seems to be shared by all of the launched threads.

This is the error that is raised:

> Exception in thread Thread-559:
Traceback (most recent call last):
...
  File "..\calculDomaine3.py", line 223, in bootThread
    result = bootstrap(nbB, distMod)
  File "...\calculDomaine3.py", line 207, in bootstrap
    bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages)
  File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 5014, in rvs
    return super(rv_discrete, self).rvs(*args, **kwargs)
  File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 582, in rvs
    vals = reshape(vals, size)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 171, in reshape
    return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged

And this is my code:

import threading
import Queue
from scipy import stats as spstats

nbThreads = 4

def test(nbBoots, nbTirages,  modules ):

    def bootstrap(nbBootsThread, distribModules) :

         distribMax = []            

         for j in range(nbBootsThread): 
             bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages) 
             boot = [distribModules[i] for i in bootIndexs]

             distribMax.append(max(boot))

         return distribMax

    q = Queue.Queue()

    def bootThread (nbB, distMod):
        result = bootstrap(nbB, distMod )
        q.put(result, False)
        q.task_done()

    works = []

    for i in range(nbThreads) :     
        works.append(threading.Thread(target = bootThread, args = (nbBoots//nbThreads, modules[:],) ))


    for w in works:
        w.daemon = True
        w.start()

    q.join()

    distMaxResult = []

    for j in range(q.qsize()):
        distMaxResult += q.get()

    return distMaxResult

class classTest:
    def __init__(self):
        self.launch()

    def launch(self):
        print test(100, 1000, range(1000) )

Thanks for your answers.

浅语花开 2024-12-29 08:24:17

> Indeed, when several threads are launched, a local array (bootIndexs in the code below) seems to be used for all launched threads.

That's the entire point of threads: lightweight tasks that share everything with their spawning process! :) If you are looking for a share-nothing solution, then you should perhaps look at the multiprocessing module (keep in mind, though, that spawning a process is much heavier on the system than spawning a thread).

However, back to your problem... My suggestion is little more than a shot in the dark, but you could try changing this line:

boot = [distribModules[i] for i in bootIndexs]

to:

boot = [distribModules[i] for i in bootIndexs.copy()]

(using a copy of the array rather than the array itself). This seems unlikely to be the issue (you are just iterating over the array, not actually using it), but it is the only point where I can see you using it in your thread, so...

This of course only works if the array content is not meant to be changed by the threads manipulating it. If changing the values of the "global" array is the intended behaviour, then you should instead use a Lock() to forbid simultaneous access to that resource. Your threads should then do something like:

lock.acquire()
# Manipulate the array content here
lock.release()
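
As a minimal sketch of the locking pattern above (not taken from the question's code), the same acquire/release pair can be written with a `with` block, which releases the lock even if the body raises. The names `increment_all` and `run_workers` and the shared list are hypothetical, chosen only for illustration.

```python
import threading


def increment_all(shared, lock, n_updates):
    """Add 1 to every element of the shared list, n_updates times."""
    for _ in range(n_updates):
        # Equivalent to lock.acquire() ... lock.release(), but the lock is
        # released automatically even if the body raises an exception.
        with lock:
            for i in range(len(shared)):
                shared[i] += 1


def run_workers(n_threads=4, n_updates=1000):
    """Start n_threads workers that all modify the same list under one lock."""
    shared = [0, 0, 0]          # hypothetical shared ("global") array
    lock = threading.Lock()     # one lock guarding that array
    threads = [
        threading.Thread(target=increment_all, args=(shared, lock, n_updates))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # wait for every worker to finish
    return shared
```

Every thread must go through the same Lock object for this to help; a lock created inside each thread would protect nothing.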

破晓 2024-12-29 08:24:17

I have no experience with threading, so this might be completely off the mark.

scipy.stats.randint, like the other distributions in scipy.stats, is an instance of the corresponding distribution class. This means that every thread is accessing the same instance. During the rvs call an attribute _size is set. If a different thread with a different size accesses the instance in the meantime, then you would get the ValueError saying that the sizes don't match in the reshape. This sounds like a race condition to me.

I would recommend using numpy.random directly in this case (this is the call made inside scipy.stats.randint):
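
The mechanism guessed at above can be illustrated with a simplified model. This is not scipy's actual code: `SharedSampler` is a hypothetical stand-in for a shared distribution instance that stores the requested size on itself, and two `threading.Event` objects force the unlucky interleaving so that the failure happens every time rather than occasionally.

```python
import random
import threading


class SharedSampler(object):
    """Hypothetical stand-in for a module-level distribution instance."""

    def __init__(self):
        self._size = None

    def rvs(self, low, high, size, pause=None):
        self._size = size            # state stored on the shared instance
        if pause is not None:
            pause()                  # simulate being preempted at this point
        # Draw as many values as the instance currently says, then check
        # that this matches what this particular caller asked for.
        vals = [random.randrange(low, high) for _ in range(self._size)]
        if len(vals) != size:
            raise ValueError("total size of new array must be unchanged")
        return vals


def demonstrate_race():
    sampler = SharedSampler()
    errors = []
    first_has_set_size = threading.Event()
    second_is_done = threading.Event()

    def first():
        def pause():
            first_has_set_size.set()   # let the second thread run now
            second_is_done.wait()      # resume only after it has finished
        try:
            sampler.rvs(0, 10, size=3, pause=pause)
        except ValueError as exc:
            errors.append(str(exc))

    def second():
        first_has_set_size.wait()
        sampler.rvs(0, 10, size=5)     # overwrites _size while first is paused
        second_is_done.set()

    threads = [threading.Thread(target=first), threading.Thread(target=second)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

In this model the first thread asks for 3 values but ends up drawing 5, because the second thread changed `_size` in between, which is the same kind of size mismatch the traceback reports.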

numpy.random.randint(min, max, self._size)

Maybe you will have better luck there.
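
A minimal sketch of that suggestion follows, written for Python 3 (hence `queue` rather than `Queue`) and assuming NumPy is installed. It goes one step further than the answer by giving each thread its own `numpy.random.RandomState`, so no random-number state is shared between threads; the function names `bootstrap_max` and `threaded_bootstrap` are hypothetical. Note that the upper bound of `randint` is exclusive, so `randint(0, n)` yields indexes 0 to n-1.

```python
import queue
import threading

import numpy as np


def bootstrap_max(n_boots, values, rng):
    """Return the maximum of each of n_boots bootstrap resamples of values."""
    values = np.asarray(values)
    n = len(values)
    maxima = []
    for _ in range(n_boots):
        # high is exclusive, so this draws valid indexes 0 .. n-1
        idx = rng.randint(0, n, size=n)
        maxima.append(values[idx].max())
    return maxima


def threaded_bootstrap(n_boots, values, n_threads=4):
    """Split n_boots resamples across n_threads threads and merge the results."""
    results = queue.Queue()

    def worker(n):
        rng = np.random.RandomState()   # generator private to this thread
        results.put(bootstrap_max(n, values, rng))

    threads = [
        threading.Thread(target=worker, args=(n_boots // n_threads,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # all results are queued after this

    merged = []
    while not results.empty():
        merged.extend(results.get())
    return merged
```

Joining the threads before draining the queue avoids the need for `task_done()` and `q.join()`, since every worker has already put its result by the time the loop runs.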

If you need a distribution that is not available in numpy.random, then you would need to instantiate new instances of the distribution in each thread, if my guess is correct.
