Python threading: local variable issue
I am having some trouble using threading with the scipy.stats.randint module. When several threads are launched, a local array (bootIndexs in the code below) seems to be shared by all of the launched threads.
This is the error that is raised:

    Exception in thread Thread-559:
    Traceback (most recent call last):
      ...
      File "..\calculDomaine3.py", line 223, in bootThread
        result = bootstrap(nbB, distMod)
      File "...\calculDomaine3.py", line 207, in bootstrap
        bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages)
      File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 5014, in rvs
        return super(rv_discrete, self).rvs(*args, **kwargs)
      File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 582, in rvs
        vals = reshape(vals, size)
      File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 171, in reshape
        return reshape(newshape, order=order)
    ValueError: total size of new array must be unchanged

And this is my code:

    import threading
    import Queue
    from scipy import stats as spstats

    nbThreads = 4

    def test(nbBoots, nbTirages, modules):

        def bootstrap(nbBootsThread, distribModules):
            distribMax = []
            for j in range(nbBootsThread):
                bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages)
                boot = [distribModules[i] for i in bootIndexs]
                distribMax.append(max(boot))
            return distribMax

        q = Queue.Queue()

        def bootThread(nbB, distMod):
            result = bootstrap(nbB, distMod)
            q.put(result, False)
            q.task_done()

        works = []
        for i in range(nbThreads):
            works.append(threading.Thread(target = bootThread, args = (nbBoots//nbThreads, modules[:],)))

        for w in works:
            w.daemon = True
            w.start()

        q.join()
        distMaxResult = []
        for j in range(q.qsize()):
            distMaxResult += q.get()
        return distMaxResult

    class classTest:
        def __init__(self):
            self.launch()

        def launch(self):
            print test(100, 1000, range(1000))

Thanks for your answers.
Comments (2)
That's the entire point of threads: they are lightweight tasks that share everything with their spawning process! :) If you are looking for a share-nothing solution, then you should perhaps look at the multiprocessing module (keep in mind, though, that spawning a process is much heavier on the system than spawning a thread).
However, back to your problem... Mine is little more than a shot in the dark, but you could try changing the line that iterates over the array so that it uses a copy of the array rather than the array itself. This seems unlikely to be the issue (you are just iterating over the array, not actually using it), but it is the only point I can see where you use it in your thread, so...
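A minimal sketch of the "iterate over a copy" idea, assuming the array meant here is the index array (bootIndexs in the question). The helper name resample_max is hypothetical and used only for illustration. Note that slicing a NumPy array with [:] returns a view rather than a copy, so the sketch makes an explicit copy instead:

```python
import numpy as np

def resample_max(values, indices):
    """Return the largest of the values selected by a private copy of indices."""
    # Explicit, independent copy; for a NumPy array, indices[:] would only be a view.
    local_indices = np.array(indices, copy=True)
    return max(values[i] for i in local_indices)

values = list(range(10))
indices = np.array([1, 7, 3])
print(resample_max(values, indices))  # 7
```

Like the suggestion itself, this is only a guess at a workaround and may not remove the error.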
This of course only works if the array content is not meant to be changed by the threads manipulating it. If changing the values of the "global" array is the correct behaviour, then you should instead use a Lock() to forbid simultaneous access to that resource. Your threads should then acquire the lock before working on the array and release it once they are done.
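A minimal sketch of the locking pattern, written for Python 3 with only the standard library. The shared list, the lock name, and the worker function are all hypothetical stand-ins for whatever "global" array the threads would really modify:

```python
import threading

shared = [0, 0, 0, 0]            # hypothetical shared ("global") array
shared_lock = threading.Lock()   # guards every access to `shared`

def increment_all(n_rounds):
    for _ in range(n_rounds):
        # Only one thread at a time can be inside this block.
        with shared_lock:
            for k in range(len(shared)):
                shared[k] += 1

threads = [threading.Thread(target=increment_all, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)  # [4000, 4000, 4000, 4000]
```

The with statement acquires the lock on entry and releases it on exit, even if the body raises an exception, which is why it is usually preferred to calling acquire() and release() by hand.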
I have no experience with threading, so this might be completely off the mark.
scipy.stats.randint, like the other distributions in scipy.stats, is an instance of the corresponding distribution class. This means that every thread is accessing the same instance. During the rvs call an attribute, _size, is set. If a different thread with a different size accesses the instance in the meantime, you would get the ValueError saying that the sizes don't match in the reshape. This sounds like a race condition to me.
I would recommend using numpy.random directly in this case (numpy.random is what scipy.stats.randint calls internally); maybe you will have better luck there.
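A minimal sketch of drawing the bootstrap indices with numpy.random instead of scipy.stats.randint. The function name draw_indices is hypothetical. The upper bound of numpy.random.randint is exclusive, so passing n_draws yields indices from 0 to n_draws - 1:

```python
import numpy as np

def draw_indices(n_draws):
    """Draw n_draws random indices in the range [0, n_draws - 1]."""
    # The high bound is exclusive, as it also is for scipy.stats.randint.
    return np.random.randint(0, n_draws, size=n_draws)

idx = draw_indices(1000)
print(idx.shape)  # (1000,)
```

Because the upper bound is exclusive in both libraries, the nbTirages-1 bound in the question's code would never select the last index; whether that was intended is not stated.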
If you need a distribution that is not available in numpy.random, then you would need to instantiate new instances of the distribution in each thread, if my guess is correct.
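A sketch of the "one generator per thread" idea, written for Python 3. It is a substitution, not the per-thread scipy.stats instance described above: each thread gets its own numpy.random.RandomState object. All function names, the seeding scheme, and the way results are collected are assumptions made for illustration:

```python
import threading
import numpy as np

def bootstrap_max(values, n_boots, n_draws, seed):
    """Return the maximum of each of n_boots resamples of size n_draws."""
    rng = np.random.RandomState(seed)   # private to the calling thread
    values = np.asarray(values)
    return [values[rng.randint(0, n_draws, size=n_draws)].max()
            for _ in range(n_boots)]

def threaded_bootstrap(values, n_boots, n_draws, n_threads=4):
    results = [None] * n_threads        # one slot per thread, no sharing

    def worker(slot):
        # Using the slot number as the seed gives each thread a distinct stream.
        results[slot] = bootstrap_max(values, n_boots // n_threads, n_draws, seed=slot)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # wait for every worker before reading results
    return [m for part in results for m in part]

maxima = threaded_bootstrap(range(1000), n_boots=100, n_draws=1000)
print(len(maxima))  # 100
```

Joining the threads before reading the results avoids relying on the queue bookkeeping used in the question, and no random-number object is shared between threads.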