Can't pickle &lt;type 'instancemethod'&gt; when using multiprocessing Pool.map()
I'm trying to use multiprocessing's Pool.map() function to divide out work simultaneously. When I use the following code, it works fine:
import multiprocessing

def f(x):
    return x*x

def go():
    pool = multiprocessing.Pool(processes=4)
    print pool.map(f, range(10))

if __name__ == '__main__':
    go()
However, when I use it in a more object-oriented approach, it doesn't work. The error message it gives is:
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
__builtin__.instancemethod failed
This occurs when the following is my main program:
import someClass

if __name__ == '__main__':
    sc = someClass.someClass()
    sc.go()
and the following is my someClass class:
import multiprocessing

class someClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        print pool.map(self.f, range(10))
Anyone know what the problem could be, or an easy way around it?
The problem is that multiprocessing must pickle things to sling them among processes, and bound methods are not picklable. The workaround (whether you consider it "easy" or not;-) is to add the infrastructure to your program to allow such methods to be pickled, registering it with the copy_reg standard library method.
For example, Steven Bethard's contribution to this thread (towards the end of the thread) shows one perfectly workable approach to allow method pickling/unpickling via copy_reg.
All of these solutions are ugly, because multiprocessing and pickling are broken and limited unless you jump outside the standard library.
If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python.

pathos.multiprocessing also provides an asynchronous map function... and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).

See: What can multiprocessing and dill do together?

and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
And just to be explicit, you can do exactly what you wanted to do in the first place, and you can do it from the interpreter, if you wanted to.
Get the code here:
https://github.com/uqfoundation/pathos
You could also define a __call__() method inside your someClass(), which calls someClass.go(), and then pass an instance of someClass() to the pool. This object is pickleable and it works fine (for me)...
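The answer's code did not survive extraction; a minimal sketch of the idea, using the question's own class names, would look like this:

```python
import multiprocessing

class someClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def __call__(self, x):
        # The pool pickles this instance (instances are picklable)
        # instead of the bound method self.f (which is not, in Python 2).
        return self.f(x)

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        print(pool.map(self, range(10)))

if __name__ == '__main__':
    someClass().go()
```

Passing the instance itself to map() works because pickling an ordinary object only requires its class to be importable and its attributes picklable.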
Some limitations, though, to Steven Bethard's solution:

When you register your class method as a function, the destructor of your class is surprisingly called every time your method's processing is finished. So if you have one instance of your class that calls its method n times, members may disappear between two runs and you may get a message malloc: *** error for object 0x...: pointer being freed was not allocated (e.g. an open member file), or pure virtual method called, terminate called without an active exception (which means the lifetime of a member object I used was shorter than I thought). I got this when dealing with n greater than the pool size.

The __call__ method is not quite equivalent either, because [None,...] is read from the results.

So neither of the two methods is satisfying...
There's another short-cut you can use, although it can be inefficient depending on what's in your class instances.

As everyone has said, the problem is that the multiprocessing code has to pickle the things that it sends to the sub-processes it has started, and the pickler doesn't do instance methods.

However, instead of sending the instance method, you can send the actual class instance, plus the name of the function to call, to an ordinary function that then uses getattr to call the instance method, thus creating the bound method in the Pool subprocess. This is similar to defining a __call__ method, except that you can call more than one member function. Stealing @EricH.'s code from his answer and annotating it a bit (I retyped it, hence all the name changes and such; for some reason this seemed easier than cut-and-paste :-) ), for illustration of all the magic:

The output shows that, indeed, the constructor is called once (in the original pid) and the destructor is called 9 times (once for each copy made = 2 or 3 times per pool worker process as needed, plus once in the original process). This is often OK, as in this case, since the default pickler makes a copy of the entire instance and (semi-)secretly re-populates it. That's why, even though the destructor is called eight times in the three worker processes, it counts down from 1 to 0 each time, but of course you can still get into trouble this way. If necessary, you can provide your own __setstate__, as in this case for instance.
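The annotated code itself was lost in extraction; a compact sketch of the getattr dispatch trick (names are mine, not @EricH.'s) looks like this:

```python
import multiprocessing

def run_method(instance, name, *args):
    # Ordinary module-level function: picklable by reference. The bound
    # method is re-created inside the Pool worker by getattr, then called.
    return getattr(instance, name)(*args)

class someClass(object):
    def f(self, x):
        return x*x

    def h(self, x):
        return x + 1

if __name__ == '__main__':
    sc = someClass()
    pool = multiprocessing.Pool(processes=2)
    # Unlike the __call__ trick, any number of member functions can be
    # dispatched this way, just by naming them.
    jobs = [pool.apply_async(run_method, (sc, 'f', x)) for x in range(10)]
    print([job.get() for job in jobs])
```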
The solution from parisjohn above works fine with me. Plus, the code looks clean and easy to understand. In my case there are a few functions to call using Pool, so I modified parisjohn's code a bit: I made __call__ able to call several functions, and the function names are passed in the argument dict from go():
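The modified code did not survive extraction; here is one plausible sketch of the idea, with hypothetical methods f and g standing in for the answerer's functions:

```python
import multiprocessing

class someClass(object):
    def f(self, x):
        return x*x

    def g(self, x):
        return x + x

    def __call__(self, task):
        # task is a (method_name, argument) pair, so a single picklable
        # object can drive several different methods through one map().
        name, x = task
        return getattr(self, name)(x)

    def go(self):
        pool = multiprocessing.Pool(processes=2)
        print(pool.map(self, [('f', 1), ('g', 2), ('f', 3)]))
```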
In this simple case, where someClass.f is not inheriting any data from the class and not attaching anything to the class, a possible solution would be to separate out f, so it can be pickled:
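The separated-out version was lost in extraction; the idea is simply to hoist f to module level:

```python
import multiprocessing

def f(x):
    # A module-level function is picklable by reference,
    # unlike the bound method someClass().f.
    return x*x

class someClass(object):
    def go(self):
        pool = multiprocessing.Pool(processes=4)
        print(pool.map(f, range(10)))
```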
A potentially trivial solution to this is to switch to using multiprocessing.dummy. This is a thread-based implementation of the multiprocessing interface that doesn't seem to have this problem in Python 2.7. I don't have a lot of experience here, but this quick import change allowed me to call apply_async on a class method.

A few good resources on multiprocessing.dummy:

https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy
http://chriskiehl.com/article/parallelism-in-one-line/
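The answer shows no code, but the swap is a one-line import change; a sketch using map rather than the answerer's apply_async:

```python
# multiprocessing.dummy mirrors the multiprocessing API but is backed by
# threads, so arguments are never pickled and bound methods work as-is.
from multiprocessing.dummy import Pool

class someClass(object):
    def f(self, x):
        return x*x

    def go(self):
        pool = Pool(processes=4)
        result = pool.map(self.f, range(10))
        pool.close()
        pool.join()
        return result
```

The trade-off, of course, is that threads share one interpreter (and the GIL in CPython), so this only parallelizes I/O-bound work.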
Why not use a separate function?
I ran into this same issue but found out that there is a JSON encoder that can be used to move these objects between processes.
Use this to create your list:
Then in the mapped function, use this to recover the object:
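The answer's snippets were lost in extraction, so here is one generic way the idea can work, using the standard json module rather than whichever encoder the answer originally used; the class P and its single field are my invention:

```python
import json
import multiprocessing

class P(object):
    def __init__(self, value):
        self.value = value

def encode(obj):
    # Use this to create your list: plain JSON strings pickle trivially.
    return json.dumps(obj.__dict__)

def f(serialized):
    # Then, in the mapped function, recover the object from its JSON form.
    obj = P(**json.loads(serialized))
    return obj.value * obj.value

if __name__ == '__main__':
    items = [encode(P(v)) for v in range(10)]
    pool = multiprocessing.Pool(processes=2)
    print(pool.map(f, items))
```

This sidesteps pickling the objects themselves, at the cost of only supporting JSON-representable state.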
Update: as of the day of this writing, namedtuples are picklable (starting with Python 2.7).

The issue here is that the child processes aren't able to import the class of the object (in this case, the class P); in the case of a multi-module project, the class P should be importable anywhere the child process gets used.

A quick workaround is to make it importable by assigning it to globals():
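The answer's snippet was not preserved; a sketch of the globals() workaround, with P as a namedtuple of my choosing created inside a function (where the pickler cannot normally find it by name):

```python
import pickle
from collections import namedtuple

def make_model():
    # P is created at function scope, so pickle cannot resolve it as a
    # module attribute; publishing it to globals() makes __main__.P
    # (or mymodule.P) importable by name for the pickler and the children.
    P = namedtuple('P', ['x', 'y'])
    globals()['P'] = P
    return P
```

After make_model() runs, instances of P can be pickled and sent to Pool workers like any top-level class.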
pathos.multiprocessing worked for me. It has a pool method and serializes everything, unlike multiprocessing.
There is even no need to install the full pathos package.

Actually, the only package needed is dill (pip install dill); then override the multiprocessing Pickler with the dill one.

This answer was borrowed from https://stackoverflow.com/a/69253561/10686785