Python - Are class methods multiprocess-safe?
I have a class that loops over some data files, processes them, and then writes new data back out. The analysis of each file is completely independent of the others. The class contains information needed by the analysis in its attributes, but the analysis does not need to change any attributes of the class. Thus I can make the analysis of one data file a single method of my class. The analysis could in principle be done in parallel since each data file is independent. As an aside, I was considering making my class iterable.
Can I use the multiprocessing module to spawn processes that run methods of my class? I need to use multiprocessing because I'm using third-party code that has a really bad memory leak (it fills up all 24 GB of memory after about 100 data files).
If not, how would you go about doing this? Would you just use a normal function called by my class (passing all the information I need as arguments) instead of a method? How are arguments passed to functions in multiprocessing? Does it make a deep copy?
Comments (2)
Yes. As long as you are not updating data on the class itself that needs to be shared across the instances, multiprocessing is the right tool for you in this case.
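To make that concrete, here is a minimal sketch of mapping a bound method over a list of files with `multiprocessing.Pool`. The `Analyzer` class, its `scale` attribute, the file names, and the `roundtrip` helper are all hypothetical stand-ins, not the asker's code. It assumes Python 3, where bound methods can be pickled, and that everything stored on the instance is picklable. Task arguments are sent to the workers by pickling, so each worker operates on its own copy of the instance rather than on the parent's object.

```python
import multiprocessing as mp
import os
import pickle


class Analyzer:
    """Hypothetical stand-in for the asker's class; holds read-only settings."""

    def __init__(self, scale):
        self.scale = scale

    def analyze(self, path):
        # Real code would read `path`, call the leaky library, and write output.
        # Here we only return something cheap that depends on the settings.
        return path, os.getpid(), len(path) * self.scale


def roundtrip(obj):
    # Mimics what multiprocessing does to task arguments: serialize, rebuild.
    return pickle.loads(pickle.dumps(obj))


def run_all(analyzer, paths, workers=2):
    # maxtasksperchild=1 retires each worker after a single task, so memory
    # leaked by third-party code is released when that worker process exits.
    with mp.Pool(processes=workers, maxtasksperchild=1) as pool:
        # Pickling the bound method pickles `analyzer` along with it.
        # chunksize=1 makes every file its own task.
        return pool.map(analyzer.analyze, paths, chunksize=1)


if __name__ == "__main__":
    analyzer = Analyzer(scale=2)
    for path, pid, value in run_all(analyzer, ["a.dat", "bb.dat", "ccc.dat"]):
        print(path, pid, value)
```

Because the workers only ever see copies, any attribute a worker changes is lost when the task returns. That is harmless here, since the analysis does not modify the instance. A plain module-level function that takes the settings as arguments would work just as well and avoids pickling the whole instance for each task.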
You don't mention your processes using any external resources, so this should be fork()-safe. fork() duplicates the memory and the file descriptors, so the program state is identical in the parent and the child at the moment of the fork. Unless you're on Windows, which can't fork, go for it.
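The copy semantics of fork can be checked with a small experiment. This is a sketch only: the `settings` dictionary and the function names are made up for illustration, and it requests the `fork` start method explicitly, so it runs only on platforms that support it (not Windows).

```python
import multiprocessing as mp

settings = {"threshold": 0.5}  # hypothetical module-level state


def child(queue):
    # Under fork the child starts with a copy of the parent's memory,
    # so it sees the value the parent set just before forking.
    seen = settings["threshold"]
    settings["threshold"] = 99.0  # changes only the child's copy
    queue.put(seen)


def demo():
    ctx = mp.get_context("fork")  # raises ValueError where fork is unavailable
    settings["threshold"] = 0.75  # modified after import, before the fork
    queue = ctx.Queue()
    proc = ctx.Process(target=child, args=(queue,))
    proc.start()
    seen_in_child = queue.get()
    proc.join()
    # Second element is the parent's value, untouched by the child's write.
    return seen_in_child, settings["threshold"]


if __name__ == "__main__":
    print(demo())
```

The child reports 0.75, the value the parent held at the moment of the fork, and the parent's value is still 0.75 afterwards even though the child overwrote its own copy. This is why read-only attributes are safe to use from forked workers, and why results have to be sent back explicitly (return values, a queue, or files) rather than written into shared objects.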