Comparison of the multiprocessing module and Pyro?

Posted on 2024-07-27 20:52:54

I use Pyro for basic management of parallel jobs on a compute cluster. I just moved to a cluster where I will be responsible for using all the cores on each compute node. (On previous clusters, each core was a separate node.) The Python multiprocessing module seems like a good fit for this. I notice it can also be used for remote-process communication. If anyone has used both frameworks for remote-process communication, I'd be grateful to hear how they stack up against each other. The obvious benefit of the multiprocessing module is that it has been built in since 2.6. Apart from that, it's hard for me to tell which is better.

Comments (1)

随遇而安 2024-08-03 20:52:54

EDIT: I'm changing my answer so you can avoid pain. multiprocessing is immature, the docs on BaseManager are INCORRECT, and if you're an object-oriented thinker who wants to create shared objects on the fly at run-time, USE PYRO OR YOU WILL SERIOUSLY REGRET IT! If you are just doing functional programming with a shared queue that you register up front, like all the stupid examples, GOOD FOR YOU.

Short Answer

Multiprocessing:

  • Feels awkward doing object-oriented remote objects
  • Easy breezy crypto (authkey)
  • Over a network or just inter-process communication
  • No nameserver extra hassle like in Pyro (there are ways to get around this)
  • Edit: Can't "register" objects once the manager is instantiated!!??
  • Edit: If the server isn't started, the client throws some "Invalid argument" exception instead of just saying "Failed to connect". WTF!?
  • Edit: BaseManager documentation is incorrect! There is no "start" method!?!
  • Edit: Very few examples of how to use it.
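To make the authkey and connection-failure points concrete, here is a minimal sketch. It is written for Python 3 (an assumption; the answer itself targets 2.6), and `DemoManager`, `serve_in_thread`, `try_connect` and `unused_port` are hypothetical names made up for the illustration. The server runs in a daemon thread of the same process purely to keep the sketch self-contained; a real setup would run it on another node.

```python
import socket
import threading
from multiprocessing import AuthenticationError
from multiprocessing.managers import BaseManager


class DemoManager(BaseManager):
    """Hypothetical manager subclass used only for this sketch."""


def serve_in_thread(authkey):
    """Serve a manager on a free loopback port from a daemon thread."""
    manager = DemoManager(address=("127.0.0.1", 0), authkey=authkey)
    server = manager.get_server()  # the listening socket is bound here
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address  # (host, port) actually chosen by the OS


def try_connect(address, authkey):
    """Return a short label describing how the connection attempt went."""
    client = DemoManager(address=address, authkey=authkey)
    try:
        client.connect()
    except AuthenticationError:
        return "bad authkey"  # the server rejected our key
    except OSError:
        return "connection failed"  # e.g. nothing is listening there
    return "ok"


def unused_port():
    """Ask the OS for a free port, then release it so nothing listens on it."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


address = serve_in_thread(b"hello")
print(try_connect(address, b"hello"))
print(try_connect(address, b"wrong"))
print(try_connect(("127.0.0.1", unused_port()), b"hello"))
```

Catching `OSError` around `connect()` is one way to turn whatever low-level socket error the platform raises into a plain "failed to connect" message.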

Pyro:

  • Simple remote objects
  • Network comms only (loopback if local only)
  • Edit: This thing just WORKS, and it likes object-oriented object sharing, which makes me LIKE it
  • Edit: Why isn't THIS a part of the standard library instead of that multiprocessing piece of crap that tried to copy it and failed miserably?

Edit: The first time I answered this I had just dived into 2.6 multiprocessing. In the code I show below, the Texture class is registered and shared as a proxy, but the "data" attribute inside it is NOT. So guess what happens: each process has a separate copy of the "data" attribute inside the Texture proxy, despite what you might expect. I just spent untold hours trying to figure out a good pattern for creating shared objects at run-time, and I kept running into brick walls. It has been quite confusing and frustrating. Maybe it's just me, but judging by the scant examples of people attempting it, I don't think so.
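One plausible way to run into the copy behaviour described above, sketched in Python 3 syntax (an assumption; the answer targets 2.6): a proxy forwards method calls to the object in the server, but return values come back pickled, so a returned list is a local copy. The server runs in a daemon thread here only to keep the sketch self-contained.

```python
import threading
from multiprocessing.managers import BaseManager


class Texture:
    def __init__(self, data):
        self.data = data

    def setData(self, data):
        self.data = data

    def getData(self):
        return self.data


class TextureManager(BaseManager):
    pass


# The server creates and owns every Texture; clients only receive proxies.
TextureManager.register("Texture", Texture)

server = TextureManager(address=("127.0.0.1", 0), authkey=b"hello").get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

client = TextureManager(address=server.address, authkey=b"hello")
client.connect()

texture = client.Texture([0] * 4)  # proxy to an object living in the server

local = texture.getData()  # pickled copy of the server-side list
local[0] = 99              # changes only the local copy
unchanged = texture.getData()

texture.setData([2] * 4)   # method call runs on the server-side object
updated = texture.getData()

# The automatically generated proxy exposes public methods only,
# so the plain "data" attribute is not reachable through it.
has_data_attribute = hasattr(texture, "data")

print(unchanged, updated, has_data_attribute)
```

In this sketch, state only stays shared when every read and write goes through a method call on the proxy.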

I'm having to make the painful decision to drop the multiprocessing library in favor of Pyro until multiprocessing is more mature. While I was initially excited to learn that multiprocessing was built into Python, I am now thoroughly disgusted with it and would rather install the Pyro package many, many times, glad that such a beautiful library exists for Python.

Long Answer

I have used Pyro in past projects and have been very happy with it. I have also started to work with multiprocessing, which is new in 2.6.

With multiprocessing I found it a bit awkward to allow shared objects to be created as needed. It seems that, in its youth, the multiprocessing module is geared more toward functional programming than object-oriented programming. That is not entirely true, because the object-oriented style is possible; I'm just feeling constrained by the "register" calls.

For example:

manager.py:

from multiprocessing.managers import BaseManager

class Texture(object):
   """Plain object owned by the server; clients reach it through a proxy."""
   def __init__(self, data):
      self.data = data

   def setData(self, data):
      print("Calling set data %s" % (data,))
      self.data = data

   def getData(self):
      return self.data

class TextureManager(BaseManager):
   def __init__(self, address=None, authkey=b''):
      # authkey is a byte string: fine on Python 2.6+, required on Python 3
      BaseManager.__init__(self, address, authkey)
      self.textures = {}   # name -> Texture, only filled in the server process

   def addTexture(self, name, texture):
      self.textures[name] = texture

   def hasTexture(self, name):
      return name in self.textures

server.py:

from manager import Texture, TextureManager

manager = TextureManager(address=('', 50000), authkey=b'hello')

def getTexture(name):
   # Runs in the server process; the result reaches the client as a proxy.
   if manager.hasTexture(name):
      return manager.textures[name]
   else:
      texture = Texture([0]*100)
      manager.addTexture(name, texture)
      manager.register(name, lambda: texture)
      # Return the new texture so the first request also gets a proxy to it.
      return texture

TextureManager.register("getTexture", getTexture)


if __name__ == "__main__":
   server = manager.get_server()
   server.serve_forever()

client.py:

from manager import TextureManager

if __name__ == "__main__":
   # Register the typeid without a callable: the client only needs the proxy method.
   TextureManager.register("getTexture")
   manager = TextureManager(address=('127.0.0.1', 50000), authkey=b'hello')
   manager.connect()
   texture = manager.getTexture("texture2")
   data = [2] * 100
   texture.setData(data)
   print("data = %s" % (texture.getData(),))

The awkwardness I'm describing comes from server.py, where I register a getTexture function to retrieve a texture of a certain name from the TextureManager. Now that I go over this, the awkwardness could probably be removed if I made the TextureManager a shareable object which creates/retrieves shareable textures. Meh, I'm still playing, but you get the idea. I don't remember encountering this awkwardness with Pyro, but there is probably a cleaner solution than the example above.
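For what it's worth, here is one way the "shareable object which creates/retrieves shareable textures" idea might look. This is a sketch only, in Python 3 syntax (an assumption), and `TextureStore`, `_store` and the `store` typeid are hypothetical names. It relies on the `method_to_typeid` argument of `register()`, which asks the manager to return the result of a method as a proxy instead of a pickled copy. As before, the server runs in a daemon thread just to keep the example self-contained.

```python
import threading
from multiprocessing.managers import BaseManager


class Texture:
    def __init__(self, data):
        self.data = data

    def setData(self, data):
        self.data = data

    def getData(self):
        return self.data


class TextureStore:
    """Hypothetical server-side registry that creates textures on demand."""

    def __init__(self):
        self._textures = {}
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:
            if name not in self._textures:
                self._textures[name] = Texture([0] * 4)
            return self._textures[name]


class TextureManager(BaseManager):
    pass


_store = TextureStore()  # the single store instance, living in the server

# "Texture" has no callable: it only says how results of get() are proxied.
TextureManager.register("Texture", create_method=False)
# Everything is registered up front; textures themselves appear at run-time.
TextureManager.register("store", callable=lambda: _store,
                        method_to_typeid={"get": "Texture"})

server = TextureManager(address=("127.0.0.1", 0), authkey=b"hello").get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()


def connect():
    client = TextureManager(address=server.address, authkey=b"hello")
    client.connect()
    return client


# Two independent connections asking for the same texture name.
texture_a = connect().store().get("texture2")
texture_b = connect().store().get("texture2")

texture_a.setData([2] * 4)
seen_by_b = texture_b.getData()  # both proxies refer to one server object
print(seen_by_b)
```

The design choice is that only two typeids are ever registered, so nothing needs to be registered after the manager is instantiated, while new textures are still created by name on first request.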
