Condor, Sun Grid Engine, or something else?
I'm trying to work out whether we should try out Condor or Sun Grid Engine (SGE) at work, or possibly something else.
We often have lots of unused Windows XP workstations. The hope is that we could use wake-on-LAN to start them, run all our jobs, and then shut them down automatically. We'd mainly be running Matlab, Java, or Python simulations for either Monte Carlo runs or parameter explorations.
With my limited knowledge of Condor, it sounds like using the VM universe might be a convenient way of taking care of snapshots without having to modify existing code.
Is SGE or something else better than Condor for this kind of work?
Comments (7)
SGE doesn't really support Windows. It comes with all kinds of caveats and missing pieces on Windows.
I've been running Condor pools for many years now, and it is a superb high-throughput computing (HTC) setup for both cycle-stealing and dedicated, always-on hardware, on Linux and Windows machines. The recently added Rooster daemon lets you put machines to sleep between job cycles and wake them up when new work appears in the pool. Condor also has an active and very helpful support community.
Checkpointing is the only Condor feature not available on Windows; everything else is there. With the addition of the VM universe, checkpointing is becoming less and less useful anyway. To use checkpointing successfully you need to be able to relink your entire code stack, so if you're running Matlab jobs, checkpointing isn't going to be possible even on Linux.
If you have specific questions about getting Condor running on Windows, I'd be happy to answer them and share my experience. I run Condor across 4 pools around the globe, with about 1500 dedicated machines in total and some 1000 additional desktop machines that are available when their users care to donate them.
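For the Monte Carlo and parameter-sweep jobs described in the question, a Condor submit description file might look roughly like what the sketch below generates. This is an illustration under assumptions, not a tested pool configuration: the executable name `run_sim.bat`, the log and output file names, and the helper `make_submit_description` are all hypothetical, and the exact keywords are worth checking against the manual for your Condor version.

```python
def make_submit_description(executable: str, n_jobs: int, log: str = "sweep.log") -> str:
    """Build the text of a vanilla-universe submit file for n_jobs independent runs.

    Each job receives its index through the $(Process) macro, which the
    simulation can use as a random seed or as an index into a parameter table.
    File names here are placeholders chosen for illustration.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = $(Process)",            # job index 0 .. n_jobs-1
        "output = out.$(Process).txt",       # per-job stdout
        "error = err.$(Process).txt",        # per-job stderr
        f"log = {log}",
        "should_transfer_files = YES",       # assume no shared filesystem
        "when_to_transfer_output = ON_EXIT",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the file; it would then be submitted with: condor_submit sweep.sub
    with open("sweep.sub", "w") as handle:
        handle.write(make_submit_description("run_sim.bat", 100))
```

Generating the file from a script keeps the number of jobs and the file naming in one place, which tends to help once a sweep grows beyond a handful of parameters.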
I'd start with Condor. It has good support for Windows, and newer versions have built-in support for sending wake-on-LAN packets, in a very configurable way, when there are jobs that could run on particular machines. It can also shut machines down based on user-defined policies.
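To make the wake-on-LAN part concrete, here is a minimal sketch of the "magic packet" itself: 6 bytes of `0xFF` followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. This is not Condor's own implementation, only an illustration of the packet format; the function names, the default broadcast address, and the port (9 is a common choice) are assumptions.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Synchronization stream (6 x 0xFF) followed by 16 copies of the MAC.
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

A call such as `send_wake_on_lan("00:11:22:33:44:55")` would only wake a machine whose network card and BIOS have wake-on-LAN enabled, and the broadcast generally has to originate on the same subnet as the target.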
After Oracle's takeover of SGE (Sun Grid Engine), the Open Grid Scheduler project still offers an open-source Grid Engine:
http://gridscheduler.sourceforge.net/
For dedicated hardware I'd go with Grid Engine.
For scavenging clock cycles on machines which may be in use I'd go with Condor.
For hardware which you have dedicated access to for fixed periods, such as overnight and at weekends, I'd probably still go with Condor but might be able to persuade myself to use Grid Engine.
I recently had to choose between Condor and SGE for a customer project. I was favoring SGE (because I was more familiar with that environment), but Condor won in the end because:
However, you cannot use the most interesting features of Condor on Windows: checkpointing is not available, nor is the Condor-specific I/O. I'm not using the VM universe, so I cannot comment on that aspect.
I've only tried Condor, and it was a pain to set up. If you need every clock cycle you can get, go with Condor.
I'm about to try SGE, and I'll report back on how it goes. People at my company already have experience setting up SGE, though, so I'll probably end up saying SGE is easier.
SGE doesn't exist anymore; it's now OGE (Oracle Grid Engine), and it's very expensive. Go with Condor.