Does anyone have experience running jobs on a cluster under ClusterVisionOS?
I'm currently working on a cluster running ClusterVisionOS 3.1. This will be my first time working with a cluster, so I probably haven't tried the "obvious".
I can submit a single job to the cluster with the "qsub" command (this I got working properly).
But the problem starts when submitting multiple jobs at once. I could write a script to send them all in one go, but then all the nodes would be occupied with my jobs, and there are other people here who want to submit their jobs too.
So here's the deal:
32 nodes (4 processors/slots each)
The best thing would be to tell the cluster to use 3 nodes (12 processors) and queue all my jobs on those nodes/processors, if that's even possible. If each job could be assigned a single processor, that would be perfect.
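If the cluster's batch system is PBS/Torque-style (which is what `qsub` usually belongs to on ClusterVision setups), per-job resource requests of roughly this shape are possible. This is a hedged sketch, assuming a Torque-like `qsub`; `myjob.sh` is a placeholder, and the exact flags should be checked against the site's own `man qsub`:

```shell
# Request 1 node with 1 processor (slot) for a single job, so each job
# only occupies one processor instead of a whole node:
qsub -l nodes=1:ppn=1 myjob.sh

# Torque also supports job arrays: one submission that queues many
# similar jobs, which the scheduler spreads over free slots as they open:
qsub -l nodes=1:ppn=1 -t 1-20 myjob.sh
```

Note that plain `qsub` cannot by itself cap your total footprint at 3 nodes; that kind of limit is normally enforced by the scheduler's queue configuration, or approximated client-side as in the answer below.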
OK, so I guess I found out there is no direct solution to this problem. My personal solution was to write a script that connects to the cluster through ssh and checks how many jobs are already running under your user name. The script checks that this number does not exceed, let's say, 20 jobs at the same time; as long as that limit is not reached, it keeps submitting jobs.
Maybe it's an ugly solution, but a working one!
About the processor thing: the jobs were already being submitted to different single processors, fully utilizing the nodes.
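The throttling idea above can be sketched as a small shell function. This is a minimal sketch, assuming a PBS/Torque-style `qstat`/`qsub` where `qstat -u $USER` lists one running job per line starting with a numeric job id; `MAX_JOBS` and `POLL_SECONDS` are illustrative values:

```shell
#!/bin/sh
# Keep at most MAX_JOBS of our jobs in the queue at once,
# submitting the rest as slots free up.
MAX_JOBS=20
POLL_SECONDS=60

running_jobs() {
    # Assumption: job lines in "qstat -u <user>" output start with a digit.
    qstat -u "$USER" 2>/dev/null | grep -c '^[0-9]'
}

submit_throttled() {
    # Reads job script paths from stdin, one per line.
    while read -r job; do
        # Wait until we are below the limit before submitting the next job.
        while [ "$(running_jobs)" -ge "$MAX_JOBS" ]; do
            sleep "$POLL_SECONDS"
        done
        qsub "$job"
    done
}
```

Run it over the machine's ssh connection (or directly on the head node) with something like `ls jobs/*.sh | submit_throttled`. It is polling-based and therefore a bit crude, but it keeps the shared nodes free for other users.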