How can I automatically detect non-busy machines on a LAN?
I'm writing an MPI program to be run over a local area network. These machines can be ssh'd to by any student at any time.
Although I always test my program at night, the performance has been very inconsistent. My guess is that some nodes were busy when I ran the program.
So my question is: can I write a script to detect non-busy machines and update the machine file? What's an easy way to write it?
Thanks a lot.
Comments (2)
SSH into each machine, then read the /proc/loadavg file or determine how busy the machine is in some other way.
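Here is a minimal sketch of that idea in Python, not a tested production script. It assumes passwordless (key-based) ssh to every node, Linux nodes that expose /proc/loadavg, and a machine file with one hostname per line. The hostnames, the 0.5 load cutoff, and the `machinefile` path are hypothetical placeholders.

```python
import subprocess


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    if len(fields) < 3:
        raise ValueError("unexpected /proc/loadavg content: %r" % text)
    return tuple(float(value) for value in fields[:3])


def read_remote_load(host, timeout=5):
    """Read /proc/loadavg on host over ssh; return None if that fails."""
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",
        host,
        "cat /proc/loadavg",
    ]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout + 5
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    if result.returncode != 0:
        return None
    try:
        return parse_loadavg(result.stdout)
    except ValueError:
        return None


def pick_idle_hosts(loads, max_load=0.5):
    """Keep hosts whose 1-minute load is below max_load (assumed cutoff)."""
    return [
        host for host, load in loads.items()
        if load is not None and load[0] < max_load
    ]


def write_machinefile(hosts, path):
    """Write one hostname per line."""
    with open(path, "w") as handle:
        for host in hosts:
            handle.write(host + "\n")


def main():
    candidates = ["node01", "node02", "node03"]  # hypothetical hostnames
    loads = {host: read_remote_load(host) for host in candidates}
    write_machinefile(pick_idle_hosts(loads), "machinefile")
```

Calling `main()` right before `mpirun` rebuilds the machine file from whichever nodes look idle at that moment. Unreachable nodes are simply left out.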
I think the easiest way would be to install the check_load[1] script from Nagios on every node you want to check and call it via ssh with some sensible parameters.
CRITICAL would mean "really busy", WARNING could mean "kind of busy", and OK would mean "the machine is idle".
Pay attention to the warning and critical thresholds, which you give as 1/5/15-minute load averages; for instance, a load of 3 is perfectly fine on a machine with 16 cores, while a load of 3 on a single-core machine means it is really, really busy.
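The exact command from this answer is not preserved here, so what follows is only a sketch of such a call. It relies on the `-w` and `-c` options of check_load (each a comma-separated 1/5/15-minute triple) and on the standard Nagios plugin exit codes. The plugin path, the per-core threshold values, and the idea of passing each node's core count by hand are assumptions for illustration.

```python
import subprocess

# Standard Nagios plugin exit codes.
STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Assumed install location; it differs between distributions.
CHECK_LOAD = "/usr/lib/nagios/plugins/check_load"


def scaled_thresholds(per_core, cores):
    """Scale per-core 1/5/15-minute thresholds by the core count."""
    return ",".join(f"{value * cores:.2f}" for value in per_core)


def build_command(host, cores,
                  warn_per_core=(0.3, 0.3, 0.3),   # assumed "kind of busy"
                  crit_per_core=(0.7, 0.7, 0.7)):  # assumed "really busy"
    """Build the ssh command that runs check_load on host."""
    remote = (
        f"{CHECK_LOAD} "
        f"-w {scaled_thresholds(warn_per_core, cores)} "
        f"-c {scaled_thresholds(crit_per_core, cores)}"
    )
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def classify(exit_code):
    """Map an exit code to a status name; anything unexpected is UNKNOWN."""
    return STATUS_BY_EXIT_CODE.get(exit_code, "UNKNOWN")


def check_host(host, cores):
    """Return OK, WARNING, CRITICAL or UNKNOWN for host."""
    try:
        result = subprocess.run(
            build_command(host, cores),
            capture_output=True, text=True, timeout=15,
        )
    except (subprocess.TimeoutExpired, OSError):
        return "UNKNOWN"
    return classify(result.returncode)
```

For example, `check_host("node01", 16)` would treat a 16-core node as idle only while its load stays below 4.8, whereas the same call with `cores=1` uses 0.3. Only hosts that come back as OK would then go into the machine file.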
Good luck!
Alex.
[1] http://nagiosplugins.org/man/check_load