C#: Detecting a remote application failure
Does anyone know of a way to detect whether a remote app has failed or crashed? I mean the case where it becomes unusable: you would usually see "Not Responding" in the title bar, but the app is still running, so simply checking that the process no longer exists is not enough.
System.Diagnostics.Process.Responding is not supported for a process on a remote machine, and there seem to be no WMI properties in Win32_Process that I can query for this kind of information.
Comments (3)
It is hard to tell whether an app has crashed or is actually doing something useful.
Consider this case:
The processor is (very) busy, and the app might even still respond if the work is done on a separate thread. Even so, this is unwanted behaviour, because the app is no longer doing its job.
The best way to tackle this is to increment counters at certain points in the software and broadcast them periodically. A watchdog app can listen for these broadcasts; if they stop arriving or stop making sense (the counter no longer advances), it can kill the process and restart it.
Broadcasting can be done in several ways. The easiest is to write the counters to a file. Make sure the file is locked while you write to it, so that a process reading it at the same moment does not get a half-written file.
More advanced options are named pipes or sockets. A UDP socket is very easy to set up and use in this case. Don't worry about packet loss, since it almost never happens on a local network.
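To make the counter-plus-watchdog idea concrete, here is a rough C# sketch of the UDP variant. The port number, the 5 second timeout, the plain-text counter format, and the names `CounterWatchdog` and `SendHeartbeat` are all assumptions made up for illustration; nothing here comes from a particular library beyond the standard `UdpClient`.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Bookkeeping only: remembers the last counter value and when it last advanced.
public sealed class CounterWatchdog
{
    private readonly TimeSpan _timeout;
    private long _lastCounter = -1;
    private DateTime _lastProgressUtc;

    public CounterWatchdog(TimeSpan timeout, DateTime nowUtc)
    {
        _timeout = timeout;
        _lastProgressUtc = nowUtc;
    }

    // Only a counter larger than the previous one counts as progress.
    public void Report(long counter, DateTime nowUtc)
    {
        if (counter > _lastCounter)
        {
            _lastCounter = counter;
            _lastProgressUtc = nowUtc;
        }
    }

    // True when no progress has been seen for longer than the timeout.
    public bool IsStalled(DateTime nowUtc) => nowUtc - _lastProgressUtc > _timeout;
}

public static class Program
{
    private const int Port = 9050; // hypothetical port

    // Called by the monitored app at chosen points in its real work.
    public static void SendHeartbeat(UdpClient sender, long counter, string watchdogHost)
    {
        byte[] payload = Encoding.ASCII.GetBytes(counter.ToString());
        sender.Send(payload, payload.Length, watchdogHost, Port);
    }

    // The watchdog side: listen for counters and give up when they stop advancing.
    public static void Main()
    {
        var watchdog = new CounterWatchdog(TimeSpan.FromSeconds(5), DateTime.UtcNow);
        using var listener = new UdpClient(Port);
        listener.Client.ReceiveTimeout = 1000; // wake up every second to re-check

        while (true)
        {
            try
            {
                var remote = new IPEndPoint(IPAddress.Any, 0);
                byte[] data = listener.Receive(ref remote);
                if (long.TryParse(Encoding.ASCII.GetString(data), out long counter))
                    watchdog.Report(counter, DateTime.UtcNow);
            }
            catch (SocketException ex) when (ex.SocketErrorCode == SocketError.TimedOut)
            {
                // No datagram within the receive timeout; fall through to the stall check.
            }

            if (watchdog.IsStalled(DateTime.UtcNow))
            {
                Console.WriteLine("No progress reported; the application looks hung.");
                break; // a real watchdog would kill and restart the process here
            }
        }
    }
}
```

Because the timeout spans several heartbeat intervals, a single lost datagram does not trigger a restart. `DateTime.UtcNow` is used only to keep the sketch short; a monotonic clock such as `Stopwatch` would be more robust against clock changes.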
You can use a polling mechanism and periodically ask the remote application for its status.
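A minimal sketch of such polling, assuming the remote application exposes a small TCP status port. The `STATUS`/`OK` line protocol, the host name, the port, and the `StatusPoller` class are invented for this example; the remote side would have to implement the matching responder, ideally answering from the thread that does the real work.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading;

public static class StatusPoller
{
    // Returns false on a connection error, a timeout, or an unexpected reply.
    public static bool IsHealthy(string host, int port, int timeoutMs)
    {
        try
        {
            using var client = new TcpClient();
            if (!client.ConnectAsync(host, port).Wait(timeoutMs))
                return false; // could not connect in time

            client.ReceiveTimeout = timeoutMs;
            client.SendTimeout = timeoutMs;

            // Disposing the client also closes the stream, reader and writer.
            NetworkStream stream = client.GetStream();
            var writer = new StreamWriter(stream) { AutoFlush = true };
            var reader = new StreamReader(stream);

            writer.WriteLine("STATUS");       // hypothetical request
            return reader.ReadLine() == "OK"; // hypothetical healthy reply
        }
        catch (Exception ex) when (ex is SocketException || ex is IOException || ex is AggregateException)
        {
            return false;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        const string host = "remote-host"; // hypothetical machine name
        const int port = 9051;             // hypothetical status port
        int failures = 0;

        // Treat the app as failed only after three consecutive bad polls.
        while (failures < 3)
        {
            failures = StatusPoller.IsHealthy(host, port, 2000) ? 0 : failures + 1;
            Thread.Sleep(TimeSpan.FromSeconds(5));
        }

        Console.WriteLine("Remote application did not answer three polls in a row.");
    }
}
```

Requiring several consecutive failures is a design choice to avoid reacting to one slow reply; the right count depends on how quickly a hang must be noticed.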
In determining the 'liveness' of a program, it is important to measure, in a useful manner, the aspect that actually defines it as being alive.
Several simple 'proxy' approaches are superficially appealing because of their simplicity, but they fundamentally do not measure the aspect that matters.
Perhaps the most common are "is the process alive?" and "a separate heartbeat broadcast thread", probably because they are so simple to do.
Both of these, however, have a serious flaw: if the real working thread(s) in your app lock up (say, by going into an infinite loop or a deadlock), the heartbeat thread will continue to merrily send out OK messages. Likewise, with process-based monitoring you will continue to see the process as 'alive' even though it is no longer performing its real task.
You can improve the thread-based approach in many ways by layering on tests for progress on the main thread (significantly increasing the complexity and the chance of threading issues), but this takes the wrong solution and tries to push it towards the right one.
What is best is to make the task(s) performed by the program part of the liveness check: for example, heartbeat directly from the main thread after every sub-task completes (with a threshold to ensure that it does not happen too often), or simply look at the output (if there is any) and ensure that inputs are resulting in outputs.
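As an illustration of heartbeating from the working thread with a threshold, something along these lines could work. `ThrottledHeartbeat`, the one second interval, and the `Console.WriteLine` transport are placeholders I made up; in practice the `send` delegate would write to a socket, pipe, or file that the monitor reads.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Sends a heartbeat from the working thread itself, but no more often than minInterval.
public sealed class ThrottledHeartbeat
{
    private readonly TimeSpan _minInterval;
    private readonly Action<long> _send; // hypothetical transport: UDP, pipe, file...
    private readonly Stopwatch _sinceLastSend = new Stopwatch();
    private long _completedTasks;

    public ThrottledHeartbeat(TimeSpan minInterval, Action<long> send)
    {
        _minInterval = minInterval;
        _send = send;
    }

    // Call right after each sub-task finishes, on the thread that did the work.
    public void TaskCompleted()
    {
        _completedTasks++;

        // A stopped stopwatch means nothing has been sent yet, so send immediately.
        if (_sinceLastSend.IsRunning && _sinceLastSend.Elapsed < _minInterval)
            return;

        _send(_completedTasks);
        _sinceLastSend.Restart();
    }
}

public static class Program
{
    public static void Main()
    {
        var heartbeat = new ThrottledHeartbeat(
            TimeSpan.FromSeconds(1),
            completed => Console.WriteLine($"alive, {completed} sub-tasks done"));

        for (int i = 0; i < 1000; i++)
        {
            DoSubTask(i);
            heartbeat.TaskCompleted();
        }
    }

    // Stand-in for the program's real work.
    private static void DoSubTask(int i) => Thread.Sleep(5);
}
```

Since the heartbeat is emitted only after real work completes, a deadlock or infinite loop in the work loop silences it, which is exactly what the monitor needs to see.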
It is better still to validate this both internally (within the program) and externally (especially if the program has external consumers or users). If you have a web server, attempt to use it; if your app is an event-loop-based system, trigger events to which it must respond (and verify that the output is correct). Whatever you do, always keep in mind that you want to verify that useful and correct behaviour is occurring, rather than just any activity at all.
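For the web server case, an external check could look roughly like the sketch below: request a page the way a real client would and verify that the response contains what it should. The URL, the expected text, and the `ExternalProbe` class are hypothetical; only the standard `HttpClient` API is assumed.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ExternalProbe
{
    // One shared client; the timeout bounds how long a hung server can stall the check.
    private static readonly HttpClient Http = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };

    // True only if the server answers successfully AND the body contains the expected text.
    public static async Task<bool> ServesExpectedContentAsync(string url, string expectedText)
    {
        try
        {
            using HttpResponseMessage response = await Http.GetAsync(url);
            if (!response.IsSuccessStatusCode)
                return false;

            string body = await response.Content.ReadAsStringAsync();
            return body.Contains(expectedText);
        }
        catch (HttpRequestException)
        {
            return false; // connection refused, DNS failure, and similar
        }
        catch (TaskCanceledException)
        {
            return false; // the request timed out
        }
    }

    public static async Task Main()
    {
        // Hypothetical URL and marker text; use a page whose content proves real work was done.
        bool ok = await ServesExpectedContentAsync("http://remote-host/orders/recent", "Order #");
        Console.WriteLine(ok ? "Server is doing useful work." : "Server check failed.");
    }
}
```

Checking the body, not just the status code, is the point: a server that returns an empty or stale page still counts as failed.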
The more you verify not only the existence of the program but also its actions, the more useful your check will be. The further you put yourself from the program's internal state, the more of the system you check: a monitor process running on the same box may only exercise local loopback, whereas one running off the box validates much more of the network stack, including often-forgotten aspects like DNS.
Inevitably this makes the checking harder to do, because you are inherently thinking about a specific task rather than a general solution. Even so, the benefits should be sufficient for this approach to be seriously considered in many cases.