Best practices for monitoring a program's life cycle
I want to hear your opinion about program life monitoring.
This is the scenario. You have a simple program that normally works, meaning it's well written, exceptions are handled, and so on.
What would you do if you wanted to ensure that this program works FOREVER?
No external tools like crontab are available, but any overhead can be added.
Use another program that continuously "pings" the main program? Touch a file and have another program check the file's modification time?
And how do you ensure that this second program always works?
So, come on, tell me your opinions or best practices in this context!
As a footnote, I have to write this program in Python, but it's a general-purpose question!
Comments (3)
In embedded systems, what is often done is a watchdog module.
A watchdog checks some location (could be a file, could be a memory location, whatever), and restarts the system under examination if the location does not meet criteria.
So what you might have your program under probe do is periodically write a programname_watchdog file containing an epoch timestamp. This would be part of the regular loop.
Then your watchdog (in a totally different process) would check the file. If the timestamp was sufficiently outdated, the other program would be killed and restarted, since it would be deemed to have critically malfunctioned (either hung or crashed). Note that your watchdog will have only some simple logic, so its chances of failing are much lower.
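A minimal Python sketch of that file-based scheme might look like the following. The file name comes from the description above; the 30-second threshold, the polling interval, the startup grace period, and all function names are assumptions chosen for illustration, not part of the original answer.

```python
import os
import subprocess
import time

HEARTBEAT_FILE = "programname_watchdog"  # name from the text; its location is an assumption
MAX_AGE_SECONDS = 30                     # assumed staleness threshold


def write_heartbeat(path=HEARTBEAT_FILE):
    """Called from the monitored program's regular loop."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(time.time()))
    # Atomic rename, so the watchdog never reads a half-written file.
    os.replace(tmp, path)


def heartbeat_age(path=HEARTBEAT_FILE, now=None):
    """Seconds since the last heartbeat, or None if the file is missing or unreadable."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            return now - float(f.read().strip())
    except (OSError, ValueError):
        return None


def is_stale(age, max_age=MAX_AGE_SECONDS):
    """A missing heartbeat counts as stale."""
    return age is None or age > max_age


def watch(command, path=HEARTBEAT_FILE, max_age=MAX_AGE_SECONDS, poll=5):
    """Run `command`; kill and restart it when it exits or its heartbeat goes stale."""
    proc = subprocess.Popen(command)
    started = time.time()
    while True:
        time.sleep(poll)
        exited = proc.poll() is not None
        # Give a freshly started program one full period to write its first stamp.
        in_grace = time.time() - started < max_age
        stale = is_stale(heartbeat_age(path), max_age) and not in_grace
        if exited or stale:
            if not exited:
                proc.kill()
            proc.wait()
            proc = subprocess.Popen(command)
            started = time.time()
```

The monitored program would call `write_heartbeat()` once per loop iteration, and the watchdog process would run something like `watch(["python", "main_program.py"])`, where the script name is hypothetical. The watchdog deliberately does very little, in keeping with the point that simple logic is less likely to fail.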
I'm positive there are other ways to accomplish this as well. This is just one way.
edit: You have to consider the stack your system is built on. The more external dependencies you have, the more risk of failure. You also have to consider a formal proof of program correctness if you are looking for perfect operation.
The question really becomes what you are expecting from your system; what sort of failures are unacceptable and what sort of failures are expected so you can compensate for them.
This question very quickly becomes a proof/hardware/software co-design issue (and an expensive one, too). I'm curious to see what you are doing and what your solution is.
Like Paul Nathan said, use a watchdog.
There are a few things you can do to make things more robust though, for example:
That is a pseudocode sample from real code used in an embedded RTU for process control.
It's primitive, but it works. Not only does this ensure that the remote process is alive, but if the remote process has drifted in calculation speed (scan rates are affected by program size and complexity), it will make sure that the two processes are still synchronized.
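One way a check with those two properties could be built is a counter handshake: the supervising side sends a counter, the remote side echoes it, and the counter only advances once the echo matches. The Python sketch below is an illustration of that general idea, not the RTU pseudocode referred to above; the class name, the 16-bit wrap, and the `max_missed` tolerance are all assumptions.

```python
class HeartbeatMonitor:
    """Lockstep counter handshake between a supervisor and a remote process."""

    def __init__(self, max_missed=3):
        self.sent = 0                 # last counter value handed to the remote
        self.missed = 0               # consecutive scans without a matching echo
        self.max_missed = max_missed  # assumed tolerance for a slower remote scan

    def scan(self, echoed):
        """Run one supervisor scan.

        `echoed` is the value most recently read back from the remote.
        Returns (value_to_send, healthy).
        """
        if echoed == self.sent:
            # The remote caught up: clear the miss count and advance,
            # wrapping like a 16-bit register.
            self.missed = 0
            self.sent = (self.sent + 1) % 65536
        else:
            # No fresh echo yet: keep sending the same value and count the miss.
            self.missed += 1
        return self.sent, self.missed <= self.max_missed
```

Because the counter never advances until it has been echoed, a remote that scans more slowly stays in step rather than falling behind, and a remote that stops echoing is flagged once `max_missed` scans have passed without a match.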
If you want more data, start investigating the return codes used by Modbus, or how the OPC protocol handles managing its Quality byte.
Well, I've thought about this problem for a long time, and two things have come up.
A software watchdog should be so simple that crashing is nearly impossible. For the fanatics, an interesting programming challenge would be to write a network of watchdogs, written in different languages, which have to keep one another alive and together monitor the main process.
Challenging and interesting as that is, it seems like a big waste of time, and the scenario looks like soldiers at war.
Secondly, the application I'm developing has a hardware watchdog, which should always be present in critical operations.
So now my application has a software watchdog that refreshes the hardware one and monitors the program's life.
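As a rough sketch of that arrangement in Python, the software watchdog's loop could look like this. The three callables and the loop parameters are hypothetical placeholders, since the post does not say how the health check, the restart, or the hardware refresh are actually done.

```python
import time


def supervise(is_program_healthy, restart_program, kick_hardware,
              interval=1.0, cycles=None):
    """Software watchdog loop that also refreshes a hardware watchdog.

    All three callables are placeholders supplied by the caller.
    `cycles=None` means run forever; a number limits the loop (useful for testing).
    """
    done = 0
    while cycles is None or done < cycles:
        if not is_program_healthy():
            restart_program()
        # Refresh the hardware timer on every pass. If this loop itself hangs
        # or crashes, the refreshes stop and the hardware watchdog fires.
        kick_hardware()
        time.sleep(interval)
        done += 1
```

This also gives one answer to the question of who watches the second program: the hardware timer does. On Linux the refresh is commonly a write to the `/dev/watchdog` device, but which platform and hardware are in use here is not stated, so that detail is only an assumption.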
In the end, Paul, I completely agree with you.