
Best practices for monitoring a program's lifetime

Posted on 2024-09-17 23:21:27

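I would like to hear your thoughts on monitoring a program's lifecycle.

Here is the scenario: you have a simple program that normally works, meaning it is well written, exceptions are handled, and so on.

How would you go about making sure that this program keeps working forever?

No external tools such as crontab are available, but any amount of overhead can be added.

Use another program that keeps "pinging" the main program? Touch a file and have another program check the file's modification time?

How do you make sure that the second program always works?

So come on, tell me what you think about this, or what your best practices are!

As a footnote, I have to write this program in Python, but this is a general question!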


天邊彩虹 2024-09-24 23:21:27

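In embedded systems, the usual approach is a watchdog module.

The watchdog checks a certain location (it could be a file, a memory location, and so on) and, if that location does not meet the criteria, restarts the system being checked.

So you would put your program under probe: all it has to do is regularly write the current epoch time to a programname_watchdog file. This would be part of its regular loop.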
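The heartbeat side of this could look roughly like the following Python sketch. The function names `write_heartbeat` and `read_heartbeat` and the write-then-rename step are assumptions made for illustration and are not part of the answer; the rename is there only so that a reader never sees a half-written timestamp.

```python
import os
import tempfile
import time


def write_heartbeat(path: str) -> float:
    """Atomically write the current epoch time to `path` and return it."""
    now = time.time()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target, so the watchdog never reads a partially written stamp.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(f"{now:.3f}\n")
    os.replace(tmp_path, path)
    return now


def read_heartbeat(path: str) -> float:
    """Return the epoch time stored in `path`."""
    with open(path) as f:
        return float(f.read().strip())
```

The monitored program would call `write_heartbeat("programname_watchdog")` once per iteration of its main loop, so a hang anywhere in the loop stops the stamp from advancing.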

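Your watchdog (in a completely separate process) then checks that file. If the recorded date is sufficiently out of date, the other program is killed and restarted, since it is considered to have failed hard (hung or crashed). Note that your watchdog has only some simple logic in it, so it is much less likely to fail.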
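A minimal sketch of such a watchdog in Python, assuming the monitored program stores a plain epoch timestamp in the file. The names `heartbeat_age`, `is_stale`, `restart`, and `watch`, the 30-second threshold, and the 5-second polling interval are all hypothetical choices, not something the answer specifies.

```python
import subprocess
import time


def heartbeat_age(path, now=None):
    """Seconds since the stamp in `path`; infinity if missing or unreadable."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            return now - float(f.read().strip())
    except (OSError, ValueError):
        return float("inf")


def is_stale(path, max_age, now=None):
    """True when the heartbeat is older than `max_age` seconds."""
    return heartbeat_age(path, now) > max_age


def restart(proc, command):
    """Kill `proc` if it is still running, then start a fresh copy."""
    if proc is not None and proc.poll() is None:
        proc.kill()
        proc.wait()
    return subprocess.Popen(command)


def watch(command, path, max_age=30.0, poll_interval=5.0):
    """Run `command` forever, restarting it when it crashes or hangs."""
    proc = restart(None, command)
    started = time.time()
    while True:
        time.sleep(poll_interval)
        crashed = proc.poll() is not None
        # Grace period: a freshly started process has not written yet.
        hung = time.time() - started > max_age and is_stale(path, max_age)
        if crashed or hung:
            proc = restart(proc, command)
            started = time.time()
```

The loop is deliberately small, in line with the point that the watchdog should contain as little logic as possible. It could be started with something like `watch(["python", "main_program.py"], "programname_watchdog")`, where the script name is a placeholder.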

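I am sure there are other ways to achieve this. This is just one way.

Edit: You have to consider the stack on which you are building your system. The more external dependencies, the greater the risk of failure. If you are looking for flawless operation, you will also have to consider formal proofs of program correctness.

The question really becomes what you expect of your system: which failures are unacceptable, and which failures can be anticipated so that you can compensate for them.

This question quickly turns into a problem of provable hardware-software co-design (which is also expensive). I am curious what you are doing and what your solution will be.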


苹果你个爱泡泡 2024-09-24 23:21:27

As Paul Nathan said, use a watchdog.

There are a few things you can do to make things more robust, though, for example:

int lastTick;

int RemoteProcessState()
{
    int tick = GetRemoteTick();

    if (tick == -1)
    {
        // Process recoverable error state.
        return -1;
    }

    if (tick == -2)
    {
        // Process unrecoverable error state.
        return -1;
    }

    if (tick < 0)
    {
        // Detect if the watchdog is overflowed.
        return -1;
    }

    if (abs(abs(tick) - abs(lastTick)) > ALLOWED_PROCESS_LAG)
    {
        // Resynchronize process
    }
    else
    {
        // Process running normally.
    }

    return 0;
}

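That is a pseudocode sample from real code used in an embedded RTU for process control.

It's primitive, but it works. Not only does this ensure that the remote process is alive, but if the remote process has drifted in calculation speed (scan rates are affected by program size and complexity), it will make sure that the two processes are still synchronized.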
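For a Python program, the same check might be sketched as follows. The sentinel values -1 and -2 and the lag comparison come from the pseudocode above; the `State` enum, the `TickMonitor` class, the threshold of 5 ticks, and the choice to remember the last tick after every successful read are assumptions added to make the example runnable.

```python
from enum import Enum

ALLOWED_PROCESS_LAG = 5  # hypothetical threshold, measured in ticks


class State(Enum):
    OK = "ok"
    RESYNC = "resync"
    RECOVERABLE = "recoverable"
    UNRECOVERABLE = "unrecoverable"
    OVERFLOW = "overflow"


class TickMonitor:
    """Classify each tick value reported by the remote process."""

    def __init__(self, allowed_lag=ALLOWED_PROCESS_LAG):
        self.allowed_lag = allowed_lag
        self.last_tick = 0

    def check(self, tick):
        if tick == -1:
            return State.RECOVERABLE
        if tick == -2:
            return State.UNRECOVERABLE
        if tick < 0:
            # Any other negative value is treated as a counter overflow.
            return State.OVERFLOW
        lag = abs(tick - self.last_tick)
        # Assumption: remember the tick so the next call measures drift
        # since this reading rather than since start-up.
        self.last_tick = tick
        return State.RESYNC if lag > self.allowed_lag else State.OK
```

Returning distinct states instead of a bare -1 lets the caller decide separately what to do about a recoverable error, an unrecoverable one, and a drift that only needs resynchronization.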

If you want more data, start investigating the return codes used by Modbus, or how the OPC protocol manages its Quality byte.
