How to debug a watchdog timeout

Posted 2024-07-15 11:52:25

I have a watchdog in my microcontroller that resets the processor if it is not kicked. My application runs fine for a while but eventually resets because the watchdog did not get kicked. If I step through the program, it works fine.

What are some ways to debug this?

EDIT:
Conclusion:
The way I found my bug was the watchdog breadcrumbs.

I am using a PIC that has a high and a low ISR vector. The high vector was supposed to handle the LED matrix and the low vector the timer tick, but I put both ISR handlers in the high vector. So when I disabled the LED matrix interrupt and the timer tick interrupt needed service, the processor jumped to the low ISR vector to handle the timer tick. The timer tick handler was not there, so the processor got stuck.

The breadcrumbs narrowed my search down to the function that handled the LED matrix, and specifically to the code that disabled the LED matrix interrupt.

Comments (9)

站稳脚跟 2024-07-22 11:52:25

Add an uninitialized global variable that is set to different values throughout the code. Specifically, set it before and after major function calls.

Put a breakpoint at the beginning of main.

When the processor resets, the global variable will still hold the last value it was set to. Keep adding these "breadcrumbs" to narrow things down to the problem function.
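
Below is a minimal C sketch of the breadcrumb approach. The answer glosses over one caveat. On most toolchains an ordinary uninitialized global lands in .bss, and the C startup code zeroes .bss, so the variable must be placed somewhere startup does not touch. The sketch assumes a GCC-style .noinit section, which avr-libc's default linker script provides and many ARM linker scripts do too (check yours). On XC8 for PIC, the __persistent qualifier serves the same purpose. g_breadcrumb, BREADCRUMB and the checkpoint values are hypothetical names.

```c
#include <stdint.h>

/* Assumption: the linker script provides a .noinit section that the startup
 * code does not zero. On toolchains without one (e.g. a host build), NOINIT
 * expands to nothing and the variable is an ordinary, zero-initialized global. */
#if defined(__arm__) || defined(__AVR__)
#define NOINIT __attribute__((section(".noinit")))
#else
#define NOINIT
#endif

/* Hypothetical breadcrumb: the last checkpoint the firmware passed. */
NOINIT volatile uint16_t g_breadcrumb;

#define BREADCRUMB(id) (g_breadcrumb = (uint16_t)(id))

/* Hypothetical checkpoint IDs: one pair per major call. */
enum {
    BC_BEFORE_LED_UPDATE = 0x0101,
    BC_AFTER_LED_UPDATE  = 0x0102,
    BC_BEFORE_UART_POLL  = 0x0201,
    BC_AFTER_UART_POLL   = 0x0202,
};

static void led_update(void) { /* ... real work ... */ }
static void uart_poll(void)  { /* ... real work ... */ }

void main_loop_iteration(void)
{
    BREADCRUMB(BC_BEFORE_LED_UPDATE);
    led_update();
    BREADCRUMB(BC_AFTER_LED_UPDATE);

    BREADCRUMB(BC_BEFORE_UART_POLL);
    uart_poll();
    BREADCRUMB(BC_AFTER_UART_POLL);
}
```

After a watchdog reset, break at the top of main and read g_breadcrumb. A "before" value with no matching "after" points at the call that never returned.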

清君侧 2024-07-22 11:52:25

Many software watchdogs are automatically disabled when you attach a debugger (to prevent it from restarting while the debugger has the application halted).

That said, here are some basics:

Is this a multithreaded application? Are you using a real-time (RT) scheduler? If so, is your watchdog task being starved?

Make sure your watchdog task can't get stuck on anything (a pending semaphore, waiting for a message, etc.). Sometimes functions can block in ways you might not expect; for example, on a Linux platform I'm working on right now, I can get printf to block quite easily.
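
For the multithreaded case, a common pattern (not one this answer spells out) lets a missing bit name the starved or blocked task. The watchdog owner kicks the hardware only after every monitored task has checked in during the current period. The sketch below is single-file and hypothetical. hw_watchdog_kick() stands in for your MCU's refresh, and on a real RTOS the check-in update and the read-then-clear in watchdog_service() would need to be atomic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task IDs; one bit per monitored task. */
enum { TASK_SENSOR, TASK_COMMS, TASK_CONTROL, TASK_COUNT };
#define ALL_TASKS_MASK ((uint32_t)((1u << TASK_COUNT) - 1u))

static volatile uint32_t g_checkins;     /* bits set by tasks this period */
static volatile uint32_t g_last_missing; /* tasks that missed the last check */
static unsigned g_kicks;                 /* counts hardware kicks (demo only) */

/* Hypothetical stand-in for the MCU-specific watchdog refresh. */
static void hw_watchdog_kick(void) { g_kicks++; }

/* Each task calls this once per loop iteration. On an RTOS, wrap this in a
 * critical section or use an atomic OR, since several tasks write g_checkins. */
void task_checkin(unsigned task_id)
{
    g_checkins |= (1u << task_id);
}

/* Called periodically by the watchdog owner, more often than the hardware
 * timeout. Kicks only if every task checked in; otherwise it records which
 * tasks were missing and lets the hardware watchdog expire. */
bool watchdog_service(void)
{
    uint32_t seen = g_checkins;
    if ((seen & ALL_TASKS_MASK) == ALL_TASKS_MASK) {
        g_checkins = 0;
        hw_watchdog_kick();
        return true;
    }
    g_last_missing = ALL_TASKS_MASK & ~seen;
    return false;
}
```

To read g_last_missing after the reset, place it in no-init RAM as with a breadcrumb, or set a breakpoint on the failure branch.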

If it's single threaded, a profiler may help you identify timing issues.

If this is a new system, make sure the watchdog itself works correctly. Test simple code that kicks the WD once and then sits in an infinite loop; the processor should reset after the watchdog timeout.

醉城メ夜风 2024-07-22 11:52:25

I use state-based programming, and a trick I've always wanted to employ is to reserve one output port for the current state, in binary. Then hook up a logic analyzer and look at the timing of the state changes. You could do something similar here. Do what Robert said and create a global variable whose value you change at key points, preferably with a function that immediately writes the current state to the port (i.e. changeState(nextState);). Change the state when you enter the function that kicks the dog, then change it back to the previous state before you leave that function. You should then be able to see from which functions the dog DOESN'T get kicked, and focus on those.
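
Below is a sketch of the changeState idea from this answer. It assumes an 8-bit output port wired to a logic analyzer. STATE_PORT is a hypothetical stand-in for that register (LATB on a PIC18, for example) and is a plain variable here so the sketch compiles anywhere. The state codes and kick_dog() are also hypothetical.

```c
#include <stdint.h>

/* Assumption: on target this is an 8-bit GPIO output latch; here it is a
 * plain variable so the sketch builds on a host. */
volatile uint8_t STATE_PORT;

/* Hypothetical state codes; STATE_KICKING is reserved for the kick routine. */
enum {
    STATE_IDLE     = 0x01,
    STATE_LED_SCAN = 0x02,
    STATE_COMMS    = 0x03,
    STATE_KICKING  = 0xFF,
};

static volatile uint8_t g_state;

/* Record the new state and mirror it on the port immediately. */
void changeState(uint8_t nextState)
{
    g_state = nextState;
    STATE_PORT = nextState;
}

/* Hypothetical kick routine: the port shows STATE_KICKING while it runs,
 * then reverts to whatever state the caller was in. */
void kick_dog(void)
{
    uint8_t previous = g_state;
    changeState(STATE_KICKING);
    /* ... MCU-specific watchdog refresh goes here (e.g. CLRWDT on a PIC) ... */
    changeState(previous);
}
```

On the analyzer, each 0xFF pulse is a kick. The state value showing when the pulses stop identifies the code path that never reached kick_dog().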

Good luck, it sounds like a timing problem and those are tough to solve.

鸢与 2024-07-22 11:52:25

Usually the watchdog task/thread runs at a low priority. So if the watchdog isn't getting kicked, it is likely because the processor is busy doing something else, probably something it shouldn't be doing.

It would be really useful to dump out the execution context (local stack, scheduling state etc.) for each task/thread just before the processor resets. With a bit of luck and work, you'll be able to determine what is preventing the watchdog task from kicking the timer.

梦断已成空 2024-07-22 11:52:25

I'd use an extra output pin, set high and then low at appropriate points in the code, to narrow down where I'm looking. Then I'd trace it on a digital scope or logic analyzer. This is equivalent to the breadcrumbs method mentioned by another poster, but it lets you correlate the pin activity in time with the reset pulse much more precisely.

闻呓 2024-07-22 11:52:25

You can insert a while loop in your code and toggle an LED inside it. This is an effective way to check whether the board is resetting.

旧话新听 2024-07-22 11:52:25

Question every assumption you make, twice:

  • Make sure the watchdog is kicked (I don't know the logging facilities on the processor).
  • Make sure the watchdog, when kicked, doesn't reset the processor.

And consider what differs between stepping through the code and letting it run on its own; timing constraints will surely matter.
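
The second point matters because some watchdogs are windowed (the STM32 WWDG is one example). Refreshing them too early resets the part, just as refreshing too late does. The host-side model below only illustrates that rule with made-up tick counts. It is not any vendor's register interface, and wwdg_model and its functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a windowed watchdog, counted in abstract ticks. A kick is
 * legal only once at least window_open ticks have elapsed since the last
 * kick; reaching timeout ticks without a kick is also a reset. */
typedef struct {
    uint32_t window_open; /* earliest legal kick, in ticks since last kick */
    uint32_t timeout;     /* ticks until reset if not kicked */
    uint32_t elapsed;     /* ticks since last kick */
    bool     reset;       /* latched once a reset condition occurs */
} wwdg_model;

void wwdg_init(wwdg_model *w, uint32_t window_open, uint32_t timeout)
{
    w->window_open = window_open;
    w->timeout = timeout;
    w->elapsed = 0;
    w->reset = false;
}

void wwdg_tick(wwdg_model *w)
{
    if (!w->reset && ++w->elapsed >= w->timeout)
        w->reset = true;          /* kicked too late (or never) */
}

void wwdg_kick(wwdg_model *w)
{
    if (w->reset)
        return;
    if (w->elapsed < w->window_open)
        w->reset = true;          /* kicked too early: this kick resets */
    else
        w->elapsed = 0;           /* legal kick: restart the count */
}
```

A stray extra kick, such as one added to a fast ISR "to be safe", is exactly the kind of early refresh that trips a windowed watchdog.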

空名 2024-07-22 11:52:25

You could attach strace (option -p) to your running process and watch for the point where it stops writing to the file descriptor it opened for /dev/watchdog.
You can filter strace output using option -e.
See the manual page for details.
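
For reference, the traced process typically does something like the sketch below. The Linux watchdog API treats any write to /dev/watchdog as a keepalive, and the kernel documentation uses write(fd, "\0", 1). Each kick therefore shows up as one write line in strace -tt -p <pid> -e trace=write. Some daemons use the WDIOC_KEEPALIVE ioctl instead, so add ioctl to the filter if no writes appear. watchdog_open and watchdog_kick are hypothetical helper names.

```c
#include <fcntl.h>
#include <unistd.h>

/* Opens the watchdog device. With most Linux drivers, opening /dev/watchdog
 * arms the watchdog, so only do this in the process that will keep kicking. */
int watchdog_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* One keepalive: any write counts; a single NUL byte is the conventional one.
 * Under strace this appears roughly as write(3, "\0", 1) = 1 (fd varies).
 * Returns 0 on success, -1 if the write failed. */
int watchdog_kick(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}
```

With -tt timestamps on each line, the last write before the gap marks when the process stopped kicking.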

情未る 2024-07-22 11:52:25

Expanding on the excellent accepted answer (https://stackoverflow.com/a/661900/371793), for my application I took the approach of examining RAM after a watchdog reset a step further. As this is far too long for a comment, I'm adding it as a separate answer:

Our application is preceded by a custom bootloader that provides a few functions such as OTA firmware update. We set the bootloader's initial stack pointer to 1 KB below the end of RAM. That way, if the main firmware is reset by the watchdog, enough of its stack remains intact in RAM to reconstruct a backtrace. The bootloader then needs to identify the watchdog as the reset cause and copy the last KB of RAM to a designated flash area, from which it can be recovered.
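
The bootloader step might look roughly like the sketch below. It assumes that reset_flags is the raw reset-cause register (RCC->CSR on STM32 parts, whose watchdog flags include IWDGRSTF and WWDGRSTF) and that the caller passes the mask for those bits. save_crash_region, CRASH_REGION_SIZE and the idea of staging the copy in a RAM buffer before programming flash are illustrative choices, not the answer's actual code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Size of the region above the bootloader's stack that still holds the
 * application's stack after a reset (1 KB, as in the answer). */
#define CRASH_REGION_SIZE 1024u

/* If the last reset was caused by the watchdog, copy the top
 * CRASH_REGION_SIZE bytes of RAM (ending at ram_end) into out. In a real
 * bootloader, out would then be programmed into a reserved flash area, and
 * the reset flags cleared afterwards (RMVF on STM32). Returns true if saved. */
bool save_crash_region(uint32_t reset_flags, uint32_t wdg_reset_mask,
                       const uint8_t *ram_end, uint8_t *out)
{
    if ((reset_flags & wdg_reset_mask) == 0)
        return false;  /* not a watchdog reset: nothing to preserve */
    memcpy(out, ram_end - CRASH_REGION_SIZE, CRASH_REGION_SIZE);
    return true;
}
```

Passing the flags and the RAM bounds in as parameters keeps the logic testable off-target. On hardware, the caller would read the register and use the linker-provided end-of-RAM symbol.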

Reconstructing the backtrace is a bit cumbersome, as there is no PC to start from, so in practice the stack should be (manually) unwound from the bottom to the top, and some educated guessing may be needed to locate the point at which the application stalled. (This is not necessarily the point at which stale data is seen and unwinding fails!)

This approach helped me to systematically pinpoint an issue that occurred only very sporadically. To log additional 'breadcrumbs' on the stack, simply insert things like:

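__attribute__((unused)) volatile uint32_t _state[4]; // local array, so it lives on the stack
_state[0] = 0x57a11ed; // magic value to aid manual unwinding
_state[1] = RCC->CSR;  // reset-cause flags (STM32 device header)
_state[2] = count++;   // count: a static counter; maybe we're in a runaway loop?
// etc.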
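
On the offline side, a small host tool can locate these breadcrumb frames in the saved RAM image before you start unwinding by hand. The sketch assumes the dump has been loaded as an array of 32-bit words already in host byte order. It looks for the 0x57a11ed marker from the fragment above. Because _state[] is an array, its later elements sit at higher addresses, so the words after each hit are that frame's _state[1], _state[2], and so on. find_breadcrumbs is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define BREADCRUMB_MAGIC 0x057a11edu  /* same marker as _state[0] */

/* Scan a RAM image for the magic marker. Stores the word index of up to
 * max_hits matches in hits[] and returns the total number of matches,
 * which may exceed max_hits. */
size_t find_breadcrumbs(const uint32_t *ram, size_t nwords,
                        size_t *hits, size_t max_hits)
{
    size_t found = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (ram[i] == BREADCRUMB_MAGIC) {
            if (found < max_hits)
                hits[found] = i;
            found++;
        }
    }
    return found;
}
```

Stale frames from earlier calls can also contain the marker. Expect to judge which hit belongs to the stalled call from the surrounding words.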

In applications without a separate bootloader, the initial stack pointer could be set to 1 KB below the end of RAM and moved to the end of RAM only after a regular boot (which of course is non-trivial!). Then, after a watchdog reset, the application can simply store or transmit the last KB of RAM for offline analysis.
