How to debug unexpected process termination when no core dump or error is logged

Posted on 2025-01-15 11:41:55

I have a set of multithreaded executables written in C++ that primarily fetch and process data from websocket connections. They are run from the crontab on an Ubuntu machine. Each executable is run in a while loop, so that if it terminates, it is run again immediately.

Whenever I run these executables, they tend to run fine for several hours, but then they all terminate unexpectedly at the same time. From that point on, the while loop causes each of them to start, run for a few seconds, and terminate again, repeating this cycle ad infinitum.

There are no core files generated (even though I have set "ulimit -c unlimited" and built the executables with "-g -ggdb", so they do generate a core file upon segfault). Also, "dmesg" does not show anything indicating this repeated termination/restarting of the executables, and in fact none of the logs in /var/log seem to show anything of note, so I assume they were not killed due to OOM as per my initial guess. There is also plenty of disk space.

How can I debug an issue like this? Is there anywhere else I can look for error messages?

I forgot to mention that there is nothing of note being printed to stdout/stderr either. Also, another weird thing is that if I kill the script containing the while loop corresponding to one of the executables (without touching any of the other while loops) and then run that script manually on the terminal, the corresponding executable seems to run fine without termination, even as the other executables are still continually restarting and terminating instantly.

I believe I've narrowed it down to something related to stdout. When I log the websocket output to stdout, the continual restarting and termination happens. When I remove that logging, the executable doesn't crash anymore.

Oh, so when the executable prints the websocket output to stdout, that output is piped to "taskset -c 0 gzip -c", and apparently those gzip processes had terminated for some reason without my noticing. Any ideas why that might happen, or how to debug it?

Comments (2)

Saygoodbye 2025-01-22 11:41:55

Maybe you can try to capture the stderr output of the main while loop, to see anything that is printed directly to the console but not logged.

If it is a shell script, append >output.log 2>&1 to the end of the Linux command.

If not, you can tail /proc/<pid>/fd/1, where <pid> is the Linux process ID.

别理我 2025-01-22 11:41:55

Could bad websocket input be crashing your server?

One possible approach is to use an IDS such as Bro (now called Zeek, https://zeek.org/) to capture all traffic from your server. You will need to mirror the traffic on the switch and record accurate timestamps of your crashes.

Once you have recorded the traffic around a crash, consider replaying it by exporting the packets and crafting a client that sends them (you could copy the hex, attach it to a client socket, and just compile). If it crashes again, you have a repro to debug.
