How can I check what is causing a write() syscall to hang? Python subprocess stops working after running for a while
Here is what the strace output looks like when the Python script is working properly. Both the main process (412) and its subprocesses behave as intended. However, after a while, maybe 40-60 minutes later, the subprocesses start to fail because their write calls no longer complete.
strace: Process 1693 attached
[pid 1693] close(4) = 0 <0.000028>
[pid 1693] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 4 <0.000077>
[pid 1693] write(1, "options ['Forex']\n", 18) = 18 <0.000048>
[pid 1693] openat(AT_FDCWD, "csv/forex_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000064>
[pid 1693] read(6, "{\"feature\": \"Stocks\", \"speed\": \""..., 348) = 347 <0.000034>
[pid 1693] read(6, "", 1) = 0 <0.000025>
[pid 1693] close(6) = 0 <0.000039>
[pid 1693] openat(AT_FDCWD, "feature_titles/forex.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000062>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0`\0\0\0 \4\3\0\0\0\37\356`"..., 4096) = 388 <0.000035>
[pid 412] close(5) = 0 <0.000064>
[pid 412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000143>
[pid 412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000060>
[pid 412] read(5, "", 1) = 0 <0.000040>
[pid 412] close(5) = 0 <0.000097>
[pid 412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000125>
[pid 1693] close(6) = 0 <0.000027>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000052>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/stocks/down-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000154>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000073>
[pid 412] read(5, <unfinished ...>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 199 <0.000035>
[pid 412] <... read resumed> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000554>
[pid 412] read(5, "", 8192) = 0 <0.000047>
[pid 412] close(5) = 0 <0.000095>
[pid 1693] close(6) = 0 <0.000044>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/EUR.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000074>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 968 <0.000040>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000146>
[pid 1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000038>
[pid 1693] close(7) = 0 <0.000056>
[pid 1693] close(6) = 0 <0.000042>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/stocks/up-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000068>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 192 <0.000039>
[pid 1693] close(6) = 0 <0.000045>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000112>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000039>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/JPY.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000151>
[pid 1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 769 <0.000047>
[pid 1693] close(7) = 0 <0.000060>
[pid 1693] close(6) = 0 <0.000043>
[pid 1693] openat(AT_FDCWD, "./display_images/Forex.ppm", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE|O_CLOEXEC, 0666) = 6 <0.000359>
[pid 1693] write(6, "P6\n507 32\n255\n", 14) = 14 <0.000104>
[pid 1693] write(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48672) = 48672 <0.000269>
[pid 1693] close(6) = 0 <0.000400>
[pid 1693] +++ exited with 0 +++
[pid 412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1693, si_uid=1, si_status=0, si_utime=5, si_stime=9} ---
[pid 412] close(4) = 0 <0.000064>
After the Python script has been running for about 40-60 minutes, the strace output looks like this. PID 2220 seems to be stuck in the write(1, "options ['Forex']\n", 18 <unfinished ...> syscall. How can I check what is causing it to hang?
strace: Process 2220 attached
[pid 412] close(5) = 0 <0.000048>
[pid 412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000163>
[pid 412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000071>
[pid 2220] close(4 <unfinished ...>
[pid 412] read(5, <unfinished ...>
[pid 2220] <... close resumed> ) = 0 <0.000083>
[pid 412] <... read resumed> "", 1) = 0 <0.000105>
[pid 2220] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC <unfinished ...>
[pid 412] close(5 <unfinished ...>
[pid 2220] <... openat resumed> ) = 4 <0.000137>
[pid 412] <... close resumed> ) = 0 <0.000136>
[pid 412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000099>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000046>
[pid 2220] write(1, "options ['Forex']\n", 18 <unfinished ...>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000079>
[pid 412] read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000322>
[pid 412] read(5, "", 8192) = 0 <0.000067>
[pid 412] close(5) = 0 <0.000048>
[pid 2220] <... write resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <9.479316>
[pid 2220] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=412, si_uid=1} ---
[pid 2220] +++ killed by SIGTERM +++
[pid 412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=2220, si_uid=1, si_status=SIGTERM, si_utime=0, si_stime=1} ---
[pid 412] close(4) = 0 <0.000218>
Is there a way for me to check what is causing the write call to hang? Thanks.
Comments (1)
Judging from the strace output, the write() targets the standard output of the process (file descriptor number 1). That file descriptor is associated with a pipe. Effectively, a pipe is a fixed-size buffer that can be written to and read from. Whenever the buffer is full or the write() request is larger than the remaining buffer size, the system call blocks until the other side reads enough data to make room for the write request.
As long as the buffer is not full, write() does not block. Even if the other end does not read any data, the application continues to work until the remaining buffer size is too small for the write() request.
The actual buffer size of a pipe varies across operating systems. On Linux, the default is 16 pages, which translates to 65536 bytes on most platforms. The buffer size can be changed by means of fcntl().
Usually, the buffer size associated with a pipe is not the issue; what matters is making sure that the data is read, so that the other end does not block.
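To see the buffer limit for yourself, here is a minimal sketch, assuming Linux and Python 3.5 or later. It creates a pipe that nobody reads from, queries the size the kernel reports, and writes until the kernel refuses more data. The function name fill_pipe and the 4096-byte chunk size are my own choices for illustration, not anything from your script. The write end is switched to non-blocking mode only so that the demo raises an error at the point where a normal, blocking write() would hang.

```python
import fcntl
import os

# F_GETPIPE_SZ is Linux-specific; the fcntl module exposes it from Python 3.10.
# 1032 is the value from the Linux headers, used here as a fallback (assumption:
# this code runs on Linux).
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def fill_pipe(chunk_size=4096):
    """Write to a pipe that is never read until the kernel refuses more data.

    Returns a tuple (bytes_accepted, reported_capacity).
    """
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode turns "write() would hang" into BlockingIOError,
        # so the demo terminates instead of stalling the way PID 2220 did.
        os.set_blocking(write_fd, False)
        reported = fcntl.fcntl(write_fd, F_GETPIPE_SZ)
        chunk = b"x" * chunk_size
        accepted = 0
        while True:
            try:
                accepted += os.write(write_fd, chunk)
            except BlockingIOError:
                # The buffer is full: a blocking write() would sleep here
                # until the reader drains some data.
                break
        return accepted, reported
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    accepted, reported = fill_pipe()
    print(f"kernel reports {reported} bytes, accepted {accepted} bytes before the pipe was full")
```

On a typical Linux system with 4 KiB pages I would expect both numbers to come out at 65536, but that depends on the kernel configuration. If the parent process starts its children with a pipe for stdout and never reads from it, the output of successive children would accumulate in the same way, which could explain why the hang only shows up after a long run.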