System call time out?
I'm using Unix system() calls to gunzip and gzip files. With very large files, they sometimes get aborted (on the cluster compute nodes), while at other times (on the login nodes) they go through. Is there some soft limit on the time a system call may take? What else could it be?
Comments (5)
The calling thread should block indefinitely until the task you started with system() completes. If the call returns and the file operation has not completed, the spawned command failed for some reason.
What does the return value indicate?
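To make the return value easier to interpret, here is a minimal sketch in C that decodes it with the standard <sys/wait.h> macros. The function name run_and_report is made up for illustration, and returning 128 + signal number is just the usual shell convention, chosen here so that a signal looks the same whether the shell itself or the command it ran was killed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: runs `cmd` through system() and reports how it ended.
 * Returns the exit code (0-255), 128 + signal number if the shell was
 * killed by a signal, or -1 if system() itself failed. */
int run_and_report(const char *cmd)
{
    int status = system(cmd);

    if (status == -1) {
        /* The child could not be created or its status could not be read. */
        perror("system");
        return -1;
    }
    if (WIFEXITED(status)) {
        /* Normal exit. 127 usually means the command was not found;
         * values above 128 usually mean the command died from a signal
         * and the shell reported it as an exit code. */
        int code = WEXITSTATUS(status);
        if (code != 0)
            fprintf(stderr, "'%s' exited with code %d\n", cmd, code);
        return code;
    }
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        fprintf(stderr, "'%s' was killed by signal %d\n", cmd, sig);
        return 128 + sig;
    }
    return -1;
}
```

For example, run_and_report("gzip bigfile.txt") returning 137 would mean the process received SIGKILL (signal 9), which points at something outside gzip ending it rather than gzip failing by itself. The file name here is only a placeholder.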
This is almost certainly not a problem with system() itself, but with the operation you're performing. Always check the return value, but even more importantly, look at the output of the command you're calling. For non-interactive use, it's often best to write stdout and stderr to log files. One way to do this is a wrapper script that checks that the underlying command exists, logs the command line, redirects stdout and stderr (and closes stdin if you want to be careful), then execs the command line. Run that wrapper via system() rather than the OS command directly.
My bet is that the failing machines have limited disk space, or are missing either the target file or the actual gzip/gunzip commands.
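The logging idea can also be approximated without a separate script. The sketch below is not the wrapper script described above but a small C helper that does the same redirection through the shell that system() starts. The name run_logged is hypothetical, and it assumes the log path contains no single quotes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: records the command line in `log_path`, then runs
 * `cmd` with stdout and stderr appended to the same file and stdin
 * detached. Returns system()'s raw status, or -1 on a local error.
 * Assumption: log_path contains no single quotes. */
int run_logged(const char *cmd, const char *log_path)
{
    char line[4096];

    /* Log the command line first so a failure can be matched to it. */
    FILE *log = fopen(log_path, "a");
    if (log == NULL) {
        perror(log_path);
        return -1;
    }
    fprintf(log, "+ %s\n", cmd);
    fclose(log);

    /* Let the shell do the redirection: stdout and stderr go to the log,
     * stdin comes from /dev/null. */
    int n = snprintf(line, sizeof line,
                     "( %s ) >> '%s' 2>&1 < /dev/null", cmd, log_path);
    if (n < 0 || (size_t)n >= sizeof line) {
        fprintf(stderr, "command line too long\n");
        return -1;
    }
    return system(line);
}
```

After a failed run, the log should then contain whatever gzip printed, such as a "No space left on device" or "command not found" message, which would confirm or rule out the guesses above.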
Probably a silly question: why not use zlib directly from your application?
Also, system() isn't a system call. It is a wrapper around fork()/exec()/wait(). Check the system() man page. If it doesn't unblock, your application may be interfering with wait() somehow. For example, do you have a SIGCHLD handler?
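To show roughly what system() does underneath, here is a simplified sketch of the fork()/exec()/wait() sequence. It is not the real library implementation: it skips the shell and the signal handling that system() performs, and the name spawn_and_wait is invented for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv[0] directly (no shell) and waits for it.
 * Returns the raw wait status, or -1 if fork() or waitpid() failed. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the command. */
        execvp(argv[0], argv);
        perror(argv[0]);   /* only reached if execvp() failed */
        _exit(127);
    }

    /* Parent: block until this particular child ends. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            /* ECHILD here means the child was already reaped elsewhere,
             * for example by a SIGCHLD handler that calls wait(). */
            perror("waitpid");
            return -1;
        }
    }
    return status;
}
```

Called with an argument vector such as {"gzip", "bigfile.txt", NULL} (a placeholder file name), the returned status can be inspected with WIFEXITED and WIFSIGNALED just as with system().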
If it's a Linux system, I would recommend using strace to see what's going on and which syscall blocks.
You can even attach strace to an already running process:
# strace -p $PID
It sounds like I'm running into the same intermittent issue, which points to a timeout of some kind. My script runs every day. I'm starting to believe gzip has a timeout.
filename.txt.gz: 95.7% -- replaced with filename.txt
Details:
I'll simply work around it with retry logic and general scripting improvements, but I want the next Googler to know they're not crazy. This is happening to other people!
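The retry workaround could look something like the following in C. This is only a sketch of the general idea, not the poster's actual script. The name run_with_retry and its parameters are assumptions made for illustration.

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `cmd` via system() until it exits with code 0
 * or `max_attempts` is used up, sleeping `delay_seconds` between tries.
 * Returns the number of the attempt that succeeded, or 0 if none did. */
int run_with_retry(const char *cmd, int max_attempts, unsigned delay_seconds)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int status = system(cmd);

        /* Success means: system() worked, the shell exited normally,
         * and the exit code was 0. */
        if (status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;

        if (attempt < max_attempts)
            sleep(delay_seconds);
    }
    return 0;
}
```

Depending on how an earlier attempt died, a partial output file may be left behind, so a real script may need to clean up before retrying. Retrying also hides the cause, so it is worth logging each failed status as well.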