What can cause exec to fail, and what happens next?
What are the reasons that an exec (execl, execlp, etc.) can fail? If you make a call to exec and it returns, are there any best practices other than just panicking and calling exit?
The problem with handling exec failure is that usually exec is performed in a child process, and you want to do the error handling in the parent process. But you can't just exit(errno) because (1) you don't know if error codes fit in an exit code, and (2) you can't distinguish between failure to exec and failure exit codes from the new program you exec.
The best solution I know is using pipes to communicate the success or failure of exec: the child sets close-on-exec on the writing end of a pipe before calling exec and, if exec fails, writes the error code to it. The parent reads end-of-file if the child successfully performed exec, since close-on-exec made successful exec close the writing end of the pipe. Or, if exec failed, the parent reads the error code and can proceed accordingly. Either way, the parent blocks until the child calls exec.
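The pipe technique described above can be sketched roughly as follows. This is a minimal illustration, not the answer author's code: the helper name spawn_checked, the use of execvp, and the child's exit status of 127 are all assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (name and interface are assumptions for this sketch).
 * Forks and execs argv[0]; the child reports an exec failure to the parent
 * through a pipe whose write end is marked close-on-exec.
 * Returns the child's pid if exec succeeded. Returns -1 with errno set if
 * pipe(), fork(), or the child's exec failed. */
pid_t spawn_checked(char *const argv[])
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        int saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }

    if (pid == 0) {
        /* Child: keep only the write end and mark it close-on-exec. */
        close(fds[0]);
        fcntl(fds[1], F_SETFD, FD_CLOEXEC);
        execvp(argv[0], argv);
        /* Only reached if exec failed: send errno to the parent. */
        int err = errno;
        if (write(fds[1], &err, sizeof err) == -1) {
            /* Nothing more can be reported; fall through to _exit. */
        }
        _exit(127);
    }

    /* Parent: close the write end, then block until EOF or an error code. */
    close(fds[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    if (n == (ssize_t)sizeof err) {
        /* The child wrote its errno, so exec failed: reap the child. */
        waitpid(pid, NULL, 0);
        errno = err;
        return -1;
    }
    /* EOF means a successful exec closed the write end. This sketch also
     * treats a failed read as success rather than guessing at the cause. */
    return pid;
}
```

On success the caller still owns the child and should eventually call waitpid on the returned pid. On failure the child has already been reaped and errno holds the reason exec gave.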
The exec(3) man page says that the exec functions can fail with any of the errors specified for execve(2) (and, on some systems, for malloc(3)). The execve(2) man page then lists those errors in its ERRORS section. malloc() is a lot less complicated, and uses only ENOMEM, as the malloc(3) man page documents.
What you do after the exec() call returns depends on the context: what the program is supposed to do, what the error is, and what you might be able to do to work around the problem.
One source of trouble could be that you specified a simple program name instead of a pathname; maybe you could retry with execvp(), or convert the command into an invocation of sh -c 'what you originally specified'. Whether any of these is reasonable depends on the application. If there are major security issues involved, you probably don't try again.
If you specified a pathname and there is a problem with that (ENOTDIR, ENOENT, EPERM), then you may not have any sensible fallback, but you can report the error meaningfully.
In the old days (10+ years ago), some systems did not support the '#!' shebang notation, and if you were not sure whether you were executing an executable or a shell script, you tried it as an executable and then retried it as a shell script. That might or might not work if you were running a Perl script, but in those days, you wrote your Perl scripts to detect that they were being run by a shell and to re-exec themselves with Perl. Fortunately, those days are mostly over.
To the extent possible, it is important to ensure that the process reports the problem so that it can be traced, writing its message to a log file or just to stderr (or maybe even syslog()), so that those who have to work out what went wrong have more information to help them than the hapless end user's report "I tried X and it didn't work". It is crucial that if nothing works, then the exit status is not 0, as that indicates success. Even that might be ignored, but you did what you could.
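As a rough sketch of the retry-then-report idea, assuming the program was given as a plain name that may need a PATH lookup (the helper name exec_with_fallback and its message format are invented for this example):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for this sketch, not a standard call).
 * Tries argv[0] as a pathname first; if that fails with ENOENT and the name
 * contains no slash, retries with a PATH search via execvp().
 * Returns only on failure: prints the reason to stderr and returns -1 with
 * errno set by the last attempt. */
int exec_with_fallback(char *const argv[])
{
    execv(argv[0], argv);
    if (errno == ENOENT && strchr(argv[0], '/') == NULL)
        execvp(argv[0], argv);

    int err = errno;
    fprintf(stderr, "cannot execute %s: %s\n", argv[0], strerror(err));
    errno = err;
    return -1;
}
```

A caller in a child process would typically follow a -1 return with _exit() and a non-zero status, so the failure is never mistaken for success. Whether the PATH retry is acceptable at all depends on the security concerns mentioned above.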
Other than just panicking, you could take a decision based on errno's value.
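One way such a decision might look, as a sketch only: the enum, the function name classify_exec_error, and the particular grouping of error codes are assumptions chosen for illustration, and a real program would pick its own policy.

```c
#include <errno.h>

/* Hypothetical policy values for this sketch. */
enum exec_action {
    EXEC_RETRY_LATER,      /* condition may be transient */
    EXEC_TRY_PATH_LOOKUP,  /* name not found where we looked */
    EXEC_GIVE_UP           /* report the error and exit non-zero */
};

/* Maps the errno left by a failed exec call to a next step.
 * The grouping below is an illustrative assumption, not a standard. */
enum exec_action classify_exec_error(int err)
{
    switch (err) {
    case EAGAIN:   /* resource limit reached */
    case ENOMEM:   /* out of memory */
    case ETXTBSY:  /* executable is open for writing */
        return EXEC_RETRY_LATER;
    case ENOENT:   /* file does not exist */
        return EXEC_TRY_PATH_LOOKUP;
    default:       /* EACCES, ENOEXEC, ENOTDIR, EPERM, ... */
        return EXEC_GIVE_UP;
    }
}
```

The caller would save errno immediately after exec returns and pass that saved value in, since later library calls may overwrite it.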
Exec should always succeed (except for shells, e.g. if the user entered a bogus command).
If exec does fail, it indicates:
For any serious error, the normal approach is to write the error message on stderr, then exit with a failure code. Almost all of the standard tools do this. For exec:
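A minimal sketch of that report-and-exit pattern, wrapped in a fork so the parent can observe the result. The helper name run_program and the exit status 127 (the value shells conventionally use for a command that was not found) are assumptions for this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for this sketch).
 * Runs a program in a child process and returns its exit status, or -1 if
 * the child could not be created or did not exit normally.
 * If exec fails, the child writes the reason to stderr and exits with 127. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        execvp(argv[0], argv);
        /* exec returned, so it failed: report and exit with failure. */
        perror(argv[0]);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the parent here cannot tell an exec failure apart from a program that itself exits with 127, which is the ambiguity the pipe-based approach in the first answer is meant to remove.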
The shell does that, too (more or less).
Normally if a child process fails, the parent has failed too and should exit. It does not matter whether the child failed in exec, or while running the program. If exec failed, it does not matter why exec failed. If the child process failed for any reason, the calling process is in trouble and needs to stop.
Don't waste lots of time trying to anticipate all possible error conditions. Don't write code that tries to handle each error code in the best possible way. You'll just bloat the code, and introduce many new bugs. If your program is broken, or it's being abused, it should simply fail. If you force it to continue, worse trouble will come of that.
For example, if the system is out of memory and thrashing swap, we don't want to cycle over and over trying to run a process; it would just make the situation worse. If we get a filesystem error, we don't want to continue running on that filesystem; it might make the corruption worse. If the program was installed wrongly, or has a bug, or has memory corruption, we want to stop as soon as possible, before that broken program does some real damage (such as sending a corrupted report to a client, trashing a database, ...).
One possible alternative: a failing process might call for help, pause itself (SIGSTOP), then retry the operation if told to continue. This could help when the system is out of memory, or disks are full, or perhaps even if there is a fault in the program. Few operations are so expensive and important that this would be worthwhile.
If you're making an interactive GUI program, try to do it as a thin wrapper over reusable command-line tools (which exit if something goes wrong). Every function in your program should be accessible through the GUI, through the command line, and as a function call. Write your functions. Write a few tools to make command-line and GUI wrappers for any function. Use sub-processes too.
If you are making a truly critical system, such as a controller for a nuclear power station, or a program to predict tsunamis, then what are you doing reading my dumb advice? Critical systems should not depend entirely on computers or software. There needs to be a 'manual override', with someone to drive it. Especially, do not attempt to build a critical system on MS Windows; that is like building sandcastles underwater.