Problem with a Python program using os.pipe and os.fork()
I've recently needed to write a script that performs an os.fork() to split into two processes. The child process becomes a server process and passes data back to the parent process using a pipe created with os.pipe(). The child closes the 'r' end of the pipe and the parent closes the 'w' end, as usual. I convert the return values from pipe() into file objects with os.fdopen.
The problem I'm having is this: the process successfully forks, and the child becomes a server. Everything works great and the child dutifully writes data to the open 'w' end of the pipe. Unfortunately the parent end of the pipe does two strange things:
A) It blocks on the read() operation on the 'r' end of the pipe.
B) It fails to read any data that was put on the pipe unless the 'w' end is entirely closed.
I immediately thought that buffering was the problem and added pipe.flush() calls, but these didn't help.
Can anyone shed some light on why the data doesn't appear until the writing end is fully closed? And is there a strategy to make the read() call non-blocking?
This is my first Python program that forked or used pipes, so forgive me if I've made a simple mistake.
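For reference, a minimal sketch that reproduces the setup described above (the write loop stands in for the real server; the parent's unsized read() is the behaviour being asked about) might look like this:

import os
import time

r_fd, w_fd = os.pipe()
pid = os.fork()

if pid:                          # parent
    os.close(w_fd)               # parent closes the 'w' end, as usual
    r = os.fdopen(r_fd, 'r')
    data = r.read()              # blocks here until the child closes its 'w' end
    print('parent read: %r' % data)
    r.close()
    os.waitpid(pid, 0)
else:                            # child: the "server"
    os.close(r_fd)               # child closes the 'r' end
    w = os.fdopen(w_fd, 'w')
    for i in range(3):
        w.write('message %d\n' % i)
        w.flush()                # flushing alone does not unblock the parent
        time.sleep(1)
    w.close()                    # only now does the parent's read() return
    os._exit(0)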
Comments (4)
Are you using read() without specifying a size, or treating the pipe as an iterator (for line in f)? If so, that's probably the source of your problem: read() is defined to read until the end of the file before returning, rather than just reading what is available. That means it will block until the child calls close().
In the example code linked to, this is OK; the parent is acting in a blocking manner and is just using the child for isolation purposes. If you want to continue, then either use non-blocking I/O as in the code you posted (but be prepared to deal with half-complete data), or read in chunks (e.g. r.read(size) or r.readline()), which will block only until a specific size / line has been read. (You'll still need to call flush on the child.)
It looks like treating the pipe as an iterator uses some further buffering as well, so "for line in r:" may not give you what you want if you need each line to be consumed immediately. It may be possible to disable this, but just specifying 0 for the buffer size in fdopen doesn't seem sufficient. Here's some sample code that should work:
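A minimal sketch of that approach (not the answer's verbatim snippet; the loop bounds and sleep are illustrative): the parent reads with readline, which blocks only until one full line arrives, and iter(r.readline, '') sidesteps the iterator's extra read-ahead buffering mentioned above.

import os
import time

r_fd, w_fd = os.pipe()
pid = os.fork()

if pid:                          # parent
    os.close(w_fd)
    r = os.fdopen(r_fd, 'r')
    # readline blocks only until one full line is available, not until EOF
    for line in iter(r.readline, ''):
        print('parent read: %s' % line.strip())
    r.close()
    os.waitpid(pid, 0)
else:                            # child
    os.close(r_fd)
    w = os.fdopen(w_fd, 'w')
    for i in range(5):
        w.write('line %d\n' % i)
        w.flush()                # still needed: push each line into the pipe
        time.sleep(1)
    w.close()
    os._exit(0)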
Using fcntl.fcntl(readPipe, fcntl.F_SETFL, os.O_NONBLOCK) before invoking read() solved both problems: the read() call is no longer blocking, and the data appears after just a flush() on the writing end.
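A sketch of that fix (readPipe here is the raw read descriptor from os.pipe(); fetching the existing flags with F_GETFL first is a common precaution, and on Python 3 a read from an empty non-blocking pipe raises BlockingIOError rather than returning nothing):

import fcntl
import os

readPipe, writePipe = os.pipe()

# Add O_NONBLOCK to whatever flags the descriptor already has.
flags = fcntl.fcntl(readPipe, fcntl.F_GETFL)
fcntl.fcntl(readPipe, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    data = os.read(readPipe, 4096)   # returns whatever is available right now
except BlockingIOError:              # the pipe is currently empty
    data = b''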
I see you have solved the problem of blocking I/O and buffering.
A note if you decide to try a different approach: subprocess is the equivalent of, and a replacement for, the fork/exec idiom. It seems like that's not what you're doing: you have just a fork (not an exec) and are exchanging data between the two processes. In this case the multiprocessing module (in Python 2.6+) would be a better fit.
The "parent" vs. "child" part of fork in a Python application is silly. It's a legacy from 16-bit unix days. It's an affectation from a day when fork/exec and exec were Important Things to make the most of a tiny little processor.
Break your Python code into two separate parts: parent and child.
The parent part should use subprocess to run the child part.
A fork and exec may happen somewhere in there -- but you don't need to care.
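A sketch of that split (child.py is a hypothetical file name; text=True needs Python 3.7+, on older versions it's spelled universal_newlines=True):

# child.py (hypothetical): the "server" half, writing lines to stdout
import sys
import time

for i in range(5):
    print('line %d' % i)
    sys.stdout.flush()               # push each line out immediately
    time.sleep(1)

# parent.py: run the child with subprocess and read its output line by line
import subprocess
import sys

proc = subprocess.Popen([sys.executable, 'child.py'],
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print('parent read: %s' % line.strip())
proc.wait()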