How do I close a Python 2.5.2 Popen subprocess once I have the data I need?
I am running the following version of Python:

```
$ /usr/bin/env python --version
Python 2.5.2
```
I am running the following Python code to have a child subprocess write data to its standard output, and to read that output into a Python variable called `metadata`:

```python
# Extract metadata (snippet from extractMetadata.py)
import os
import subprocess

inFileAsGzip = "%s.gz" % inFile
if os.path.exists(inFileAsGzip):
    os.remove(inFileAsGzip)
os.symlink(inFile, inFileAsGzip)
extractMetadataCommand = "bgzip -c -d -b 0 -s %s %s" % (metadataRequiredFileSize, inFileAsGzip)
metadataPipes = subprocess.Popen(extractMetadataCommand, stdin=None, stdout=subprocess.PIPE, shell=True, close_fds=True)
metadata = metadataPipes.communicate()[0]
metadataPipes.stdout.close()
os.remove(inFileAsGzip)
print metadata
```
The use case is to pull the first ten lines of the script's standard output:

```
$ extractMetadata.py | head
```
The error appears if I pipe into `head`, `awk`, `grep`, etc. The script ends with the following error:

```
close failed: [Errno 32] Broken pipe
```

I would have thought closing the pipes would be sufficient, but obviously that's not the case.
Answers (4)
Getting the first 10 lines from a process output might work better this way:
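A minimal sketch of that idea, assuming Python 3 and that the goal is to stop after ten lines inside Python rather than piping through `head`: read the child's stdout lazily with `itertools.islice`, then stop and reap the child. `first_lines` and the demo command are hypothetical stand-ins for the bgzip call.

```python
import itertools
import subprocess
import sys


def first_lines(cmd, n=10):
    """Return the first n lines of cmd's stdout, then stop the child."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # Read lazily, one line at a time, instead of buffering the whole
        # output with communicate().
        lines = list(itertools.islice(proc.stdout, n))
    finally:
        proc.stdout.close()  # we do not want any more output
        proc.terminate()     # harmless if the child has already exited
        proc.wait()          # reap the child so it does not linger as a zombie
    return lines


# Hypothetical stand-in for the bgzip command: a child that prints 100 lines.
demo_cmd = [sys.executable, "-c", "for i in range(100): print(i)"]
print(first_lines(demo_cmd))
```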
Hmmm. I've seen some "Broken pipe" strangeness with subprocess + gzip before. I never did figure out exactly why it was happening, but by changing my implementation approach I was able to avoid the problem. It looks like you're just trying to use a backend gzip process to decompress a file (probably because Python's built-in module is horrendously slow... no idea why, but it definitely is).
Rather than using `communicate()`, you can instead treat the process as a fully asynchronous backend and just read its output as it arrives. When the process dies, the subprocess module will take care of cleaning things up for you. The following snippet should provide the same basic functionality without any broken pipe issues.
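A minimal Python 3 sketch of reading the output as it arrives instead of calling `communicate()`. It still calls `wait()` at the end to collect the exit status, which is a safe habit rather than relying on cleanup at garbage collection. `read_incrementally` and the demo command are hypothetical.

```python
import subprocess
import sys


def read_incrementally(cmd, chunk_size=4096):
    """Collect a child's stdout piece by piece as it arrives, then reap it."""
    # bufsize=0 gives an unbuffered pipe, so read(n) returns as soon as some
    # data is available instead of waiting for a full n bytes.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=0)
    chunks = []
    # read() returns b"" once the child closes its end (normally when it exits).
    for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
        chunks.append(chunk)  # handle each piece here if you don't need them all
    proc.stdout.close()
    proc.wait()  # collect the exit status so the child doesn't linger
    return b"".join(chunks), proc.returncode


# Hypothetical stand-in for the bgzip command.
demo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 10000)"]
data, status = read_incrementally(demo_cmd)
print(len(data), status)
```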
I think this exception has nothing to do with the subprocess call or its file descriptors (once `communicate()` has returned, the child has exited and its output has been read to the end). This seems to be the classic problem of closing `sys.stdout` in a pipe: http://bugs.python.org/issue1596
Despite being a three-year-old bug, it has not been fixed. Since `sys.stdout.write(...)` does not seem to help either, you may resort to a lower-level call; try this out:
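One possible lower-level call is `os.write` on the file descriptor behind `sys.stdout`. Here is a minimal Python 3 sketch, assuming you want to stop quietly when the reader (such as `head`) exits. CPython ignores SIGPIPE by default, so writing to a pipe with no reader raises an error whose `errno` is `EPIPE`, and the sketch catches only that. `write_raw` is a hypothetical helper, not something from the linked bug report.

```python
import errno
import os
import sys


def write_raw(data, fd=None):
    """Write bytes with os.write, treating a vanished reader (EPIPE) as "stop".

    Returns True if all data was written, False if the reader went away.
    """
    if fd is None:
        sys.stdout.flush()        # don't let Python's own buffer reorder output
        fd = sys.stdout.fileno()  # normally 1
    view = memoryview(data)
    try:
        while view:
            written = os.write(fd, view)  # may write only part of the data
            view = view[written:]
    except OSError as exc:
        if exc.errno != errno.EPIPE:
            raise                 # other I/O errors should still surface
        return False              # e.g. `| head` has exited; stop quietly
    return True
```

In the question's script, `write_raw(metadata)` would take the place of `print metadata` (on Python 3, `communicate()` returns bytes).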
There's not enough information to answer this conclusively, but I can make some educated guesses.
First, `os.remove` should definitely not be failing with EPIPE. It doesn't look like it is, either: the error is `close failed: [Errno 32] Broken pipe`, not `remove failed`. It looks like `close` is failing, not `remove`.

It's possible for closing the write end of a pipe (for example, a child's stdin) to give this error. If data is buffered, Python will flush the data before closing the file; if the process on the other end is gone, doing this will raise IOError/EPIPE. However, note that this isn't a fatal error: even when this happens, the file is still closed. The following code reproduces this about 50% of the time, and demonstrates that the file is closed after the exception. (Watch out; I think the behavior of bufsize has changed across versions.)
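A minimal Python 3 sketch of the same effect, under stated assumptions: it uses a hypothetical child (`python -c pass`) that exits without reading its stdin, and, unlike a racy reproduction, it waits for the child before closing, so the failure should occur on every run. It also checks that the pipe reports itself closed after the exception.

```python
import subprocess
import sys


def close_after_reader_exits():
    """Return (exception raised by close(), whether the pipe ended up closed)."""
    # Hypothetical child that exits at once without reading its stdin.
    proc = subprocess.Popen([sys.executable, "-c", "pass"],
                            stdin=subprocess.PIPE, bufsize=-1)
    proc.wait()                   # the only reader of the pipe is now gone
    proc.stdin.write(b"x" * 100)  # small write: it just sits in Python's buffer
    error = None
    try:
        proc.stdin.close()        # close() flushes the buffer first -> EPIPE
    except OSError as exc:        # on Python 3, IOError is an alias of OSError
        error = exc
    return error, proc.stdin.closed


err, closed = close_after_reader_exits()
print(type(err).__name__, closed)  # BrokenPipeError True
```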
This is racy; it only happens part of the time. That may explain why it looked like removing or adding the `os.remove` call affects the error.

That said, I can't see how this would happen with the code you've provided, since you don't write to stdin. It's the closest I can get without a usable repro, though, and maybe it'll point you in the right direction.
As a side note, you shouldn't check os.path.exists before deleting a file that may not exist; it'll cause race conditions if another process deletes the file at the same time. Instead, do this:
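A minimal sketch of that pattern (Python 3 syntax), assuming the only error worth ignoring is "file not found": attempt the removal and swallow `ENOENT` only.

```python
import errno
import os


def rm_f(path):
    """Delete path like `rm -f`: a missing file is not an error."""
    try:
        os.remove(path)
    except OSError as exc:
        if exc.errno != errno.ENOENT:
            raise  # e.g. permission problems should still be reported
```

In the question's script, `rm_f(inFileAsGzip)` would replace both the `os.path.exists` check and the `os.remove` call.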
...which I usually wrap in a function like `rm_f`.

Finally, if you explicitly want to kill a subprocess, there's `metadataPipes.kill()` (just closing its pipes won't do that), but that doesn't help explain the error. Note that `Popen.kill()` was added in Python 2.6; on 2.5 you would use `os.kill(metadataPipes.pid, signal.SIGKILL)`. Also, if you're just reading gzip files, you're much better off with the gzip module than a subprocess: http://docs.python.org/library/gzip.html
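A minimal Python 3 sketch of reading the metadata prefix with the gzip module instead of bgzip. It assumes the input is a BGZF file, which is a series of concatenated gzip members that the gzip module reads transparently, and that you only need bytes starting at offset 0. Because `gzip.open` accepts any path, the `.gz` symlink would not be needed. `read_gzip_prefix` is a hypothetical name.

```python
import gzip


def read_gzip_prefix(path, size):
    """Return the first `size` decompressed bytes of a gzip or BGZF file.

    Roughly what `bgzip -c -d -b 0 -s SIZE FILE` prints, without a subprocess.
    """
    # gzip.open reads concatenated gzip members (the BGZF layout), and it does
    # not care whether the file name ends in ".gz".
    with gzip.open(path, "rb") as fh:
        return fh.read(size)
```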