How do I close a Python 2.5.2 Popen subprocess once I have the data I need?

Posted 2024-09-26 06:38:58 · 2805 characters · 5 views · 0 comments

I am running the following version of Python:

$ /usr/bin/env python --version
Python 2.5.2

I am running the following Python code to have a child subprocess write data to standard output, and to read that output into a Python variable called metadata:

# Extract metadata (snippet from extractMetadata.py)
inFileAsGzip = "%s.gz" % inFile
if os.path.exists(inFileAsGzip):
    os.remove(inFileAsGzip)
os.symlink(inFile, inFileAsGzip)
extractMetadataCommand = "bgzip -c -d -b 0 -s %s %s" % (metadataRequiredFileSize, inFileAsGzip)
metadataPipes = subprocess.Popen(extractMetadataCommand, stdin=None, stdout=subprocess.PIPE, shell=True, close_fds=True)
metadata = metadataPipes.communicate()[0]
metadataPipes.stdout.close()
os.remove(inFileAsGzip)
print metadata

The use case is to pull the first ten lines of standard output from the script above:

$ extractMetadata.py | head

The error will appear if I pipe into head, awk, grep, etc.

The script ends with the following error:

close failed: [Errno 32] Broken pipe

I would have thought closing the pipes would be sufficient, but obviously that's not the case.

Comments (4)

心凉怎暖 2024-10-03 06:38:59

Getting the first 10 lines from a process output might work better this way:

ph = os.popen(cmdline, 'r')
lines = []
for s in ph:
    lines.append(s.rstrip())
    if len(lines) == 10: break
print '\n'.join(lines)
ph.close()
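
A variation on the same idea using subprocess instead of os.popen, offered as a rough sketch rather than something verified on 2.5.2. It is written in Python 3 syntax, first_lines is a made-up helper name, and the child command in the demo is just a stand-in. It assumes the child exits once its stdout is closed, which is the usual behavior for tools that keep writing.

```python
import subprocess
import sys


def first_lines(argv, limit=10):
    """Return up to `limit` lines of a command's stdout, then stop reading."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    lines = []
    try:
        for raw in proc.stdout:
            lines.append(raw.decode("utf-8", "replace").rstrip("\r\n"))
            if len(lines) >= limit:
                break
    finally:
        # Closing our end means the child's next write fails with EPIPE
        # (or it receives SIGPIPE), so it normally exits on its own.
        proc.stdout.close()
        # Reap the child so it does not linger as a zombie.
        proc.wait()
    return lines


if __name__ == "__main__":
    # Stand-in child that prints 0..999, one number per line.
    demo = [sys.executable, "-c", "for i in range(1000): print(i)"]
    sys.stdout.write("\n".join(first_lines(demo)) + "\n")
```

The point of the finally block is that the pipe is closed and the child reaped even if reading fails partway through.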

愿与i 2024-10-03 06:38:58

Hmmm. I've seen some "Broken pipe" strangeness with subprocess + gzip before. I never did figure out exactly why it was happening but by changing my implementation approach, I was able to avoid the problem. It looks like you're just trying to use a backend gzip process to decompress a file (probably because Python's builtin module is horrendously slow... no idea why but it definitely is).

Rather than using communicate(), you can instead treat the process as a fully asynchronous backend and just read its output as it arrives. When the process dies, the subprocess module will take care of cleaning things up for you. The following snippet should provide the same basic functionality without any broken pipe issues.

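import subprocess

# Decompress in a child process and read its stdout in fixed-size chunks.
gz_proc = subprocess.Popen(['gzip', '-c', '-d', 'test.gz'], stdout=subprocess.PIPE)

chunks = []
while True:
    dat = gz_proc.stdout.read(4096)
    if not dat:  # an empty read means EOF: the child closed its stdout
        break
    chunks.append(dat)

file_data = ''.join(chunks)

維他命╮ 2024-10-03 06:38:58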

I think this exception has nothing to do with the subprocess call or its file descriptors (once communicate() returns, the Popen object's pipes are already closed). This seems to be the classic problem of closing sys.stdout in a pipe:

http://bugs.python.org/issue1596

Despite being a 3-year-old bug, it has not been solved. Since sys.stdout.write(...) does not seem to help either, you may resort to a lower-level call. Try this:

os.write(sys.stdout.fileno(), metadata)
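
If the reader (head, for example) has already exited, that os.write call can itself fail with EPIPE, so it may be worth catching. A minimal sketch of that, assuming Python 3 syntax (on 2.5 the clause would be spelled except OSError, exc) and using a made-up helper name, write_ignoring_epipe:

```python
import errno
import os
import sys


def write_ignoring_epipe(data, fd=None):
    """Write bytes to fd (stdout by default).

    Returns True if everything was written, False if the reading end of
    the pipe had already been closed.
    """
    if fd is None:
        fd = sys.stdout.fileno()
    try:
        while data:
            # os.write may write only part of the buffer, so loop.
            written = os.write(fd, data)
            data = data[written:]
    except OSError as exc:
        # EPIPE just means the consumer stopped reading; anything else
        # is a real error and is re-raised.
        if exc.errno != errno.EPIPE:
            raise
        return False
    return True
```

In the script from the question this would stand in for print metadata; a False return simply means the downstream command took what it wanted and quit.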

原野 2024-10-03 06:38:58

There's not enough information to answer this conclusively, but I can make some educated guesses.

First, os.remove should definitely not be failing with EPIPE, and it doesn't look like it is: the error is close failed: [Errno 32] Broken pipe, not remove failed. So close is failing, not remove.

It's possible for closing a pipe's stdout to give this error. If data is buffered, Python will flush the data before closing the file. If the underlying process is gone, doing this will raise IOError/EPIPE. However, note that this isn't a fatal error: even when this happens, the file is still closed. The following code reproduces this about 50% of the time, and demonstrates that the file is closed after the exception. (Watch out; I think the behavior of bufsize has changed across versions.)

    import os, subprocess
    metadataPipes = subprocess.Popen("echo test", stdin=subprocess.PIPE,
        stdout=subprocess.PIPE, shell=True, close_fds=True, bufsize=4096)
    metadataPipes.stdin.write("blah"*1000)
    print metadataPipes.stdin
    try:
        metadataPipes.stdin.close()
    except IOError, e:
        print "stdin after failure: %s" % metadataPipes.stdin

This is racy; it only happens part of the time. That may explain why it looked like removing or adding the os.remove call affects the error.

That said, I can't see how this would happen with the code you've provided, since you don't write to stdin. It's the closest I can get without a usable repro, though, and maybe it'll point you in the right direction.

As a side note, you shouldn't check os.path.exists before deleting a file that may not exist; it'll cause race conditions if another process deletes the file at the same time. Instead, do this:

try:
    os.remove(inFileAsGzip)
except OSError, e:
    if e.errno != errno.ENOENT: raise

... which I usually wrap in a function like rm_f.
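
For illustration, such a wrapper might look like the sketch below. Only the name rm_f comes from the text above; the body is a guess at the obvious implementation, written in Python 3 syntax (on 2.5 the clause would be except OSError, exc).

```python
import errno
import os


def rm_f(path):
    """Remove path, treating a missing file as success (like `rm -f`)."""
    try:
        os.remove(path)
    except OSError as exc:
        # ENOENT means the file was already gone, which is fine;
        # anything else (permissions, path is a directory) is re-raised.
        if exc.errno != errno.ENOENT:
            raise
```

With that in place, the exists-then-remove pair in the question collapses to a single rm_f(inFileAsGzip) call.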

Finally, if you explicitly want to kill a subprocess, there's metadataPipes.kill (just closing its pipes won't do that), but that doesn't help explain the error. Note that Popen.kill was only added in Python 2.6; on 2.5.2 the equivalent is os.kill(metadataPipes.pid, signal.SIGKILL). Also, again, if you're just reading gzip files you're much better off with the gzip module than a subprocess. http://docs.python.org/library/gzip.html
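
As a rough sketch of the gzip-module route: the helper below reads only the first size decompressed bytes of a file, which is approximately what the bgzip -b 0 -s call in the question asks for. It assumes the input is readable as ordinary gzip (bgzip output is meant to be gzip-compatible, but that has not been checked here), read_gzip_prefix is a made-up name, and the with form needs a newer Python than 2.5.

```python
import gzip


def read_gzip_prefix(path, size):
    """Return up to `size` decompressed bytes from the start of a gzip file."""
    with gzip.open(path, "rb") as handle:
        # Only as much of the file as needed is decompressed.
        return handle.read(size)
```

Something like metadata = read_gzip_prefix(inFile, metadataRequiredFileSize) would then replace the symlink, Popen and communicate steps, assuming metadataRequiredFileSize is an integer byte count.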
