Using the subprocess module to grep with the | character
Imagine that file.txt contains the following:
line one
line two
line three
Then these calls to subprocess.check_output fail (Python 2.7.5 reports that grep failed with exit code 1; in Python 3.8.5 the call hangs and requires a keyboard interrupt to stop the program):
# first approach
command = 'grep "one\|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)
# second approach
command = 'grep -E "one|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)
but this call succeeds (on both versions) and gives the expected output:
# third approach
command = 'grep -e one -e three ./file.txt'
results = subprocess.check_output(command.split())
print(results)
Why is this the case? My only guess as to why approaches one and two don't work is some intricacy in how the subprocess module and the | character interact, but I honestly have no idea why this would cause the call to fail. In the first approach the character is escaped, and in the second approach we pass grep a flag saying that we shouldn't have to escape it. Additionally, approaches 1 and 2 work as expected if you enter them on the command line as normal. Could it be that the subprocess module is interpreting the character as a pipe instead of a regex OR?
Comments (1)
The result of command.split() contains quotes which should no longer be there. That's why Python provides shlex.split, but it's also not hard to work out how to split the command manually, as long as you understand the role the quotes play in the shell and that you need to remove them when there is no shell.
Quotes tell the shell not to perform whitespace tokenization and/or wildcard expansion on a value. When there is no shell, you should simply pass a plain string wherever the shell allowed or even required you to use a quoted string.
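To make the difference concrete, here is a minimal sketch comparing the two ways of splitting the commands from the question. It only builds the argument lists and does not run grep. The variable names are made up for illustration, and the first pattern is written as a raw string here so that the backslash is unambiguous.

```python
import shlex

# The two failing commands from the question.
first = r'grep "one\|three" ./file.txt'
second = 'grep -E "one|three" ./file.txt'

# str.split() only splits on whitespace, so the double quotes
# remain part of the pattern that grep would receive.
naive_args = second.split()

# shlex.split() applies shell-like quoting rules and drops the quotes.
first_args = shlex.split(first)
second_args = shlex.split(second)

print(naive_args)   # ['grep', '-E', '"one|three"', './file.txt']
print(first_args)   # ['grep', 'one\\|three', './file.txt']
print(second_args)  # ['grep', '-E', 'one|three', './file.txt']
```

With the naive split, grep is asked to match text that literally contains a double quote, which none of the lines in file.txt do, so it exits with status 1. Passing first_args or second_args to subprocess.check_output, or writing the list by hand as ['grep', '-E', 'one|three', './file.txt'], should give grep the intended pattern. No shell is involved in either case, so the | is never treated as a pipe.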