Using grep with | through the subprocess module

Posted 2025-01-15 17:54:25


Imagine that file.txt contains the following:

line one
line two
line three

Then these calls to subprocess.check_output fail (Python 2.7.5 reports that grep failed with exit code 1; in Python 3.8.5 the call hangs and needs a keyboard interrupt to stop the program):

import subprocess

# first approach
command = r'grep "one\|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

# second approach
command = 'grep -E "one|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

but this call succeeds (on both versions) and gives the expected output:

# third approach
command = 'grep -e one -e three ./file.txt'
results = subprocess.check_output(command.split())
print(results)

Why is this the case? My only guess as to why approaches one and two don't work is some interaction between the subprocess module and the | character, but I honestly have no idea why that would make the call fail: in the first approach the character is escaped, and in the second approach the -E flag tells grep that it doesn't need to be escaped. Additionally, approaches 1 and 2 work as expected if you just enter them on the command line as normal. Could the subprocess module be interpreting the character as a pipe instead of a regex OR?


北陌 2025-01-22 17:54:25


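The result of command.split() contains quotes which should no longer be there. str.split() only splits on whitespace, so grep receives the pattern "one\|three" (or "one|three" with -E) with the double quotes as part of it. grep therefore searches for the alternatives "one and three" (each with a literal double quote attached), no line of file.txt matches, and grep exits with status 1, which check_output reports as an error. The | itself is not the problem: without shell=True there is no shell, so nothing treats it as a pipe. The third approach works only because none of its arguments need quoting, so a plain whitespace split already yields the right argument list. That's why Python provides shlex.split, but it's also not hard to split the command manually, as long as you understand the role of quotes in the shell and that you need to remove them when there is no shell.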
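A quick way to see this is to compare both splitters on the question's three commands. This is a minimal sketch: it only tokenizes the strings, nothing is executed.

```python
import shlex

# The three commands from the question.
commands = [
    r'grep "one\|three" ./file.txt',   # first approach
    'grep -E "one|three" ./file.txt',  # second approach
    'grep -e one -e three ./file.txt', # third approach
]

for command in commands:
    print(command.split())       # whitespace split: the quote characters stay inside the tokens
    print(shlex.split(command))  # shell-like split: quotes are consumed, as a POSIX shell would
    print()
```

For the first two commands, str.split() leaves the double quotes inside the pattern token while shlex.split() removes them; for the third, both produce ['grep', '-e', 'one', '-e', 'three', './file.txt'].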

import shlex
import subprocess

command = r'grep "one\|three" ./file.txt'
results1 = subprocess.check_output(['grep', r'one\|three', './file.txt'])  # build the argument list yourself
results2 = subprocess.check_output(shlex.split(command))  # let shlex strip the quotes
results3 = subprocess.check_output(command, shell=True)  # better avoid
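An end-to-end version of the list-argument approach might look like the following. It is a sketch that assumes a grep binary on PATH; grep_lines is a hypothetical helper, and it uses subprocess.run instead of check_output so that grep's exit status 1 (no lines matched) can be told apart from status 2 or higher (a real error) instead of raising for both.

```python
import os
import subprocess
import tempfile


def grep_lines(pattern, path, extended=False):
    """Hypothetical helper: run grep without a shell and return the matching lines.

    Each list element reaches grep exactly as written, so the pattern must not
    carry shell quotes. Exit status 1 (no match) is returned as an empty list.
    """
    argv = ["grep", "-E", pattern, path] if extended else ["grep", pattern, path]
    proc = subprocess.run(argv, stdout=subprocess.PIPE, universal_newlines=True)
    if proc.returncode == 1:
        return []
    proc.check_returncode()  # status 2 or higher: missing file, bad regex, etc.
    return proc.stdout.splitlines()


# Recreate file.txt from the question in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("line one\nline two\nline three\n")
    found = grep_lines("one|three", path, extended=True)
    quoted = grep_lines('"one|three"', path, extended=True)  # quotes kept, as with str.split()

print(found)   # ['line one', 'line three']
print(quoted)  # []
```

Passing the pattern with its shell quotes still attached, as the quoted call does, reproduces the empty result (grep exit status 1) from the question.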

Quotes tell the shell not to perform whitespace tokenization and/or wildcard expansion on a value. When there is no shell, you simply pass that value as its own unquoted list element, wherever the shell would have allowed or even required a quoted string.
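The same point is easier to see with a pattern that contains a space, which is where shell quoting actually matters. A short illustration (nothing is executed; argv is just the list you would hand to a subprocess function):

```python
import shlex

# A pattern containing a space: on the command line it must be quoted.
command = 'grep -E "line (one|three)" ./file.txt'

print(command.split())
# ['grep', '-E', '"line', '(one|three)"', './file.txt']  (pattern torn in two, quotes kept)

print(shlex.split(command))
# ['grep', '-E', 'line (one|three)', './file.txt']

# Without a shell, the pattern is simply one list element, with no quotes at all.
argv = ["grep", "-E", "line (one|three)", "./file.txt"]
assert shlex.split(command) == argv
```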
