Using a glob argument to match filenames recursively

Posted 2024-11-08 22:16:45

I have been trying to get a list of files matching a glob pattern in a command line argument (sys.argv[1]) recursively using glob.glob and os.walk. The problem is, bash (and many other shells it seems) auto-expand glob patterns into filenames.

How do standard UNIX programs (e.g. grep -R) do this then? I realize they're not written in Python, but if this is happening at the shell level, that shouldn't matter, right? Is there a way for a script to tell the shell not to auto-expand glob patterns? It looks like set -f will disable globbing, but I'm not sure how to run this early enough, so to speak.

I've seen Use a Glob() to find files recursively in Python?, but that doesn't cover actually getting the glob patterns from command line arguments.

Thanks!

Edit:

The grep-like Perl script ack accepts a Perl regex as one of its arguments. Thus, ack .* prints out every line of every file. But .* should expand to all hidden files in a directory. I tried reading the script but I don't know Perl; how can it do this?

Comments (3)

寄人书 2024-11-15 22:16:45

The shell performs glob expansion before it even thinks of invoking the command. Programs such as grep don't do anything to prevent globbing: they can't. You, as the caller of these programs, must tell the shell that you want to pass the special characters such as * and ? to the program, and not let the shell interpret them. You do that by putting them inside quotes:

grep -E 'ba(na)* split' *.txt

(This looks for ba split, bana split, etc., in all files called <something>.txt.) In this case, either single quotes or double quotes will do the trick. Between single quotes, the shell expands nothing. Between double quotes, $, ` and \ are still interpreted. You can also protect a single character from shell expansion by preceding it with a backslash. It's not only wildcard characters that need to be protected; for example, above, the space in the pattern is in quotes so it's part of the argument to grep and not an argument separator. Alternative ways to write the snippet above include:

grep -E "ba(na)* split" *.txt
grep -E ba\(na\)\*\ split *.txt

With most shells, if an argument contains wildcards but the pattern doesn't match any file, the pattern is left unchanged and passed to the underlying command. So a command like

grep b[an]*a *.txt

has a different effect depending on what files are present on the system. If the current directory doesn't contain any file whose name matches b[an]*a, the command searches for the pattern b[an]*a in the files whose names match *.txt. If the current directory contains files named baclava, bna and hello.txt, the command expands to grep baclava bna hello.txt, so it searches for the pattern baclava in the two files bna and hello.txt. Needless to say, it's a bad idea to rely on this in scripts; on the command line it can occasionally save typing, but it's risky.
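The rule described above (a pattern that matches is replaced by file names, a pattern that matches nothing is passed through untouched) can be imitated in a few lines of Python. This is only a rough sketch: expand_like_shell is a hypothetical helper, it ignores quoting entirely, and it sorts by code point, whereas a real shell sorts according to the locale.

```python
import glob


def expand_like_shell(words):
    """Imitate the 'leave unmatched patterns alone' rule for a list of words."""
    expanded = []
    for word in words:
        # glob.glob looks the word up relative to the current directory
        matches = sorted(glob.glob(word))
        if matches:
            expanded.extend(matches)  # pattern matched: replaced by file names
        else:
            expanded.append(word)     # no match: the word is passed through as-is
    return expanded
```

Calling expand_like_shell(["grep", "b[an]*a", "*.txt"]) in two different directories shows why the same command line can reach grep with different arguments.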

When you run ack .* in a directory containing no dot file, the shell runs ack . .. (in shells where .* matches the . and .. entries). The behavior of the ack command is then to print out all non-empty lines (the pattern . matches any one character) in all files under .. (the parent of the current directory), recursively. Contrast this with ack '.*', which searches for the pattern .* (which matches anything) in the current directory and its subdirectories (due to the behavior of ack when you don't pass any filename argument).
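Applied to the Python script from the question, this means the caller quotes the pattern and the script does the matching itself. A minimal sketch using os.walk and fnmatch follows; find_matching and the script name in the comment are illustrative names, not something from the original posts.

```python
import fnmatch
import os
import sys


def find_matching(pattern, root="."):
    """Yield paths under root whose base name matches the glob pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Intended invocation (note the quotes, which stop the shell from
    # expanding the pattern):  python find_matching.py '*.txt'
    if len(sys.argv) > 1:
        for path in find_matching(sys.argv[1]):
            print(path)
```

If the quotes are forgotten and the shell expands the pattern, sys.argv[1] is simply the first matching file name, so the script cannot tell the difference on its own.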

扛刀软妹 2024-11-15 22:16:45

When it comes to grep, it simply accepts a list of filenames and doesn't do the glob expansion itself. If you really need to pass a pattern as an argument, it has to be quoted on the command line, for example with single quotes. But before you do that, consider letting the shell do the job it was designed for.
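Letting the shell do the expansion means the script just receives a list of file names. A small sketch of that style, assuming a hypothetical count_lines helper as the per-file work:

```python
import os
import sys


def count_lines(paths):
    """Return {path: line count} for each file name that was passed in."""
    counts = {}
    for path in paths:
        # An unmatched pattern may arrive verbatim (e.g. "*.txt"); skip
        # anything that is not an existing regular file.
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts[path] = sum(1 for _ in handle)
    return counts


if __name__ == "__main__":
    # Intended invocation:  python count_lines.py *.txt
    # (here the shell expands *.txt before Python starts)
    for path, total in count_lines(sys.argv[1:]).items():
        print(f"{total}\t{path}")
```

This approach does not recurse by itself; recursion then has to come from the shell or from a tool such as find.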

最近可好 2024-11-15 22:16:45

Yes, set -f, you're on the right track.

It sounds like you are going to call your python program from a shell.

Any time you use a shell to issue a command, it scans the command line and processes wildcards, command substitution and a whole bunch of other things.

So you have to turn off globbing before you run the program on the command line:

set -f
echo *
*

myprogram *.txt

will pass the string '*.txt' to your program. Then you can use the internal globbing to get your files.
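One possible form of that internal globbing in Python is glob.glob with a ** component and recursive=True (available since Python 3.5). The helper name expand_recursively is made up for this sketch.

```python
import glob
import os


def expand_recursively(pattern, root="."):
    """Expand pattern in root and in every directory below it."""
    # glob.escape protects special characters in the root path itself;
    # "**" matches zero or more directory levels when recursive=True.
    full_pattern = os.path.join(glob.escape(root), "**", pattern)
    return sorted(glob.glob(full_pattern, recursive=True))
```

As with shell globbing, names starting with a dot are not matched by * or ** here, so hidden files and directories are skipped unless the pattern names them explicitly.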

OR you can do essentially the same thing by creating a wrapper script

 #!/bin/bash
 set -f
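 myProgram "$@"

where "$@" expands to the arguments you pass in when you start the wrapper, either from the command line, from crontab, or via exec(...) from another process. Note that set -f inside the wrapper only affects expansions the wrapper itself performs; a calling shell has already expanded any unquoted pattern before the wrapper starts.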

I hope this helps.
