Unable to search names which contain three 7s in random order by AWK/Python/Bash
I need to find names that contain three 7s in random order.
My attempt
We first need to find the names that do not contain a seven
ls | grep [^7]
Then, we could remove these matches from the whole space
ls [remove] ls | grep [^7]
The problem is that my pseudo-code quickly starts to repeat itself.
How can you find the names which contain three 7s in random order with AWK/Python/Bash?
[edit]
The name can contain any number of letters, as long as it contains three 7s.
Comments (5)
I don't understand the part about "random order". How do you differentiate between the "order" when it's the same token that repeats? Is "a7b7" different from "c7d7" in the order of the 7s?
Anyway, this ought to work:
It just lets the shell solve the problem, but maybe I didn't understand properly.
EDIT: The above is wrong: it also matches names with more than three 7s, which is not wanted. Assuming this is bash, and extended globbing is enabled, this works:
This reads as "zero or more characters which are not sevens, followed by a seven, followed by zero or more characters that are not sevens", and so on. It's important to understand that the asterisk is a prefix operator here, operating on the expression
([^7])
which means "any character except 7".
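The commands this answer refers to are not shown, so the following Python sketch only illustrates the difference it describes. It assumes the first attempt was a plain wildcard of the form `*7*7*7*`, and it expresses the corrected rule (runs of non-7 characters around exactly three 7s) as a regular expression. The helper names and sample names are hypothetical.

```python
import fnmatch
import re

# Assumed form of the first attempt: a plain wildcard, which only requires
# that at least three 7s appear somewhere in the name.
LOOSE_GLOB = "*7*7*7*"

# The corrected rule: runs of non-7 characters around exactly three 7s.
EXACT = re.compile(r"[^7]*7[^7]*7[^7]*7[^7]*")


def loose_match(name):
    """True if the name has three or more 7s (wildcard semantics)."""
    return fnmatch.fnmatchcase(name, LOOSE_GLOB)


def exact_match(name):
    """True only if the whole name contains exactly three 7s."""
    return EXACT.fullmatch(name) is not None


for name in ["a7b7c7", "7777", "x77y"]:
    print(name, loose_match(name), exact_match(name))
```

A name such as `7777` passes the loose wildcard but is rejected by the exact pattern, which is the problem the EDIT addresses.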
I'm guessing you want to find files that contain exactly three 7s, but no more. Using GNU grep with the extended regexp switch (-E):
Should do the trick.
Basically, that matches three occurrences of "a run of non-7 characters followed by a 7", then a trailing run of non-7 characters, anchored to the whole string by the ^ and $ at the beginning and end of the pattern respectively.
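As a rough Python counterpart to the grep approach, the sketch below uses a regular expression built from the same description. The exact pattern passed to grep is not shown above, so this pattern and the function name `has_exactly_three_sevens` are assumptions.

```python
import re

# Three times "any run of non-7 characters, then a 7", followed by a final
# run of non-7 characters. fullmatch() plays the role of the ^ and $ anchors.
THREE_SEVENS = re.compile(r"(?:[^7]*7){3}[^7]*")


def has_exactly_three_sevens(name):
    """Return True if the name contains exactly three 7s."""
    return THREE_SEVENS.fullmatch(name) is not None


names = ["1777", "7a7b7", "77", "77x77"]
print([n for n in names if has_exactly_three_sevens(n)])
```

Because every character between the 7s must be a non-7, a fourth 7 anywhere in the name makes the full match fail.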
Something like this:
A Perl solution:
(I happen to have a directory with 4-digit numbers. 1777 and 2777 don't exist. :-)
Or instead of doing it in a single grep, use one grep to find files with 3-or-more 7s and another to filter out 4-or-more 7s.
You could move some of the work into the shell glob with the shorter
though if there are a large number of files which match that pattern then the latter won't work, because of built-in limits on how large the expanded glob (the command's argument list) can be.
The '-f' in the 'ls' is to prevent 'ls' from sorting the results. If there is a huge number of files in the directory then the sort time can be quite noticeable.
This two-step filter process is, I think, more understandable than using the [^7] patterns.
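The two-step idea can be sketched in Python as two successive filters. The patterns and the function name `two_step_filter` are illustrative assumptions, not the commands from this answer.

```python
import re

# DOTALL lets "." also match a newline inside a name.
AT_LEAST_THREE = re.compile(r"7.*7.*7", re.DOTALL)
AT_LEAST_FOUR = re.compile(r"7.*7.*7.*7", re.DOTALL)


def two_step_filter(names):
    """Keep names with three or more 7s, then drop those with four or more."""
    # Step 1: keep names that contain at least three 7s.
    candidates = [n for n in names if AT_LEAST_THREE.search(n)]
    # Step 2: remove names that contain at least four 7s.
    return [n for n in candidates if not AT_LEAST_FOUR.search(n)]


print(two_step_filter(["7x7y7", "7777", "a77", "07-17-27"]))
```

Each pattern stays simple to read on its own, at the cost of scanning the candidate names twice.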
Also, here's the solution as a Python script, since you asked for that as an option.
This will handle a few cases that the shell commands won't, like (evil) filenames which contain a newline character. Even then, though, the output would likely still be wrong, or at least downstream programs would not be prepared for it.
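The script itself is not reproduced here. A minimal sketch of such a script, assuming it simply counts the 7s in each directory entry, might look like the following. The function name `select_three_sevens` and the choice of NUL-terminated output (so that names containing newlines stay unambiguous for tools such as `xargs -0`) are assumptions, not part of the original answer.

```python
import os
import sys


def select_three_sevens(names):
    """Return the names that contain exactly three '7' characters."""
    return [name for name in names if name.count("7") == 3]


if __name__ == "__main__":
    # os.listdir returns raw entry names, so a name containing a newline
    # is still handled as a single item here.
    for name in select_three_sevens(os.listdir(".")):
        # Terminate each name with NUL instead of a newline (an assumption)
        # so that consumers can split the output safely.
        sys.stdout.write(name + "\0")
```

Counting characters avoids regular expressions entirely, and the 7s may appear anywhere in the name.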