Regex negative lookahead
In my home directory I have a folder drupal-6.14 that contains the Drupal platform.
From this directory I use the following command:
find drupal-6.14 -type f -iname '*' | grep -P 'drupal-6.14/(?!sites(?!/all|/default)).*' | xargs tar -czf drupal-6.14.tar.gz
This command creates a gzipped tar archive of the folder drupal-6.14, excluding all subfolders of drupal-6.14/sites/ except sites/all and sites/default, which it includes.
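The grep stage can be reproduced in Python to see which paths survive the filter; Python's re module treats these lookaheads the same way grep -P (PCRE) does. The file paths below are hypothetical stand-ins for find's output, and `keep` is an illustrative helper name.

```python
import re

# The same pattern passed to grep -P in the command above.
# Note: the dot in "6.14" is unescaped, so it matches any character.
PATTERN = re.compile(r"drupal-6.14/(?!sites(?!/all|/default)).*")

def keep(path: str) -> bool:
    """Return True if grep -P would print this line (re.search, like grep)."""
    return PATTERN.search(path) is not None

# Hypothetical sample output of `find drupal-6.14 -type f`.
paths = [
    "drupal-6.14/index.php",
    "drupal-6.14/modules/node/node.module",
    "drupal-6.14/sites/all/README.txt",
    "drupal-6.14/sites/default/settings.php",
    "drupal-6.14/sites/example.com/settings.php",
]

for p in paths:
    print("keep" if keep(p) else "drop", p)
```

Only the last path is dropped: it sits under sites/ but not under sites/all or sites/default.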
My question is on the regular expression:
grep -P 'drupal-6.14/(?!sites(?!/all|/default)).*'
The expression works to exclude all the folders I want excluded, but I don't quite understand why.
A common task with regular expressions is to match all strings except those that contain subpattern x; in other words, to negate a subpattern.
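One widely used idiom for this task (an addition here, not something from the question) is ^(?:(?!x).)*$: before each character is consumed, a negative lookahead checks that x does not start at that position, so the whole string matches only if x occurs nowhere in it. A minimal Python sketch, where `excluding` is a hypothetical helper and the subpattern is inserted as a regex, not as literal text:

```python
import re

def excluding(subpattern: str) -> re.Pattern:
    """Build a regex matching whole strings that do NOT contain subpattern.

    (?!subpattern) is tested before each character is consumed, so the
    match fails as soon as subpattern would start at any position.
    """
    return re.compile(rf"^(?:(?!{subpattern}).)*$")

no_sites = excluding("sites")
print(bool(no_sites.match("drupal-6.14/index.php")))         # True
print(bool(no_sites.match("drupal-6.14/sites/all/README")))  # False
```

This idiom rejects any string containing x anywhere; the regex in the question is more selective, because its lookahead is checked only once, right after drupal-6.14/.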
I (think) I understand that the general strategy to solve these problems is the use of negative lookaheads, but I've never understood to a satisfactory level how positive and negative look(ahead/behind)s work.
Over the years I've read many websites on them, including the PHP and Python regex manuals and pages like http://www.regular-expressions.info/lookaround.html, but I've never really had a solid understanding of them.
Could someone explain how this works, and perhaps provide some examples that do similar things?
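All four lookaround forms share one idea: they test a condition at the current position without consuming any characters, so the text they examine never becomes part of the match. A small Python sketch with a made-up input string, where each pattern replaces only the word it names with "X":

```python
import re

s = "sites/all sites/foo lib/foo"

# The lookaround part is checked but never consumed, so it survives the
# substitution; only the literal word outside the lookaround is replaced.
examples = {
    "positive lookahead":  r"sites(?=/all)",    # "sites" followed by "/all"
    "negative lookahead":  r"sites(?!/all)",    # "sites" NOT followed by "/all"
    "positive lookbehind": r"(?<=sites/)foo",   # "foo" preceded by "sites/"
    "negative lookbehind": r"(?<!sites/)foo",   # "foo" NOT preceded by "sites/"
}

for name, pattern in examples.items():
    print(f"{name:20} {pattern:16} -> {re.sub(pattern, 'X', s)}")

# Resulting strings, in order:
#   X/all sites/foo lib/foo
#   sites/all X/foo lib/foo
#   sites/all sites/X lib/foo
#   sites/all sites/foo lib/X
```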
-- Update One:
Regarding Andomar's response: can a double negative lookahead be expressed more succinctly as a single positive lookahead?
i.e., is
'drupal-6.14/(?!sites(?!/all|/default)).*'
equivalent to the following?
'drupal-6.14/(?=sites(?:/all|/default)).*'
-- Update Two:
As per @andomar and @alan moore: you can't replace a double negative lookahead with a positive lookahead.
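Writing S for "sites" right after drupal-6.14/ and A for "/all or /default" right after that, the double negative means not (S and not A), which by De Morgan's law is (not S) or (S and A). The positive version means only S and A, a strictly narrower condition, so the two patterns disagree on paths where drupal-6.14/ is not followed by sites. A Python sketch comparing both on hypothetical paths:

```python
import re

double_negative = re.compile(r"drupal-6.14/(?!sites(?!/all|/default)).*")
positive = re.compile(r"drupal-6.14/(?=sites(?:/all|/default)).*")

paths = [
    "drupal-6.14/index.php",                # not under sites/
    "drupal-6.14/sites/all/README.txt",     # under sites/all
    "drupal-6.14/sites/example.com/x.php",  # under another site folder
]

for p in paths:
    a = bool(double_negative.search(p))
    b = bool(positive.search(p))
    print(f"{p:40} double-negative={a} positive={b}")
```

The two agree on the last two paths but not on the first: the positive version would leave everything outside sites/all and sites/default out of the archive.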
Answers (3):
A negative lookahead says: at this position, the following regex must not match.
Let's take a simplified example, the pattern a(?!b(?!c)).
This is a double negation: it rejects an a followed by b, unless that b is itself followed by c. The nested negative lookahead becomes a positive lookahead: the c must be present.
In each case, only the a is matched. The lookahead is only a condition and does not add to the matched text.
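Running the simplified pattern against a few short inputs makes the double negation visible. The inputs are illustrative choices, and Python's re.match is used so that every attempt starts at the first character:

```python
import re

pattern = re.compile(r"a(?!b(?!c))")

for s in ["a", "ac", "ab", "abe", "abc"]:
    m = pattern.match(s)
    # When there is a match, m.group() is always just "a":
    # the lookaheads only test what follows, they consume nothing.
    print(f"{s!r:6} ->", repr(m.group()) if m else "no match")
```

"ab" and "abe" fail because b is present and is not followed by c; "abc" passes because the inner (?!c) fails, which makes the outer (?!b...) succeed.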
Lookarounds can be nested.
So this regex matches "drupal-6.14/" that is not followed by "sites" that is not followed by "/all" or "/default".
Confusing? In different words: it matches "drupal-6.14/" that is not followed by "sites", unless "sites" is further followed by "/all" or "/default".
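The "not followed by sites, unless followed by /all or /default" reading can be written as plain string logic and checked against the regex. The function name `included` and the sample paths are hypothetical; this is a sketch for comparison, not a replacement for the regex, and it assumes every path starts with drupal-6.14/.

```python
import re

PREFIX = "drupal-6.14/"
PATTERN = re.compile(r"drupal-6.14/(?!sites(?!/all|/default)).*")

def included(path: str) -> bool:
    """Plain-Python reading of the nested lookaheads."""
    rest = path[len(PREFIX):]
    if not rest.startswith("sites"):
        return True  # outer (?!sites...) succeeds immediately
    after = rest[len("sites"):]
    # Inner (?!/all|/default): "sites" only blocks the match when it is
    # NOT followed by /all or /default.
    return after.startswith(("/all", "/default"))

samples = [
    "drupal-6.14/index.php",
    "drupal-6.14/sites/all/README.txt",
    "drupal-6.14/sites/default/settings.php",
    "drupal-6.14/sites/example.com/settings.php",
    "drupal-6.14/sitesall",
]

for p in samples:
    assert included(p) == bool(PATTERN.search(p)), p
    print(included(p), p)
```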
If you revise your regular expression like this:
'drupal-6.14/(?=sites(?!/all|/default)).*'
...then it will match all inputs that contain drupal-6.14/ followed by sites followed by anything other than /all or /default.
Changing ?= to ?! to get back your original regex simply negates those matches. So this means that drupal-6.14/ now cannot be followed by sites followed by anything other than /all or /default, and inputs where sites is followed by /all or /default will satisfy the regex.
But what may not be obvious from some of the other answers (and possibly your question) is that your regex will also permit other inputs where drupal-6.14/ is followed by anything other than sites.
Conclusion: your regex basically says to include all subdirectories of drupal-6.14 except those subdirectories of sites whose name begins with anything other than all or default.
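The groups of inputs discussed in this answer can be checked side by side. The sample paths are illustrative; `revised` is the positive-lookahead version above and `original` is the regex from the question.

```python
import re

revised = re.compile(r"drupal-6.14/(?=sites(?!/all|/default)).*")
original = re.compile(r"drupal-6.14/(?!sites(?!/all|/default)).*")

samples = [
    "drupal-6.14/sites/foo",      # sites followed by something other than /all, /default
    "drupal-6.14/sitesfoo42",     # ...including no slash at all
    "drupal-6.14/sites/all",      # sites/all
    "drupal-6.14/sites/default",  # sites/default
    "drupal-6.14/sites/all42",    # only the prefix "/all" is checked
    "drupal-6.14/foo",            # not followed by sites at all
]

for p in samples:
    print(p, "revised:", bool(revised.search(p)), "original:", bool(original.search(p)))
```

For each of these paths exactly one of the two patterns matches, because (?=...) and (?!...) test the same condition at the same position. Note that sites/all42 is included by the original regex, since only the beginning of the folder name is checked.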