Regular expression negative lookahead

Posted on 2024-08-11 06:20:07

In my home directory I have a folder drupal-6.14 that contains the Drupal platform.

From this directory I use the following command:

find drupal-6.14 -type f -iname '*' | grep -P 'drupal-6.14/(?!sites(?!/all|/default)).*' | xargs tar -czf drupal-6.14.tar.gz

This command builds a gzipped tar archive of the folder drupal-6.14, excluding every subfolder of drupal-6.14/sites/ except sites/all and sites/default, which are included.

My question is on the regular expression:

grep -P 'drupal-6.14/(?!sites(?!/all|/default)).*'

The expression works to exclude all the folders I want excluded, but I don't quite understand why.

A common task with regular expressions is to

match all strings except those that contain subpattern x. In other words, to negate a subpattern.

I (think) I understand that the general strategy to solve these problems is the use of negative lookaheads, but I've never understood to a satisfactory level how positive and negative look(ahead/behind)s work.
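For reference, the usual shape of that general strategy can be sketched as follows. This is only an illustration in Python's re module (not the grep -P pipeline from the question); the rejected subpattern "tmp" and the sample strings are made up for the example.

```python
import re

# Anchor at the start, then assert that "tmp" does not occur anywhere
# ahead of that position. The lookahead consumes nothing, so the
# trailing .* still matches the whole string.
no_tmp = re.compile(r'^(?!.*tmp).*$')

# Hypothetical sample strings.
samples = ['src/main.c', 'build/tmp/cache.o', 'notes.txt', 'tmpfile']

# Keep only the strings that do not contain the subpattern.
kept = [s for s in samples if no_tmp.match(s)]
print(kept)
```

The lookahead sits directly after ^ so that it is checked once, against the whole string, rather than at every position.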

Over the years, I've read many websites on them. The PHP and Python regex manuals, other pages like http://www.regular-expressions.info/lookaround.html and so forth, but I've never really had a solid understanding of them.

Could someone explain how this works, and perhaps provide some similar examples?

-- Update One:

Regarding Andomar's response: can a double negative lookahead be more succinctly expressed as a single positive lookahead statement:

i.e., is:

'drupal-6.14/(?!sites(?!/all|/default)).*'

equivalent to:

'drupal-6.14/(?=sites(?:/all|/default)).*'

???

-- Update Two:

As per @andomar and @alan moore: you can't swap the double negative lookahead for a single positive lookahead.
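One way to see the difference between the two patterns from Update One is to run both against a few paths. The sketch below makes some assumptions: it uses Python's re module in place of grep -P, escapes the dot in 6.14, and uses invented sample paths.

```python
import re

# The original pattern (nested negative lookaheads) and the proposed
# single positive lookahead, with the dot in "6.14" escaped.
double_negative = re.compile(r'drupal-6\.14/(?!sites(?!/all|/default)).*')
single_positive = re.compile(r'drupal-6\.14/(?=sites(?:/all|/default)).*')

# Hypothetical paths, as find might print them.
paths = [
    'drupal-6.14/index.php',
    'drupal-6.14/sites/all/modules/x.module',
    'drupal-6.14/sites/example.com/settings.php',
]

# search() is used because grep also matches anywhere in the line.
comparison = {
    p: (bool(double_negative.search(p)), bool(single_positive.search(p)))
    for p in paths
}
for path, (dn, sp) in comparison.items():
    print(f'{path}: double negative={dn}, single positive={sp}')
```

The two patterns disagree on drupal-6.14/index.php: the double negative accepts anything that does not start with sites, while the positive lookahead requires sites/all or sites/default to be present.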

心如狂蝶 2024-08-18 06:20:07

A negative lookahead says, at this position, the following regex must not match.

Let's take a simplified example:

a(?!b(?!c))

a      Match: (?!b) succeeds
ac     Match: (?!b) succeeds
ab     No match: (?!b(?!c)) fails
abe    No match: (?!b(?!c)) fails
abc    Match: (?!b(?!c)) succeeds

The last example is a double negation: b is allowed as long as it is followed by c. Inside the outer negative lookahead, the nested negative lookahead acts like a positive condition: if there is a b, a c must follow it.

In each example that matches, only the a is matched. The lookahead is only a condition, and does not add to the matched text.
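The table above can be checked mechanically. Here is a minimal sketch using Python's re module (an assumption; any engine with lookahead support should behave the same for this pattern), recording what, if anything, is matched for each input.

```python
import re

# "a" not followed by a "b" that is itself not followed by "c".
pattern = re.compile(r'a(?!b(?!c))')

results = {}
for text in ['a', 'ac', 'ab', 'abe', 'abc']:
    m = pattern.search(text)
    # Store the matched text, or None when the lookahead rejects the input.
    results[text] = m.group(0) if m else None

print(results)
```

Every successful match is just the single character a, which confirms that the lookahead consumes no text.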

ゝ偶尔ゞ 2024-08-18 06:20:07

Lookarounds can be nested.

So this regex matches "drupal-6.14/" that is not followed by "sites" that is not followed by "/all" or "/default".

Confusing? Put differently, it matches "drupal-6.14/" that is not followed by "sites", unless that "sites" is in turn followed by "/all" or "/default".
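The "unless" reading can be written out explicitly as an alternation: either "sites" does not follow, or "sites" follows together with "/all" or "/default". The sketch below, in Python's re module with invented sample paths and the dot in 6.14 escaped (both assumptions), checks that this spelled-out form agrees with the nested one on those samples.

```python
import re

# Nested form from the question (without the trailing .*).
nested = re.compile(r'drupal-6\.14/(?!sites(?!/all|/default))')

# Spelled-out form: NOT followed by "sites", OR followed by
# "sites/all" or "sites/default".
unless = re.compile(r'drupal-6\.14/(?:(?!sites)|(?=sites(?:/all|/default)))')

# Hypothetical sample paths.
samples = [
    'drupal-6.14/index.php',
    'drupal-6.14/sites/all/x',
    'drupal-6.14/sites/default/settings.php',
    'drupal-6.14/sites/example.com/x',
    'drupal-6.14/sitesfoo42',
    'drupal-6.14/xsites',
]

nested_hits = [bool(nested.search(s)) for s in samples]
unless_hits = [bool(unless.search(s)) for s in samples]
print(nested_hits == unless_hits)
```

This is the same logic as "not (A and not B)" being equal to "(not A) or (A and B)", applied to lookaheads.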

走走停停 2024-08-18 06:20:07

If you revise your regular expression like this:

drupal-6.14/(?=sites(?!/all|/default)).*
             ^^

...then it will match all inputs that contain drupal-6.14/ followed by sites followed by anything other than /all or /default. For example:

drupal-6.14/sites/foo
drupal-6.14/sites/bar
drupal-6.14/sitesfoo42
drupal-6.14/sitesall

Changing ?= to ?! to match your original regex simply negates those matches:

drupal-6.14/(?!sites(?!/all|/default)).*
             ^^

So, this simply means that drupal-6.14/ now cannot be followed by sites followed by anything other than /all or /default. So now, these inputs will satisfy the regex:

drupal-6.14/sites/all
drupal-6.14/sites/default
drupal-6.14/sites/all42

But, what may not be obvious from some of the other answers (and possibly your question) is that your regex will also permit other inputs where drupal-6.14/ is followed by anything other than sites as well. For example:

drupal-6.14/foo
drupal-6.14/xsites

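Conclusion: your regex includes everything under drupal-6.14 except paths where drupal-6.14/ is followed by sites and that sites is not immediately followed by /all or /default. This excludes the subdirectories of sites other than all and default, and also names such as sitesfoo42.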
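To see the conclusion applied to a file list, here is a small sketch that filters paths the way the grep step in the pipeline would. It is written with Python's re module instead of grep -P, the paths are invented, and the dot in 6.14 is escaped. The stricter pattern is a hypothetical variant, not part of the question, that also rejects names like sites/all42 by requiring a slash or the end of the string after all or default.

```python
import re

# Pattern from the question, with the dot escaped.
keep = re.compile(r'drupal-6\.14/(?!sites(?!/all|/default)).*')

# Hypothetical stricter variant: "all" or "default" must be a complete
# path component (followed by "/" or the end of the string).
strict = re.compile(r'drupal-6\.14/(?!sites(?!/(?:all|default)(?:/|$))).*')

# Hypothetical output of the find command.
found = [
    'drupal-6.14/index.php',
    'drupal-6.14/modules/node/node.module',
    'drupal-6.14/sites/all/modules/views/views.module',
    'drupal-6.14/sites/default/settings.php',
    'drupal-6.14/sites/example.com/settings.php',
    'drupal-6.14/sites/all42/readme.txt',
    'drupal-6.14/sitesfoo42',
    'drupal-6.14/xsites',
]

# Paths that would be passed on to tar, and paths that would be dropped.
archived = [p for p in found if keep.search(p)]
excluded = [p for p in found if not keep.search(p)]
strict_excluded = [p for p in found if not strict.search(p)]

print('archived:', archived)
print('excluded:', excluded)
print('excluded by stricter variant:', strict_excluded)
```

With the original pattern, sites/all42 is archived because /all is only checked as a prefix; the stricter variant drops it along with the other unwanted entries.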
