Regular expression negative lookahead

Posted on 2024-08-11 06:20:07


In my home directory I have a folder drupal-6.14 that contains the Drupal platform.

From this directory I use the following command:

find drupal-6.14 -type f -iname '*' | grep -P 'drupal-6.14/(?!sites(?!/all|/default)).*' | xargs tar -czf drupal-6.14.tar.gz

This command creates a gzipped tar archive of the folder drupal-6.14, excluding all subfolders of drupal-6.14/sites/ except sites/all and sites/default, which it includes.

My question is about the regular expression:

grep -P 'drupal-6.14/(?!sites(?!/all|/default)).*'

The expression works to exclude all the folders I want excluded, but I don't quite understand why.

A common task with regular expressions is to:

Match all strings except those that contain subpattern x. In other words, negate a subpattern.

I think I understand that the general strategy for these problems is to use negative lookaheads, but I have never satisfactorily understood how positive and negative lookaheads and lookbehinds work.

Over the years I've read many websites on them, including the PHP and Python regex manuals and pages like http://www.regular-expressions.info/lookaround.html, but I've never really had a solid understanding of them.

Could someone explain how this works, and perhaps provide some similar examples?

-- Update One:

Regarding Andomar's response: can a double negative lookahead be expressed more succinctly as a single positive lookahead?

That is, is:

'drupal-6.14/(?!sites(?!/all|/default)).*'

equivalent to:

'drupal-6.14/(?=sites(?:/all|/default)).*'

???

-- Update Two:

As per @andomar and @alan moore: you can't replace a double negative lookahead with a single positive lookahead.
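
To see why the two patterns are not interchangeable, here is a minimal Python sketch. It assumes Python's re module treats these lookaheads the same way grep -P does, and it anchors each pattern at the start of the path with re.match (grep searches anywhere in the line). Both patterns are copied verbatim from the question; the sample paths and the compare helper are made up for illustration.

```python
import re

# Both patterns are copied verbatim from the question.
DOUBLE_NEGATIVE = re.compile(r"drupal-6.14/(?!sites(?!/all|/default)).*")
SINGLE_POSITIVE = re.compile(r"drupal-6.14/(?=sites(?:/all|/default)).*")

# Hypothetical sample paths, chosen only for illustration.
SAMPLE_PATHS = [
    "drupal-6.14/index.php",
    "drupal-6.14/sites/all/modules",
    "drupal-6.14/sites/example.com/settings.php",
]


def compare(path):
    """Return (double_negative_matches, single_positive_matches) for one path."""
    return (
        DOUBLE_NEGATIVE.match(path) is not None,
        SINGLE_POSITIVE.match(path) is not None,
    )


if __name__ == "__main__":
    for sample in SAMPLE_PATHS:
        print(sample, compare(sample))
```

The patterns disagree on drupal-6.14/index.php: the double negative accepts any path that does not continue with sites, while the positive lookahead requires sites to be there.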


心如狂蝶 2024-08-18 06:20:07


A negative lookahead says: at this position, the following regex must not match.

Let's take a simplified example:

a(?!b(?!c))

a      Match: (?!b) succeeds
ac     Match: (?!b) succeeds
ab     No match: (?!b(?!c)) fails
abe    No match: (?!b(?!c)) fails
abc    Match: (?!b(?!c)) succeeds

The last example is a double negation: it allows b as long as it is followed by c. Inside the outer negative lookahead, the nested negative lookahead acts like a positive one: if b is present, c must come right after it.

In each example, only the a is matched. The lookahead is only a condition, and does not add to the matched text.
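
The table above can be checked with a short Python sketch. It assumes Python's re module handles nested lookaheads the same way PCRE does for this pattern; matched_text is a hypothetical helper name.

```python
import re

# The simplified pattern from this answer.
PATTERN = re.compile(r"a(?!b(?!c))")


def matched_text(text):
    """Return the text consumed by a match at the start of `text`, or None."""
    match = PATTERN.match(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for sample in ["a", "ac", "ab", "abe", "abc"]:
        print(sample, matched_text(sample))
```

Whenever there is a match, the returned text is just "a", because the lookahead only tests what follows and consumes nothing.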


ゝ偶尔ゞ 2024-08-18 06:20:07

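Lookarounds can be nested.

So this regex matches "drupal-6.14/" that is not followed by "sites" that is not followed by "/all" or "/default".

Confusing? In different words, we can say it matches "drupal-6.14/" that is not followed by "sites" unless that is further followed by "/all" or "/default".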
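
The "not followed by sites unless followed by /all or /default" reading can be written out as ordinary string checks and compared with the regex. This is only a sketch under a few assumptions: the dot in 6.14 is escaped here (the question leaves it unescaped), the match is anchored at the start of the path, and keep_by_rule and keep_by_regex are hypothetical helper names.

```python
import re

PREFIX = "drupal-6.14/"

# Same lookahead as in the question, with the dot escaped and no trailing .*
PATTERN = re.compile(r"drupal-6\.14/(?!sites(?!/all|/default))")


def keep_by_rule(path):
    """Plain-string restatement: reject "sites" unless "/all" or "/default" follows."""
    if not path.startswith(PREFIX):
        return False
    rest = path[len(PREFIX):]
    if not rest.startswith("sites"):
        return True  # the outer lookahead has nothing to reject
    after_sites = rest[len("sites"):]
    return after_sites.startswith(("/all", "/default"))


def keep_by_regex(path):
    """The same decision made by the nested lookahead."""
    return PATTERN.match(path) is not None


if __name__ == "__main__":
    for sample in [
        "drupal-6.14/includes/common.inc",
        "drupal-6.14/sites/all/themes",
        "drupal-6.14/sites/example.com/settings.php",
    ]:
        print(sample, keep_by_rule(sample), keep_by_regex(sample))
```

For paths like these, the two functions should agree, which is one way to convince yourself of what the nested lookahead is doing.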


走走停停 2024-08-18 06:20:07


If you revise your regular expression like this:

drupal-6.14/(?=sites(?!/all|/default)).*
             ^^

...then it will match all inputs that contain drupal-6.14/ followed by sites followed by anything other than /all or /default. For example:

drupal-6.14/sites/foo
drupal-6.14/sites/bar
drupal-6.14/sitesfoo42
drupal-6.14/sitesall

Changing ?= to ?! to match your original regex simply negates those matches:

drupal-6.14/(?!sites(?!/all|/default)).*
             ^^

So, this simply means that drupal-6.14/ now cannot be followed by sites followed by anything other than /all or /default. So now, these inputs will satisfy the regex:

drupal-6.14/sites/all
drupal-6.14/sites/default
drupal-6.14/sites/all42

What may not be obvious from some of the other answers (and possibly your question) is that your regex also permits inputs where drupal-6.14/ is followed by anything other than sites. For example:

drupal-6.14/foo
drupal-6.14/xsites

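Conclusion: your regex keeps every path under drupal-6.14/ unless drupal-6.14/ is immediately followed by sites and that sites is not immediately followed by /all or /default. In practice, it includes everything under drupal-6.14 except those subdirectories of sites whose names begin with anything other than all or default.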
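
As a rough check of the "?= versus ?!" flip, the Python sketch below runs both variants over the example inputs from this answer. It assumes Python's re module agrees with grep -P on these patterns and anchors each match at the start of the path; which_matches is a hypothetical helper.

```python
import re

# The two variants from this answer, differing only in ?= versus ?!
POSITIVE = re.compile(r"drupal-6.14/(?=sites(?!/all|/default)).*")
NEGATIVE = re.compile(r"drupal-6.14/(?!sites(?!/all|/default)).*")

# Example inputs taken from this answer.
EXAMPLES = [
    "drupal-6.14/sites/foo",
    "drupal-6.14/sites/bar",
    "drupal-6.14/sitesfoo42",
    "drupal-6.14/sitesall",
    "drupal-6.14/sites/all",
    "drupal-6.14/sites/default",
    "drupal-6.14/sites/all42",
    "drupal-6.14/foo",
    "drupal-6.14/xsites",
]


def which_matches(path):
    """Return the markers of the variants that match at the start of `path`."""
    markers = []
    if POSITIVE.match(path):
        markers.append("?=")
    if NEGATIVE.match(path):
        markers.append("?!")
    return markers


if __name__ == "__main__":
    for sample in EXAMPLES:
        print(sample, which_matches(sample))
```

Each input is matched by exactly one of the two variants, since the same condition is tested at the same position and only its sense is inverted.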
