当前位置：文江博客话题详情

使用非贪婪限定符还是前瞻更好？

发布于 2024-09-05 01:40:29 字数 349 浏览 14 评论 0原文

我可能有一大块文本来搜索 [[...]] 的实例，其中 ... 可以是任何内容，包括其他括号（尽管它们不能嵌套；[[ 结束匹配后 ]] 的第一个实例）。

我可以想到两种方法来匹配此文本：

使用非贪婪限定符： /\[\[.+?\]\]/
使用前瞻： /\[\[ (?:(?!\]\]).)+\]\]/

从性能的角度来看，一个选择本质上比另一个选择更好（我想说第一个可能更具可读性）？我记得读到最好不要使用非贪婪限定符，但我现在找不到来源。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

最美不过初阳 2024-09-12 01:40:29

在这种情况下最好使用非贪婪量词。

以字符串为例 "[[a]b]]"

非贪婪量词

       \[\[.+?\]\]
Atom # 1 2 3  4 5

Atom #1 \[ 匹配
Atom #2 \[ 匹配
Atom #3 .+? 匹配 "a"
Atom #4 \] 匹配
Atom #5 \]失败，返回 #3，但保持字符串位置
Atom #3 .+? 与 "]" 匹配
Atom #4 \] 失败，返回到 #3，但保留字符串位置
Atom #3 .+? 匹配 "b"
Atom #4 \] 匹配
Atom #5 \] 匹配
成功

前瞻：

       \[\[(?:(?!\]\]).)+\]\]
Atom # 1 2 3  4       5  6 7

Atom #1 \[ 匹配
Atom #2 \[ 匹配
Atom #4 (?!\] \]) 在 "a" 处立即成功（即不匹配），继续
Atom #5 . 与 "a" 匹配code>，在 #3
Atom #4 (?!\]\]) 处重复，在 "]"
Atom #4 (?!\] 处实现部分匹配\]) 在 "b" 处成功（即不匹配），继续
Atom #5 。 与 "]" 匹配>，在 #3
Atom #4 (?!\]\]) 处重复，立即在 "b" 处成功（即不匹配），继续
Atom #5 < code>. 匹配 "b"，在 #3
Atom 处重复 #4 (?!\]\]) 在 " 处实现部分匹配]"
Atom #4 (?!\]\]) 在 "]" 处实现完全匹配，因此：#4 失败，退出 #3
Atom # 6 \] 匹配
Atom #7 \] 匹配
success

因此，看起来非贪婪量词要做的工作较少。

免责声明：这是一个人为的示例，实际性能可能会有所不同，具体取决于输入、实际表达式和正则表达式引擎的实现。我只有 98% 确信我在这里概述的内容是实际发生的情况，因此我愿意接受更正。另外，与所有性能技巧一样，不要只看表面价值，如果您想确定的话，请进行自己的基准比较。

It is better to use a non-greedy quantifier in this case.

Take this example string "[[a]b]]"

Non-greedy quantifier

       \[\[.+?\]\]
Atom # 1 2 3  4 5

Atom #1 \[ matches
Atom #2 \[ matches
Atom #3 .+? matches the "a"
Atom #4 \] matches
Atom #5 \] fails, back to #3 but keep string position
Atom #3 .+? matches the "]"
Atom #4 \] fails, back to #3 but keep string position
Atom #3 .+? matches the "b"
Atom #4 \] matches
Atom #5 \] matches
success

Look-ahead:

       \[\[(?:(?!\]\]).)+\]\]
Atom # 1 2 3  4       5  6 7

Atom #1 \[ matches
Atom #2 \[ matches
Atom #4 (?!\]\]) succeeds (i.e. non-match) immediately at "a", go on
Atom #5 . matches the "a", repeat at #3
Atom #4 (?!\]\]) achieves partial match at "]"
Atom #4 (?!\]\]) succeeds (i.e. non-match) at "b", go on
Atom #5 . matches the "]", repeat at #3
Atom #4 (?!\]\]) succeeds (i.e. non-match) immediately at "b", go on
Atom #5 . matches the "b", repeat at #3
Atom #4 (?!\]\]) achieves partial match at "]"
Atom #4 (?!\]\]) achieves full match at "]", ergo: #4 fails, exit #3
Atom #6 \] matches
Atom #7 \] matches
success

So it looks like the non-greedy quantifier has less work to do.

Disclaimer: This is an artificial example and real-life performance may differ, depending on the input, the actual expression and the implementation of the regex engine. I'm only 98% sure that what I outlined here is what is actually happening, so I'm open for corrections. Also, as with all performance tips, don't take this at face value, do your own benchmark comparisons if you want to know for sure.

回复收藏 0 原文

未蓝澄海的烟 2024-09-12 01:40:29

另一种变体： /\[\[((?:\]?[^]])+)]]/

它既不使用非贪婪量词，也不使用前瞻。它允许在任何非 ] 之前出现单个 ]。如果顺序有两个]，则内部重复将停止，并且比赛将结束。

此模式最适合与 FSA 编译正则表达式引擎一起使用。在回溯引擎上，它可能会比非贪婪变体慢。

回复收藏 0 原文

打小就很酷 2024-09-12 01:40:29

您使用哪种正则表达式？如果它支持所有格量词，那么还有一种更好的选择：

\[\[(?:[^\]]++|\](?!\]))*+\]\]

[^\]]++ 吞噬除 ] 之外的任何字符，并且不费心保存状态使回溯成为可能的信息。如果它确实看到 ]，它会执行向前查找以查看是否还有另一个。将整个事物包装在另一个所有格量词中意味着它只会在看到 ] 时进行前瞻，并且仅回溯一次：当它找到结束的 ]] 时。

Java、JGSoft、PCRE (PHP)、Oniguruma (Ruby 1.9) 和 Perl 5.12 风格支持所有格量词。所有这些风格还支持原子组，可用于实现相同的效果：

\[\[(?>(?:(?>[^\]]+)|\](?!\]))*)\]\]

.NET 风格支持原子组，但不支持所有格量词。

Which regex flavor are you using? If it's one that supports possessive quantifiers, there's a much better alternative:

\[\[(?:[^\]]++|\](?!\]))*+\]\]

[^\]]++ gobbles up any characters other than ] and doesn't bother saving the state information that would make backtracking possible. If it does see a ], it performs a lookahead to see if there's another. Wrapping the whole thing in another possessive quantifier means it only does a lookahead whenever it sees a ], and it only backtracks once: when it finds the closing ]].

Possessive quantifiers are supported by the Java, JGSoft, PCRE (PHP), Oniguruma (Ruby 1.9), and Perl 5.12 flavors. All those flavors also support atomic groups, which can be used to achieve the same effect: