What's the technical reason a lookbehind assertion must be fixed length in regex?

Posted on 2024-09-25 02:40:16

For example, the regex below fails with the error "lookbehind assertion is not fixed length":

#(?<!(?:(?:src)|(?:href))=["\']?)((?:https?|ftp)://[^\s\'"<>()]+)#S

No such restriction exists for lookahead.
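The asymmetry can be reproduced outside PCRE as well. The following is a minimal sketch in Python, on the assumption that its `re` module is an acceptable stand-in: it enforces a comparable fixed-width rule for lookbehind, though its error message differs from PCRE's. The helper name `compiles` is made up for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-length body inside a lookahead: accepted.
lookahead = r'''\w+(?=(?:src|href)=["']?)'''

# The same body inside a negative lookbehind: rejected, because the
# body can match anywhere from 4 to 6 characters.
lookbehind = r'''(?<!(?:src|href)=["']?)(?:https?|ftp)://\S+'''

print(compiles(lookahead))
print(compiles(lookbehind))
```

The first pattern compiles and the second does not, mirroring the behaviour described in the question.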

Comments (5)

往昔成烟 2024-10-02 02:40:16

Lookahead and lookbehind aren't nearly as similar as their names imply. A lookahead expression works exactly as it would if it were a standalone regex, except that it's anchored at the current match position and doesn't consume what it matches.

Lookbehind is a whole different story. Starting at the current match position, it steps backward through the text one character at a time, attempting to match its expression at each position. In cases where no match is possible, the lookbehind has to go all the way to the beginning of the text (one character at a time, remember) before it gives up. Compare that to the lookahead expression, which gets applied exactly once.

This is a gross oversimplification, of course, and not all flavors work that way, but you get the idea. The way lookbehinds are applied is fundamentally different from (and much, much less efficient than) the way lookaheads are applied. It only makes sense to put a limit on how far back the lookbehind has to look.
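To make the stepping-back picture concrete, here is a toy model in Python. It is only a sketch of the simplified description above, not how PCRE or any other real engine is implemented; the function `lookbehind_scan` and its attempt counter are invented for illustration.

```python
import re

def lookbehind_scan(text: str, pos: int, body: str) -> tuple[bool, int]:
    """Toy variable-length lookbehind.

    Tries every start position to the left of pos until body matches
    text[start:pos] exactly. Returns (matched, number_of_attempts).
    """
    regex = re.compile(body)
    attempts = 0
    for start in range(pos, -1, -1):  # step back one character at a time
        attempts += 1
        if regex.fullmatch(text, start, pos):
            return True, attempts
    return False, attempts

body = r'(?:src|href)="?'

text = 'a' * 50 + 'href="http'
pos = text.index('http')
hit = lookbehind_scan(text, pos, body)

# No match possible: the scan walks all the way back to the start.
miss = lookbehind_scan('a' * 50 + 'http', 50, body)

print(hit)   # (True, 7)
print(miss)  # (False, 51)
```

In the failing case the number of attempts grows with the amount of text to the left of the current position, which is the cost a fixed-length limit avoids.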

无声静候 2024-10-02 02:40:16

PCRE doesn't support floating (variable-length) lookbehind because it can cause major performance problems. This comes from the lack of right-to-left matching: PCRE can start a branch only from a fixed left edge, but the left edge of a variable-length lookbehind cannot be fixed.

Generally, try to split your lookbehind into fixed-length patterns if possible. For example, instead of:

(?<=(src|href)=")etc.

(1) use this:

(?:(?<=src=")|(?<=href="))etc.

(2) Or use \K:

(src|href)="\Ketc.

Note that \K is not a true lookbehind: the text before \K is still matched and consumed, so the search always starts at the end of the previous match and cannot step back into it.

(3) In some complex lookbehind-only cases you can search a reversed string with an "inverted" lookahead expression. Not too elegant, but it works:

.cte(?="=(ferh|crs))
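The three workarounds can be tried side by side. This is a sketch in Python rather than PCRE, with two stated assumptions: the sample string `html` is made up, and because Python's `re` module has no \K, option (2) is approximated with a capturing group around the wanted part.

```python
import re

html = '<a href="etc.html"> <img src="etc.png"> <p id="etc">'

# (1) One fixed-width lookbehind per alternative.
split_lookbehind = re.compile(r'(?:(?<=src=")|(?<=href="))etc')
starts_split = [m.start() for m in split_lookbehind.finditer(html)]

# (2) Stand-in for \K: match the prefix normally, capture what you want.
captured = re.compile(r'(?:src|href)="(etc)')
starts_captured = [m.start(1) for m in captured.finditer(html)]

# (3) Reverse the string and turn the lookbehind into a lookahead.
reversed_lookahead = re.compile(r'cte(?="=(?:ferh|crs))')
n = len(html)
starts_reversed = sorted(
    n - m.end() for m in reversed_lookahead.finditer(html[::-1])
)

print(starts_split, starts_captured, starts_reversed)
# [9, 30] [9, 30] [9, 30]
```

All three find the "etc" after href=" and src=" and skip the one after id=". As with \K, option (2) consumes the prefix, so it can behave differently from a true lookbehind when matches are adjacent or overlapping.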

归途 2024-10-02 02:40:16

First of all, this isn't true for all regular expression libraries (.NET, for example, allows variable-length lookbehind).

For PCRE, the reason appears to be:

"The implementation of lookbehind assertions is, for each alternative, to temporarily move the current position back by the fixed width and then try to match."

(at least, according to http://www.autoitscript.com/autoit3/pcrepattern.html).
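The quoted strategy can be sketched as a toy function. This is only an illustration of the idea, not PCRE's code: it assumes every alternative is a plain literal so that its width is simply its length, and the name `fixed_lookbehind` is hypothetical.

```python
def fixed_lookbehind(text: str, pos: int, alternatives: list[str]) -> bool:
    """Toy fixed-width lookbehind over literal alternatives.

    For each alternative, jump back by its known width and compare once.
    """
    for literal in alternatives:
        start = pos - len(literal)
        if start < 0:
            continue  # not enough characters before pos
        if text.startswith(literal, start):
            return True
    return False

text = 'src="a" href="b"'
alts = ['src="', 'href="']

print(fixed_lookbehind(text, 5, alts))   # True: preceded by src="
print(fixed_lookbehind(text, 14, alts))  # True: preceded by href="
print(fixed_lookbehind(text, 7, alts))   # False
```

Each alternative costs a single comparison because the engine knows in advance how far to move back. Without a known width there is no single starting point to jump to.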

岁吢 2024-10-02 02:40:16

I had the same issue and fixed it by using (?:subexpression), which defines a noncapturing group. For example, Write(?:Line)? matches "WriteLine" in "Console.WriteLine()" and "Write" in "Console.Write(value)".

I had to change the regex below, which is supposed to match right after a comma or at the start of the string, because it gave me "lookbehind assertion is not fixed length":

(?<=,|^)

to this:

(?:(?<=,)|^)
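The same rewrite can be checked in Python, whose `re` module (on the CPython versions I am aware of) also rejects the combined form, since "," has width 1 and "^" has width 0. The sample input 'a,bb,ccc' and the trailing [^,]+ are assumptions added so that there is something visible to match.

```python
import re

# Combined form: alternatives of different widths inside one lookbehind.
try:
    re.compile(r'(?<=,|^)[^,]+')
    combined_ok = True
except re.error:
    combined_ok = False

# Rewritten form: the lookbehind holds only the comma, and ^ moves
# outside it into a non-capturing group.
fields = re.findall(r'(?:(?<=,)|^)[^,]+', 'a,bb,ccc')

print(combined_ok)  # False
print(fields)       # ['a', 'bb', 'ccc']
```

Moving ^ out of the lookbehind works because ^ is already a zero-width assertion and does not need to be wrapped in one.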

本宫微胖 2024-10-02 02:40:16

grep -P '(?<=((three)|(one)) )two' <<< "one two three three two one"
grep: lookbehind assertion is not fixed length

grep -P '((?<=(three) )|(?<=(one) ))two' <<< "one two three three two one"
one two three three two one

For processing efficiency, PCRE does not match right to left or recurse when evaluating a lookbehind. When doing a lookbehind, PCRE searches the end of any previous matching string; implementing variable-size matches would require recursion and reduce efficiency. See: Look Behind Assertions
