What is the technical reason that "lookbehind assertions must be fixed length" in regex?

Posted 2024-09-25 02:40:16

For example, the regex below fails with the error "lookbehind assertion is not fixed length":

#(?<!(?:(?:src)|(?:href))=["\']?)((?:https?|ftp)://[^\s\'"<>()]+)#S

No such restriction exists for lookahead.
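The asymmetry can be reproduced outside PCRE. The sketch below uses Python's `re` module as a stand-in, which is an assumption: the question is about PCRE, and Python words the error differently, but it also accepts a variable-length lookahead while rejecting the same sub-pattern in a lookbehind. The helper name `compiles` is made up for this illustration.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-length sub-pattern inside a negative lookahead: accepted.
lookahead_ok = compiles(r'''(?!(?:src|href)=["']?)\w+''')

# The same sub-pattern inside a negative lookbehind: rejected,
# because its width is not fixed.
lookbehind_ok = compiles(r'''(?<!(?:src|href)=["']?)\w+''')

print(lookahead_ok, lookbehind_ok)
```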


往昔成烟 2024-10-02 02:40:16

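Lookahead and lookbehind aren't nearly as similar as their names imply. A lookahead expression works exactly as it would if it were a standalone regex, except that it is anchored at the current match position and does not consume what it matches.

Lookbehind is a whole different story. Starting at the current match position, it steps backward through the text one character at a time, attempting to match its expression at each position. When no match is possible, the lookbehind has to go all the way back to the beginning of the text (one character at a time, remember) before it gives up. Compare that with the lookahead expression, which is applied exactly once.

This is a gross oversimplification, of course, and not all flavors work that way, but you get the idea. The way lookbehinds are applied is fundamentally different from (and much less efficient than) the way lookaheads are applied. It only makes sense to put a limit on how far back the lookbehind has to look.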
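The backward-stepping behaviour described above can be sketched roughly as follows. This is a toy model of the simplified description, not how any particular engine is implemented; the function name `naive_lookbehind` and the attempt counter are invented for illustration.

```python
import re

def naive_lookbehind(pattern, text, pos):
    """Toy model of an unbounded lookbehind.

    Steps backward from `pos` one character at a time and, at each start
    position, checks whether `pattern` matches exactly text[start:pos].
    Returns (matched, attempts) so the cost of a failure is visible.
    """
    compiled = re.compile(pattern)
    attempts = 0
    for start in range(pos, -1, -1):
        attempts += 1
        if compiled.fullmatch(text, start, pos):
            return True, attempts
    return False, attempts

hit = 'href="http://example.com'
miss = 'see http://example.com'

# Succeeds once the backward walk reaches the start of 'href="'.
print(naive_lookbehind(r'(?:src|href)="?', hit, hit.index('http')))

# Fails only after trying every start position back to the beginning.
print(naive_lookbehind(r'(?:src|href)="?', miss, miss.index('http')))
```

In the failing case the number of attempts grows with the distance to the start of the text, which is the cost a length limit is meant to cap.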


无声静候 2024-10-02 02:40:16

PCRE doesn't support floating (variable-length) lookbehind because it can cause major performance problems. The underlying cause is the lack of right-to-left matching: PCRE can start a branch only from a fixed left position, but the left edge of a variable-length lookbehind cannot be fixed.

Generally, try to split the lookbehind into fixed-length branches if possible. For example, instead of:

(?<=(src|href)=")etc.

(1) use this:

(?:(?<=src=")|(?<=href="))etc.

(2) Or use \K:

(src|href)="\Ketc.

Note that \K is not a real lookbehind: the search always starts at the end of the previous match, so it cannot step back into the previous match.

(3) In some complex lookbehind-only cases you can reverse the string and search it with an "inverted" lookahead expression. Not too elegant, but it works:

.cte(?="=(ferh|crs))
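Options (1) and (3) can be tried in Python, whose `re` module has a similar fixed-width rule for lookbehind. Using Python instead of PCRE is an assumption made for this sketch, and option (2) is left out because Python's `re` has no `\K`. The sample text and variable names are hypothetical.

```python
import re

text = 'a src="etc. b href="etc. c alt="etc.'

# The variable-length form is rejected at compile time.
try:
    re.compile(r'(?<=(src|href)=")etc\.')
    variable_ok = True
except re.error:
    variable_ok = False

# (1) One fixed-length lookbehind per alternative.
branched = re.compile(r'(?:(?<=src=")|(?<=href="))etc\.')
starts = [m.start() for m in branched.finditer(text)]

# (3) Reverse the string and use a lookahead with the reversed literals.
reversed_text = text[::-1]
inverted = re.compile(r'\.cte(?="=(?:ferh|crs))')
# Map each match end in the reversed string back to a start offset in `text`.
rev_starts = sorted(len(text) - m.end() for m in inverted.finditer(reversed_text))

print(variable_ok, starts, rev_starts)
```

Both approaches find the same two occurrences and skip the one after `alt="`.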


归途 2024-10-02 02:40:16

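First of all, this isn't true of every regular expression library; .NET, for example, allows variable-length lookbehind.

For PCRE, the reason appears to be:

"The implementation of lookbehind assertions is, for each alternative, to temporarily move the current position back by the fixed width and then try to match."

(At least, according to http://www.autoitscript.com/autoit3/pcrepattern.html.)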
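The quoted strategy can be illustrated with a toy model. The sketch below restricts each alternative to a literal string so that its width is simply its length; that restriction, and the function name `fixed_width_lookbehind`, are assumptions made for illustration and do not reflect PCRE's actual code.

```python
def fixed_width_lookbehind(alternatives, text, pos):
    """Toy model: for each alternative, jump back by its fixed width
    and try to match once at that position."""
    for literal in alternatives:
        start = pos - len(literal)  # move the position back by the fixed width
        if start >= 0 and text.startswith(literal, start):
            return True
    return False

alts = ['src="', 'href="']
print(fixed_width_lookbehind(alts, 'href="x', 6))  # preceded by href="
print(fixed_width_lookbehind(alts, 'alt="x', 5))   # preceded by alt="
```

Each alternative costs a single jump and a single match attempt, which is only possible because its width is known in advance.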


岁吢 2024-10-02 02:40:16

I had the same issue and fixed it by using (?: subexpression).

(?: subexpression) defines a noncapturing group. For example, Write(?:Line)? matches "WriteLine" in "Console.WriteLine()" and "Write" in "Console.Write(value)".

I had to change the regex below, which is supposed to match the position right after a comma or at the start of the string, and which was giving me "lookbehind assertion is not fixed length":

(?<=,|^)

to this:

(?:(?<=,)|^)
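The same rewrite can be checked in Python's `re` module, used here as an assumed stand-in since the answer does not name the engine. The trailing `\w+` and the sample string are added only to make the matches visible.

```python
import re

# Alternatives of width 1 (",") and width 0 ("^") inside one lookbehind
# are rejected by Python's re module.
try:
    re.compile(r'(?<=,|^)\w+')
    rejected = False
except re.error:
    rejected = True

# Moving the anchor out of the lookbehind leaves a fixed-width assertion.
fields = re.findall(r'(?:(?<=,)|^)\w+', 'a,b c,d')

print(rejected, fields)
```

Only words at the start of the string or directly after a comma are returned, so `c` (preceded by a space) is skipped.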


本宫微胖 2024-10-02 02:40:16

grep -P '(?<=((three)|(one)) )two' <<< "one two three three two one"
grep: lookbehind assertion is not fixed length

grep -P '((?<=(three) )|(?<=(one) ))two' <<< "one two three three two one"
one two three three two one

For processing efficiency, PCRE does not support right-to-left matching or recursion here. When evaluating a lookbehind, PCRE checks the text that ends at the current position; implementing variable-size matches would require recursion and reduce efficiency. See: Look Behind Assertions
