What are the theoretical implications of unbounded lookbehind?
Most regex flavors allow only fixed-length or bounded-length lookbehind. One notable exception is .NET, which allows the * operator inside a lookbehind.
However, .NET regexes can already recognize balanced parentheses using named captures, and that language is not regular. Are regexes still regular with * in lookbehind? Extended answers covering subexpressions other than * (for example, additional lookaround!) would also be appreciated.
tl;dr: Do regexes stay regular with * in lookbehind?
Comments (3)
I believe the answer to "Does lookaround affect which languages can be matched by regular expressions?" can be extended to prove that adding * in lookbehind (or even nesting such lookbehinds and lookaheads) does not affect the regularity of the expressions. I haven't put more thought into it, though.
Hope that helps!
.NET's unbounded lookbehind is merely a refinement of an already non-regular feature: fixed, finite or infinite, lookbehinds have no place in a regular grammar. Nor do lookaheads, capturing groups, backreferences, reluctant quantifiers, possessive quantifiers, atomic groups, conditionals, word boundaries, anchors...
If we had to limit ourselves to theoretically-pure regular expressions, 99.9% of current regex users would have no use for them. Asking if a feature is "regular" is a waste of breath; is it useful? That's all that matters.
Regular expressions are closed under intersection. Add a new symbol & and rewrite the lookbehind:
A(?<=B)C as
(?:AC&.*BC), and we get that lookbehind is regular.
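A small brute-force check of this rewrite, as a sketch rather than a proof. It reads the intersection as applying to the part before the A/C boundary, that is (A & .*B) followed by C, which is my interpretation of the rewrite above. The patterns A, B, C and the helper names are hypothetical. Python's re module only accepts fixed-width lookbehind, so the sample B is a single character; the intersection side uses only plain regexes and has no such restriction.

```python
import re
from itertools import product

# Hypothetical sample patterns over the alphabet {a, b}.
A, B, C = r"(?:ab|a)+", r"a", r"b*"

# Left-hand side: the lookbehind form A(?<=B)C, matched against the whole string.
LOOKBEHIND = re.compile(f"(?:{A})(?<={B})(?:{C})")

def via_intersection(w):
    """True if w = u + v with u in A, u also in .*B, and v in C."""
    return any(
        re.fullmatch(A, w[:i])
        and re.fullmatch(f".*(?:{B})", w[:i])
        and re.fullmatch(C, w[i:])
        for i in range(len(w) + 1)
    )

def agree(max_len=6, alphabet="ab"):
    """Compare both forms on every string up to max_len characters."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(LOOKBEHIND.fullmatch(w)) != via_intersection(w):
                return False
    return True

if __name__ == "__main__":
    print(agree())
```

Agreement on short strings is only evidence for these particular patterns, but it shows what the rewrite is claiming: the lookbehind adds one more regular constraint on the text that ends at the boundary.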
B can clearly use anything that doesn't go past the A/C boundary. That is, anything except lookahead. What happens if lookbehind may use lookahead, or vice versa? Apply the same rewrite to .*BC itself, and you're still fine.
So, regular expressions could really add in intersection and infinite-length lookaround (which can include more lookaround to any depth) and it would still be just as efficient.
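To illustrate why intersection does not leave the regular languages, here is a minimal sketch of the textbook product construction: two DFAs are run in lockstep, and the combined machine accepts only when both do. The DFA class, the intersect helper, and the two sample automata are all hypothetical names made up for this example. Matching stays a single pass over the input, although the product can have up to |Q1| * |Q2| states.

```python
class DFA:
    """A DFA with a partial transition table; a missing entry means reject."""

    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = accepting
        self.delta = delta  # maps (state, symbol) -> next state

    def accepts(self, w):
        state = self.start
        for ch in w:
            if (state, ch) not in self.delta:
                return False
            state = self.delta[(state, ch)]
        return state in self.accepting


def intersect(d1, d2, alphabet):
    """Product construction: states are pairs, explored from the start pair."""
    start = (d1.start, d2.start)
    delta, accepting = {}, set()
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        p, q = state
        if p in d1.accepting and q in d2.accepting:
            accepting.add(state)
        for ch in alphabet:
            if (p, ch) not in d1.delta or (q, ch) not in d2.delta:
                continue  # one machine is stuck, so the pair is stuck
            nxt = (d1.delta[(p, ch)], d2.delta[(q, ch)])
            delta[(state, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return DFA(start, accepting, delta)


# Sample automata over {a, b}: an even number of a's, and strings ending in b.
even_a = DFA(0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
ends_b = DFA(0, {1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1})
both = intersect(even_a, ends_b, "ab")

if __name__ == "__main__":
    for w in ["aab", "ab", "b", ""]:
        print(repr(w), both.accepts(w))
```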