是“正则表达式”吗? 现代编程语言中真的有“上下文相关语法”吗?
多年来,“正则表达式”模式匹配变得越来越强大,以至于我想知道:它真的只是上下文相关的语法匹配吗? 它是上下文无关语法匹配的变体/扩展吗? 它现在在哪里,为什么我们不直接称呼它而不是旧的、限制性的“正则表达式”呢?
Over the years, "regex" pattern matching has been getting more and more powerful to the point where I wonder: is it really just context-sensitive-grammar matching? Is it a variation/extension of context-free-grammar matching? Where is it right now and why don't we just call it that instead of the old, restrictive "regular expression"?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
特别是捕获括号的反向引用使正则表达式比常规、上下文无关或上下文相关的语法更复杂。 这个名字只是历史悠久的(就像很多词一样)。 另请参阅 Wikipedia 中的本节和此来自 Perl 的示例说明。
In particular backreferences to capturing parentheses make regular expressions more complex than regular, context-free, or context-sensitive grammars. The name is simply historically grown (as many words). See also this section in Wikipedia and this explanation with an example from Perl.
我的看法是:
待匹配语法中的“location”:无法实现递归
全部大多数人类语言我所知道的大多数人类语言都具有正则表达式解析器,它们允许您匹配解析器已经遇到的内容,从而实现类似于上下文相关语法的功能。
尽管如此,正则表达式解析器无论多么复杂,都不允许递归应用规则,而这是上下文无关语法的明确要求。
在我看来,术语正则表达式主要指的是用于表达这些常规语法(星号和问号)的语法。
The way I see it:
"location" in the grammar to be matched: Recursion cannot be implemented
AllMost human languagesI do know of regular expression parsers that allow you to match against something the parser has already encountered, achieving something like a context-sensitive grammar.
Still, regular expression parsers, however sophisticated they may be, don't allow for recursive application of rules, which is a definite requirement for context-free grammars.
The term regex, in my opinion, mostly refers to the syntax used to express those regular grammars (the stars and question marks).
现代正则表达式实现中的某些功能违反了经典正则表达式定义。
例如 Microsoft 的 .NET 平衡组
(?<< /code>
name1
-
name2
> … )
:
这确实匹配语言L₀₁ = {ε, 01, 0011, 000111, … }。 但根据 Pumping Lemma,这种语言是不规则的。
There are features in modern regular expression implementations that break the rules of the classic regular expression definition.
For example Microsoft’s .NET Balancing Group
(?<
name1
-
name2
> … )
:This does match the language L₀₁ = {ε, 01, 0011, 000111, … }. But this language is not regular according to the Pumping Lemma.