How powerful are regular expressions?
As the name suggests, we might think that regular expressions can match only regular languages. But the regular expressions we use in practice contain features that I am not sure can be implemented with their theoretical counterparts. How, for example, would you simulate a back-reference?
So the question arises: what is the theoretical power of the regular expressions we use in practice? Can you think of a way to match {(a^n)(b^n) | n>=0}? What about {(a^n)(b^n)(c^n) | n>=0}?
Comments (2)
The basic difficulty you are hinting at is that regular expressions have no "memory". In their purest form, no true regular expression can recognize either of these languages, because neither language is regular. Any "regular expression" that could recognize them would, by definition, not be regular. I think what you mean by "regular expressions we use in practice" is extended regular expressions, which are technically not regular expressions.
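To make the "no memory" point concrete, here is a minimal sketch in Python's `re` module (the language is my choice; the question does not name one). The pattern `a*b*` uses only concatenation and Kleene star, so it stays within the theoretical formalism: it can check the shape of the input but cannot check that the two counts agree. The helper names `has_shape` and `is_anbn` are hypothetical, and `is_anbn` does its counting outside the regex.

```python
import re

# Only concatenation and Kleene star: this describes a regular language.
SHAPE = re.compile(r"a*b*")

def has_shape(s: str) -> bool:
    """True if s is a run of a's followed by a run of b's (counts not compared)."""
    return SHAPE.fullmatch(s) is not None

def is_anbn(s: str) -> bool:
    """Hypothetical helper: check the shape with the regex, then compare counts in ordinary code."""
    return has_shape(s) and s.count("a") == s.count("b")

print(has_shape("aaabb"))  # True: the shape is right even though the counts differ
print(is_anbn("aaabb"))    # False
print(is_anbn("aaabbb"))   # True
```

The comparison `s.count("a") == s.count("b")` is exactly the piece of memory that the pure pattern lacks.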
The problem with your question is that you are asking to apply a specifically contrived theoretical scenario to a practical situation, which almost always ends in disaster.
So my answer is sort of a non-answer, in that I am saying you would have to rephrase the question to ask about extended regular expressions for it to have an answer.
A couple of resources that might help in this matter:
Helpful Wikipedia article
Similar StackOverflow question
Good book with a chapter on this topic
I'm also making my answer a community wiki for anyone else who wants to contribute to this line of thought.
The answer to your question is that "regular expression" languages that allow back-references are neither regular nor context-free. (In other words, as you pointed out, you cannot simulate back-references with a regular language, nor with a CFL.) In fact, Wikipedia says that matching with many of the "regular expression" languages we use in practice is NP-complete:
As others have suggested, the regular expression languages commonly supported in computer languages and libraries are a different animal from regular expressions in formal language theory. Larry Wall wrote in regard to Perl "regexes,"
You asked,
I'm not sure whether you're trying to test whether theoretical regular expression languages can match the "language of squares", or whether you're looking for an implementation in a (practical) regex language. Here's the proof why the former is not possible; and here's a long explanation and implementation of the latter for Java regexes.
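As a small illustration of how back-references go beyond regular and context-free languages, here is a sketch using Python's `re` module (my choice of language, not something the answer uses). The pattern `(.+)\1` accepts exactly the strings made of some non-empty block written twice, and over an alphabet of two or more symbols that set is not context-free. The name `is_doubled` is hypothetical.

```python
import re

# \1 must repeat exactly the text captured by group 1,
# so a full match means the string has the form ww with w non-empty.
DOUBLED = re.compile(r"(.+)\1")

def is_doubled(s: str) -> bool:
    """True if s consists of some non-empty block repeated exactly twice."""
    return DOUBLED.fullmatch(s) is not None

print(is_doubled("abcabc"))  # True
print(is_doubled("abcabd"))  # False
print(is_doubled("a"))       # False
```

Note that `.` does not match a newline by default, so this sketch assumes single-line input.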