How powerful are regular expressions?

Posted 2024-09-24 15:01:18


As the name suggests, we might think that regular expressions can match only regular languages. But the regular expressions we use in practice include features that I am not sure can be implemented with their theoretical counterparts. How, for example, would you simulate a back-reference?
So the question arises: what is the theoretical power of the regular expressions we use in practice? Can you think of a way to match {(a^n)(b^n)|n>=0}? What about {(a^n)(b^n)(c^n)|n>=0}?


Comments (2)

琴流音 2024-10-01 15:01:19


The basic difficulty with regular expressions that you are hinting at is that they have no "memory". In the purest form, no real regular expression should be able to recognize either of these languages. Any regular expression that could parse these sorts of languages would be, by definition, not regular. I think what you mean by "regular expressions we use in practice" is extended regular expressions (patterns with features such as back-references), which are not technically regular expressions.

The problem with your question is that you are asking to apply a specifically contrived theoretical scenario to a practical situation, which almost always ends in disaster.

So my answer is sort of a non-answer, in that I am saying you would have to rephrase the question to ask about extended regular expressions for it to have an answer.

A couple of resources that might help in this matter:

Helpful Wikipedia article

Similar StackOverflow question

Good book with a chapter on this topic

I'm also making my answer a community wiki for anyone else who wants to contribute to this line of thought.

小鸟爱天空丶 2024-10-01 15:01:18


The answer to your question is that "regular expression" languages that allow back-references are neither regular nor context-free. (In other words, as you pointed out, you cannot simulate a back-reference with a regular language, nor with a CFL.) In fact, Wikipedia says that matching with many of the "regular expression" languages we use in practice is NP-complete:

Pattern matching with an unbounded
number of back references, as
supported by numerous modern tools, is
NP-complete (see,[11] Theorem 6.2).
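To make the back-reference point concrete, here is a minimal sketch using java.util.regex. The pattern `(.+)\1` accepts exactly the strings of the form ww (some non-empty w written twice), a language that is known to be neither regular nor context-free over an alphabet of two or more symbols. The class and method names (`BackrefDemo`, `isSquare`) are made up for this illustration and do not come from the thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the names are illustrative only.
class BackrefDemo {
    // (.+) captures a non-empty string w; \1 then requires the same w again.
    static final Pattern SQUARE = Pattern.compile("(.+)\\1");

    // True when the whole input has the form ww.
    static boolean isSquare(String s) {
        return SQUARE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abab", "abcabc", "abcab", "abba"}) {
            System.out.println(s + " -> " + isSquare(s));
        }
    }
}
```

The engine finds w by backtracking over every possible split point, which hints at why matching with an unbounded number of back-references gets expensive.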

As others have suggested, the regular expression languages commonly supported in computer languages and libraries are a different animal from regular expressions in formal language theory. Larry Wall wrote in regard to Perl "regexes,"

'Regular expressions' [...] are only
marginally related to real regular
expressions. Nevertheless, the term
has grown with the capabilities of our
pattern matching engines, so I'm not
going to try to fight linguistic
necessity here. I will, however,
generally call them "regexes"

You asked,

Can you think of a way to match
{(a^n)(b^n)|n>=0}? What about
{(a^n)(b^n)(c^n)|n>=0}?

I'm not sure here whether you're trying to test whether theoretical regular expression languages can match the "language of squares", or whether you're looking for an implementation in a (practical) regex language. Here's a proof of why the former is not possible; and here's a long explanation and implementation of the latter for Java regexes.
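For the practical side, the following is a sketch of one way to do this in java.util.regex, using a lookahead that contains a possessive self-reference: each `a` that is consumed makes the capture group grow by one `b`, and the final back-reference then has to match exactly that many `b`s. The a^n b^n c^n pattern is an extension assumed here for illustration, not something taken from the thread, and the class and method names are hypothetical. The case n = 0 is handled in code rather than in the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the names are illustrative only.
class CountingRegex {
    // For each 'a' consumed, the lookahead skips the remaining a's and
    // re-captures group 1 as (previous group 1) + one more 'b'.
    // The trailing \1 must then match as many b's as there were a's.
    static final Pattern AN_BN =
        Pattern.compile("(?:a(?=a*(\\1?+b)))+\\1");

    // Assumed extension: a leading lookahead checks a^n b^n followed by 'c',
    // then the same counting trick is applied to the b's and c's (group 2).
    static final Pattern AN_BN_CN =
        Pattern.compile("(?=(?:a(?=a*(\\1?+b)))+\\1c)a+(?:b(?=b*(\\2?+c)))+\\2");

    // The patterns require n >= 1, so the empty string is accepted separately.
    static boolean isAnBn(String s) {
        return s.isEmpty() || AN_BN.matcher(s).matches();
    }

    static boolean isAnBnCn(String s) {
        return s.isEmpty() || AN_BN_CN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "ab", "aabb", "aab", "abb"}) {
            System.out.println("a^n b^n:     '" + s + "' -> " + isAnBn(s));
        }
        for (String s : new String[] {"abc", "aabbcc", "aabbc", "abbcc"}) {
            System.out.println("a^n b^n c^n: '" + s + "' -> " + isAnBnCn(s));
        }
    }
}
```

The possessive `?+` matters: it stops the engine from backtracking into an empty match of `\1`, which would let the group stop growing and break the count. This relies on Java permitting a back-reference to a group from inside that same group; other engines may reject or treat such a pattern differently.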
