正则表达式等价
有没有办法找出两个任意正则表达式是否等价? 对我来说看起来很复杂的问题,但可能有一些 DFA 简化机制之类的?
Is there a way to find out if two arbitrary regular expressions are equivalent? Looks like complex problem to me, but there might be some DFA simplification mechanism or something?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
要测试等价性,您可以计算表达式的最小 DFA,比较它们。
To test equivalence you can compute the minimal DFAs for the expressions and compare them.
等式的可测试性是正则表达式的经典属性之一。 (注意,如果您真正谈论的是 Perl 正则表达式或其他一些技术上非正则超级语言,则这并不成立。)
将您的 RE 转换为广义有限自动机 A 和 B,然后构造一个新的自动机 AB,例如A 的接受状态到 B 的起始状态有空转换,并且 B 的接受状态被反转。 这给你一个自动机,它接受 A 接受的所有字符串,B 接受的所有字符串除外。
对 BA 执行相同的操作,并将两者都简化为纯 FA。 如果 FA 没有可从起始状态访问的接受状态,则它接受空语言。 如果你能证明 AB 和 BA 都是空的,那么你就证明了 A = B。
编辑
嘿,我不敢相信没有人注意到那里的巨大错误 - 一个故意的错误course :-p所描述的自动机 AB 将接受前半部分被 A 接受而后半部分不被 B 接受的字符串。构建所需的 AB 是一个稍微棘手的过程。 我无法立即想到它,但我确实知道它是明确定义的(并且可能涉及创建状态来表示 A 中接受状态和 B 中不接受状态的产物)。
Testability of equality is one of the classical properties of regular expressions. (N.B. This doesn't hold if you're really talking about Perl regular expressions or some other technically nonregular superlanguage.)
Turn your REs to generalised finite automata A and B, then construct a new automaton A-B such that the accepting states of A have null transitions to the start states of B, and that the accepting states of B are inverted. This gives you an automaton that accepts all those strings accepted by A, except for all those accepted by B.
Do the same for B-A, and reduce both to pure FAs. If an FA has no accepting states accessible from a start state then it accepts the empty language. If you can show that both A-B and B-A are empty, you've shown that A = B.
Edit
Heh, I can't believe no one noticed the gigantic error there -- an intentional one, of course :-pThe automata A-B as described will accept those strings whose first half is accepted by A and whose second half is not accepted by B. Building the desired A-B is a slightly trickier process. I can't think of it off the top of my head, but I do know it's well-defined (and likely involves creating states to the represent the products of accepting states in A and non-accepting states in B).
这实际上取决于您所说的正则表达式的含义。 正如其他发帖者指出的那样,将两个表达式减少到最小 DFA 应该可行,但它仅适用于纯正则表达式。
现实世界的正则表达式库中使用的一些构造(特别是反向引用)使它们能够表达不规则的语言,因此 DFA 算法不适用于它们。 例如,正则表达式:
([az]*) \1
匹配由空格分隔的同一单词的两次出现(a a
和b b
但不是b a
也不是a b
)。 有限自动机根本无法识别这一点。This really depends on what you mean by regular expressions. As the other posters pointed out, reducing both expressions to their minimal DFA should work, but it only works for the pure regular expressions.
Some of the constructs used in the real world regex libs (backreferences in particular) give them power to express languages that aren't regular, so the DFA algorithm won't work for them. For example the regex :
([a-z]*) \1
matches a double occurence of the same word separated by a space (a a
andb b
but notb a
nora b
). This cannot be recognized by a finite automaton at all.这两个 Perlmonks 线程讨论了这个问题(具体来说,请阅读 blokhead 的回复):
These two Perlmonks threads discuss this question (specifically, read blokhead's responses):