是否存在一种算法可以确定一种正则语言是否与另一种正则语言匹配的任何输入相匹配?

发布于 2024-09-17 06:14:30 字数 207 浏览 4 评论 0原文

假设我们有正则表达式:

  • Hello W.*rld
  • Hello World
  • .* World
  • .* W.*

我想最大限度地减少匹配任意输入所需的正则表达式的数量。

为此,我需要查找一个正则表达式是否与另一个表达式匹配的任何输入匹配。这可能吗?

比利3

Let's say we have regular expressions:

  • Hello W.*rld
  • Hello World
  • .* World
  • .* W.*

I would like to minimize the number of regexes required to match arbitrary input.

To do that, I need to find if one regular expression matches any input matched by another expression. Is that possible?

Billy3

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

匿名的好友 2024-09-24 06:14:31

任何正则表达式都可以链接到 DFA - 您可以最小化 DFA,并且由于最小形式是唯一的,因此您可以决定两个表达式是否等效。 Dani Cricco 指出了 Hopcroft O(n log n) 算法。 Hopcroft 和 Craft 提出了另一种改进算法,它测试两个 DFA 在 O(n) 中的等价性。

对于这个问题的一个很好的调查和一个有趣的方法,我推荐这篇论文测试常规语言的等价性,来自 arXiv。

稍后编辑:如果您对正则表达式的包含而不是等价感兴趣,我遇到了一篇可能感兴趣的论文:正则表达式的包含问题 - 我只浏览了一下它,但它似乎包含解决该问题的多项式时间算法。

Any regular expression can be linked to a DFA - you can minimize the DFA and since the minimal form is unique, you can decide whether two expressions are equivalent. Dani Cricco pointed out the Hopcroft O(n log n) algorithm. There is another improved algorithm by Hopcroft and Craft which tests the equivalence of two DFAs in O(n).

For a good survey on the matter and an interesting approach to this, I reccomend the paper Testing the Equivalence of Regular Languages, from arXiv.

Later edit: if you are interested in inclusion rather than equivalence for regular expressions, I have come across a paper that might be of interest: Inclusion Problem for Regular Expressions - I have only skimmed through it but it seems to contain a polynomial time algorithm to the problem.

诠释孤独 2024-09-24 06:14:31

当然!。正则表达式可以表示为 FSM(有限状态机),并且从技术上讲,可以识别同一字符串的 FSM 数量是无限的。

同构是描述两个 FSM 是否等价的名称。有几种算法可以最小化 FSM。例如Hopcroft最小化算法可以最小化两个FSM在 n 状态自动机上,时间复杂度为 O(n log n)。

Sure!. A regular expression can be represented as an FSM (Finite State Machine) and there are technically infinite number of FSM that can recognize the same string.

Isomorphism is the name that describes if two FSM are equivalent. There are a couple of algorigthm to minimize an FSM. For example the Hopcroft minimization algorithm can minimize two FSM in O(n log n), on an n state automaton.

醉生梦死 2024-09-24 06:14:31

是的。

两种正则语言的等价问题是可判定的。

算法草图:

  • 最小化两个 DFA,
  • 检查它们是否同构

Yes.

The problem of equivalence of two regular languages is decidable.

Sketch of an algorithm:

  • minimize both DFAs
  • check if they are isomorph
你对谁都笑 2024-09-24 06:14:31

这个问题称为正则表达式的“包含”或“包含”,因为您要求的是一个正则表达式匹配的单词集合是否包含(或包含)另一个正则表达式匹配的单词集合。相等是一个不同的问题,通常意味着两个正则表达式是否完全匹配相同的单词,即它们在功能上是等效的。例如“a*”包含“aa*”,但它们不相等。

所有已知的正则表达式包含算法在最坏的情况下都会花费正则表达式大小的指数时间。但标准算法是这样的:

输入r1和r2
输出 Yes 如果 r1 包含 r2

  1. 创建 DFA(r1) 和 DFA(r2)
  2. 创建 Neg(DFA(r1)) (与 r1 不匹配的单词完全匹配)
  3. 创建 Neg(DFA(r1))x DFA(r2) (与 r1 不匹配的单词完全匹配)正是那些由 Neg(DFA(r1)) and DFA(r2)) 匹配的单词
  4. 检查 3 中创建的自动机与任何单词都不匹配

这可行,因为您要检查的是有没有与 r2 匹配但与 r1 不匹配的单词。

This problem is called "inclusion" or "subsumption" of regular expressions, because what you are asking for, is whether the set of words matched by one regexp includes (or subsumes) the set of words matched by the other regex. Equality is a different question which usually means whether two regexps matches exactly the same words, i.e. that they are functionally equivalent. For example "a*" includes "aa*", while they are not equal.

All known algorithms for regexp inclusion are the worst case take time exponential in the size of the regexp. But the standard algorithm is like this:

Input r1 and r2
Output Yes if r1 includes r2

  1. Create DFA(r1) and DFA(r2)
  2. Create Neg(DFA(r1)) (which matches exactly those words r1 dont match)
  3. Create Neg(DFA(r1))x DFA(r2) (which matches exactly those words matched by Neg(DFA(r1)) and DFA(r2))
  4. Check that the automaton made in 3. does not match any word

This works, since what you are checking is that there are no words matched by r2 that are not matched by r1.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文