计算两个无限正则表达式解集是否不相交

发布于 2024-12-09 11:28:44 字数 1063 浏览 1 评论 0原文

计算两个任意正则表达式是否有任何重叠的解决方案(假设可能)。

例如,这两个正则表达式可以通过暴力证明没有交集,因为两个解集是可计算的,因为它是有限的。

^1(11){0,1000}$ ∩     ^(11){0,1000}$        = {}
{1,111, ..., ..111} ∩ {11,1111, ..., ...11} = {}
{}                                          = {}

但是用 * 替换 {0,1000} 消除了暴力解决方案的可能性,因此必须创建更智能的算法。

^1(11)*$ ∩ ^(11)*$ = {}
{1,^1(11)*$} ∩ {^(11)*$} = {}
{1,^1(11)*$} ∩ {11,^11(11)*$} = {}
{1,111,^111(11)*$} ∩ {11,^(11)*$} = {}
.....

在另一个类似问题中,有一个答案是计算交集正则表达式。这可能吗?如果是这样,人们将如何编写算法来完成这样的事情?

我认为这个问题可能是停止问题的领域。

编辑:

我已使用公认的解决方案为示例问题创建 DFA。很容易看出如何在 M_3 的状态图上使用 BFS 或 DFS 来确定 M_3 的最终状态是否可达。

DFA 解决方案

In calculate if two arbitrary regular expressions have any overlapping solutions (assuming it's possible).

For example these two regular expressions can be shown to have no intersections by brute force because the two solution sets are calculable because it's finite.

^1(11){0,1000}$ ∩     ^(11){0,1000}$        = {}
{1,111, ..., ..111} ∩ {11,1111, ..., ...11} = {}
{}                                          = {}

But replacing the {0,1000} by * remove the possibility for a brute force solution, so a smarter algorithm must be created.

^1(11)*$ ∩ ^(11)*$ = {}
{1,^1(11)*$} ∩ {^(11)*$} = {}
{1,^1(11)*$} ∩ {11,^11(11)*$} = {}
{1,111,^111(11)*$} ∩ {11,^(11)*$} = {}
.....

In another similar question one answer was to calculate the intersection regex. Is that possible to do? If so how would one write an algorithm to do such a thing?

I think this problem might be domain of the halting problem.

EDIT:

I've used the accepted solution to create the DFAs for the example problem. It's fairly easy to see how you can use a BFS or DFS on the graph of states for M_3 to determine if a final state from M_3 is reachable.

DFA solution

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

绮筵 2024-12-16 11:28:44

它不属于停机问题的范围;判断正则语言的交集是否为空可以如下解决:

  1. 为第一种语言构造一个DFA M1。
  2. 构建第二语言的 DFA M2。 提示:克林定理和幂集机器构造
  3. 为 M1 与 M2 相交构造 DFA M3。 提示:笛卡尔积机器构造
  4. 判断L(M3)是否为空。 提示:如果 M3 有 n 个状态,并且 M3 不接受任何长度不大于 n 的字符串,则 L(M3) 为空...为什么?

这些事情中的每一个都可以通过算法完成和/或检查。此外,自然地,一旦 DFA 识别出您的语言的交集,您就可以构建一个正则表达式来匹配该语言。如果您从正则表达式开始,您就可以制作 DFA。这绝对是可以计算的。

编辑:

因此要构建笛卡尔积机,您需要两个 DFA。令 M1 = (E, q0, Q1, A1, f1) 且 M2 = (E, q0', Q2, A2, f2)。在这两种情况下,E 是输入字母表,q0 是起始状态,Q 是所有状态的集合,A 是接受状态的集合,f 是转移函数。构造 M3,其中...

  1. E3 = E
  2. Q3 = Q1 x Q2(有序对)
  3. q0'' = (q0, q0')
  4. A3 = {(x, y) | A1 中的 x 和 A2 中的 y}
  5. f3(s, (x, y)) = (f1(s, x), f2(s, y))

如果我没有犯任何错误,L(M3) = L( M1) 与 L(M2) 相交。整洁吧?

It is not in the domain of the halting problem; deciding whether the intersection of regular languages is empty or not can be solved as follows:

  1. Construct a DFA M1 for the first language.
  2. Construct a DFA M2 for the second language. Hint: Kleene's Theorem and Power Set machine construction
  3. Construct a DFA M3 for M1 intersect M2. Hint: Cartesian Product Machine construction
  4. Determine whether L(M3) is empty. Hint: If M3 has n states, and M3 doesn't accept any strings of length no greater than n, then L(M3) is empty... why?

Each of those things can be algorithmically done and/or checked. Also, naturally, once you have a DFA recognizing the intersection of your languages, you can construct a regex to match the language. And if you start out with a regex, you can make a DFA. This is definitely computable.

EDIT:

So to build a Cartesian Product Machine, you need two DFAs. Let M1 = (E, q0, Q1, A1, f1) and M2 = (E, q0', Q2, A2, f2). In both cases, E is the input alphabet, q0 is the start state, Q is the set of all states, A is the set of accepting states, and f is the transition function. Construct M3 where...

  1. E3 = E
  2. Q3 = Q1 x Q2 (ordered pairs)
  3. q0'' = (q0, q0')
  4. A3 = {(x, y) | x in A1 and y in A2}
  5. f3(s, (x, y)) = (f1(s, x), f2(s, y))

Provided I didn't make any mistakes, L(M3) = L(M1) intersect L(M2). Neat, huh?

无人问我粥可暖 2024-12-16 11:28:44

我创建了 Patrick87 答案的 PHP 实现。除了通过笛卡尔积机实现交集之外,我还实现了一种替代算法,使用

Intersection( DFA_1, DFA_2 ) === ! UNION( ! DFA_1, ! DFA_2 )

* ! is defined as negation

这对于 DFA 来说非常有效,因为完全定义的 DFA(定义了每个可能的过渡状态)的否定只是将所有非最终状态添加到最终状态集中,并从最终状态集中删除所有当前的最终状态(非最终状态)。 -最终->最终,最终->非最终)。通过将 DFA 转换为 NFA,然后创建一个新的起始节点,通过 lambda 变换连接联合 DFA 的旧起始节点,可以轻松完成 DFA 的并集。

除了解决交集问题之外,我创建的库还能够将 NFA 确定为 DFA,并且将正则表达式转换为 NFA。

编辑:

我创建了一个 webapp ,它允许使用我从这个问题(和其他问题)中学到的知识对正则表达式语言进行这种转换。

I've created a PHP implementation of Patrick87 answer. In addition to implementing the Intersection via Cartesian Product Machine, I've also implemented an alterative algorithm for finding Intersections of DFAs using De Morgan.

Intersection( DFA_1, DFA_2 ) === ! UNION( ! DFA_1, ! DFA_2 )

* ! is defined as negation

This works very well for DFAs as the negation of a fully defined DFA (those with every possible transition state defined) is just to add all non-final states to the final state set and remove all current final states from the final state set (non-final -> final, final -> non->final). Union of DFA can be done easily by turning them into a NFA and then creating a new starting node that connects the unioned DFA's old start nodes by lambda transforms.

In addition to solving the intersection problem, the library I created is also able to determinize a NFA to a DFA and convert Regex to NFA.

EDIT:

I have created a webapp that allows this sort of transformations on regex languagues using what I learned form this question (and others).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文