计算两个无限正则表达式解集是否不相交
计算两个任意正则表达式是否有任何重叠的解决方案(假设可能)。
例如,这两个正则表达式可以通过暴力证明没有交集,因为两个解集是可计算的,因为它是有限的。
^1(11){0,1000}$ ∩ ^(11){0,1000}$ = {}
{1,111, ..., ..111} ∩ {11,1111, ..., ...11} = {}
{} = {}
但是用 *
替换 {0,1000}
消除了暴力解决方案的可能性,因此必须创建更智能的算法。
^1(11)*$ ∩ ^(11)*$ = {}
{1,^1(11)*$} ∩ {^(11)*$} = {}
{1,^1(11)*$} ∩ {11,^11(11)*$} = {}
{1,111,^111(11)*$} ∩ {11,^(11)*$} = {}
.....
在另一个类似问题中,有一个答案是计算交集正则表达式。这可能吗?如果是这样,人们将如何编写算法来完成这样的事情?
我认为这个问题可能是停止问题的领域。
编辑:
我已使用公认的解决方案为示例问题创建 DFA。很容易看出如何在 M_3
的状态图上使用 BFS 或 DFS 来确定 M_3
的最终状态是否可达。
In calculate if two arbitrary regular expressions have any overlapping solutions (assuming it's possible).
For example these two regular expressions can be shown to have no intersections by brute force because the two solution sets are calculable because it's finite.
^1(11){0,1000}$ ∩ ^(11){0,1000}$ = {}
{1,111, ..., ..111} ∩ {11,1111, ..., ...11} = {}
{} = {}
But replacing the {0,1000}
by *
remove the possibility for a brute force solution, so a smarter algorithm must be created.
^1(11)*$ ∩ ^(11)*$ = {}
{1,^1(11)*$} ∩ {^(11)*$} = {}
{1,^1(11)*$} ∩ {11,^11(11)*$} = {}
{1,111,^111(11)*$} ∩ {11,^(11)*$} = {}
.....
In another similar question one answer was to calculate the intersection regex. Is that possible to do? If so how would one write an algorithm to do such a thing?
I think this problem might be domain of the halting problem.
EDIT:
I've used the accepted solution to create the DFAs for the example problem. It's fairly easy to see how you can use a BFS or DFS on the graph of states for M_3
to determine if a final state from M_3
is reachable.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
它不属于停机问题的范围;判断正则语言的交集是否为空可以如下解决:
这些事情中的每一个都可以通过算法完成和/或检查。此外,自然地,一旦 DFA 识别出您的语言的交集,您就可以构建一个正则表达式来匹配该语言。如果您从正则表达式开始,您就可以制作 DFA。这绝对是可以计算的。
编辑:
因此要构建笛卡尔积机,您需要两个 DFA。令 M1 = (E, q0, Q1, A1, f1) 且 M2 = (E, q0', Q2, A2, f2)。在这两种情况下,E 是输入字母表,q0 是起始状态,Q 是所有状态的集合,A 是接受状态的集合,f 是转移函数。构造 M3,其中...
如果我没有犯任何错误,L(M3) = L( M1) 与 L(M2) 相交。整洁吧?
It is not in the domain of the halting problem; deciding whether the intersection of regular languages is empty or not can be solved as follows:
Each of those things can be algorithmically done and/or checked. Also, naturally, once you have a DFA recognizing the intersection of your languages, you can construct a regex to match the language. And if you start out with a regex, you can make a DFA. This is definitely computable.
EDIT:
So to build a Cartesian Product Machine, you need two DFAs. Let M1 = (E, q0, Q1, A1, f1) and M2 = (E, q0', Q2, A2, f2). In both cases, E is the input alphabet, q0 is the start state, Q is the set of all states, A is the set of accepting states, and f is the transition function. Construct M3 where...
Provided I didn't make any mistakes, L(M3) = L(M1) intersect L(M2). Neat, huh?
我创建了 Patrick87 答案的 PHP 实现。除了通过笛卡尔积机实现交集之外,我还实现了一种替代算法,使用
这对于 DFA 来说非常有效,因为完全定义的 DFA(定义了每个可能的过渡状态)的否定只是将所有非最终状态添加到最终状态集中,并从最终状态集中删除所有当前的最终状态(非最终状态)。 -最终->最终,最终->非最终)。通过将 DFA 转换为 NFA,然后创建一个新的起始节点,通过 lambda 变换连接联合 DFA 的旧起始节点,可以轻松完成 DFA 的并集。
除了解决交集问题之外,我创建的库还能够将 NFA 确定为 DFA,并且将正则表达式转换为 NFA。
编辑:
我创建了一个 webapp ,它允许使用我从这个问题(和其他问题)中学到的知识对正则表达式语言进行这种转换。
I've created a PHP implementation of Patrick87 answer. In addition to implementing the Intersection via Cartesian Product Machine, I've also implemented an alterative algorithm for finding Intersections of DFAs using De Morgan.
This works very well for DFAs as the negation of a fully defined DFA (those with every possible transition state defined) is just to add all non-final states to the final state set and remove all current final states from the final state set (non-final -> final, final -> non->final). Union of DFA can be done easily by turning them into a NFA and then creating a new starting node that connects the unioned DFA's old start nodes by lambda transforms.
In addition to solving the intersection problem, the library I created is also able to determinize a NFA to a DFA and convert Regex to NFA.
EDIT:
I have created a webapp that allows this sort of transformations on regex languagues using what I learned form this question (and others).