从正则表达式生成图灵机的算法
我正在开发一个软件来从正则表达式生成图灵机。换句话说,我想将正则表达式作为输入,并以编程方式生成图灵机来执行相同的任务。
这就是我所做的:
我对正则表达式进行了建模,如下所示:
RegularExpression
(接口):下面的类实现了此接口Simple
(即:aaa
、bbb
、abcde
):这是一个叶类;它没有任何子表达式ComplexWithoutOr
(即:a(ab)*
,(a(ab)*c(b)*)*< /code>):此类包含
RegularExpression
列表。ComplexWithOr
(例如:a(a|b)*
、(a((ab)*|c(b)*)*
):此类包含一个Or
操作,其中包含一个RegularExpression
列表,它代表第一个示例的a|b
部分和。第二个的(ab)*|c(b)*
。变量
(例如:awcw
,其中 w ∈ {a,b)。 }*):这尚未实现,但其想法是将其建模为叶类,其逻辑与Simple
不同,它代表示例的w
部分。 .
当谈到图灵机(TM)生成时,我有不同程度的复杂性:
简单
:这种类型的表达式已经可以工作了。为每个字母生成一个新状态并向右移动。如果在任何状态下,读取的字母不是预期的,它就会启动一个“回滚电路”,以 TM 头处于初始位置结束并停止在非最终状态。
ComplexWithoutOr
:我的问题来了。该算法为每个子表达式生成一个 TM 并将它们连接起来。这适用于一些简单的情况,但我在回滚机制方面遇到问题。
问题描述
以下是我的算法失败的示例输入:
(ab)*abac
:这是一个包含 ComplexWithOr
表达式的 ComplexWithoutOr
表达式(ab)*
(内部有一个 Simple
表达式:ab
)和一个 Simple
表达式 abac
我的算法首先为 ab
生成 TM1。此 TM1 被 TM2 用于 (ab)*
,因此如果 TM1 成功,则进入再次在 TM1 中,否则 TM1 回滚并且 TM2 成功完成。换句话说,TM2 不会失败。
然后,它为 abac
生成 TM3。 TM2 的输出是TM3 的输入。 TM3 的输出是执行结果。
现在,假设输入字符串:abac
。正如您所看到的,它与正则表达式匹配。那么让我们看看TM执行时会发生什么。
TM1 第一次执行成功:ab
。 TM1 第二次 ac
失败并回滚,将 TM 头置于 a
处的第 3rd 位置>。 TM2 成功完成,并将输入转发至 TM3。 TM3 失败,因为 ac
与 abac
不匹配。因此 TM 无法识别 abac
。
这是一个问题。怎么解决这个问题呢?
我使用Java来开发它,但它的语言并不重要。我的问题涉及算法。
I'm developing a software to generate a Turing Machine from a regular expression. In other words, I want to take a regular expression as input, and programmatically generate a Turing Machine to perform the same task.
This is what I have done:
I've modelled the regular expression as follows:
RegularExpression
(interface): the classes below implements this interfaceSimple
(ie:aaa
,bbb
,abcde
): this is a leaf class; it does not have any subexpressionsComplexWithoutOr
(ie:a(ab)*
,(a(ab)*c(b)*)*
): this class contains a list ofRegularExpression
.ComplexWithOr
(exampe:a(a|b)*
,(a((ab)*|c(b)*)*
): this class contains anOr
operation, which contains a list ofRegularExpression
. It represents thea|b
part of the first example and the(ab)*|c(b)*
of the second one.Variable
(example:awcw
, where w ∈ {a,b}*): this is not yet implemented, but the idea is to model it as a leaf class with some different logic fromSimple
. It represents thew
part of the examples.
When it comes to Turing machine (TM) generation, I have different levels of complexity:
Simple
: this type of expression is already working. Generates a new state for each letter and moves right. If in any state, the letter read is not the expected, it starts a "rollback circuit" that finishes with the TM head in the initial position and stops in a not final state.
ComplexWithoutOr
: here comes my problem. The algorithm generates a TM for each subexpression and concatenates them. This works for some simple cases, but I have problems with the rollback mechanism.
Problem description
Here is an example input for which my algorithm fails:
(ab)*abac
: this is a ComplexWithoutOr
expression that contains a ComplexWithOr
expression (ab)*
(that has a Simple
expression inside: ab
) and a Simple
expression abac
My algorithm generates first an TM1 for ab
. This TM1 is used by TM2 for (ab)*
, so if TM1 succeeds, it enters again in TM1, otherwise TM1 rolls back and TM2 finishes with success. In other words, TM2 cannot fail.
Then, it generates a TM3 for abac
. The output of TM2 is the input of TM3. The output of TM3 is the result of the execution.
Now, suppose this input string: abac
. As you can see it matches with the regular expression. So let's see what happens when the TM is executed.
TM1 executes with success the first time: ab
. TM1 fails the second time for ac
and rolls back, putting the TM head in the 3rd position at a
. TM2 finishes with success and the input is forwarded to TM3. TM3 fails, because ac
doesn't match abac
. So the TM does not recognize abac
.
This is a problem. How to solve this?
I'm using Java to develop it, but the language it is not important. My question concerns the algorithm.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我并不完全清楚你到底想实现什么。看起来您想要制作一个图灵机(或一般的任何 FSM),它只接受正则表达式也接受的那些字符串。实际上,您希望将正则表达式转换为 FSM。
实际上,这正是真正的正则表达式匹配器在幕后所做的事情。我认为 Russ Cox 的这系列文章涵盖了您想做的很多事情。
It is not entirely clear to me what exactly you are trying to implement. It looks like you want to make a Turing Machine (or any FSM in general) that accepts only those strings that are also accepted by the regular expression. In effect, you want to convert a regular expression to a FSM.
Actually that is exactly what a real regex matcher does under the hood. I think this series of articles by Russ Cox covers a lot of what you want to do.
Michael Sipser 在计算理论简介中,在第 1 章中证明了正则表达式的描述能力相当于有限自动机。证明的一部分涉及构建一个非确定性有限自动机 (NDFA),该自动机可识别特定正则表达式所描述的语言。我不打算复制该章的一半,由于使用的符号,这会非常困难,所以我建议你借用或购买这本书(或者使用这些术语进行谷歌搜索可能会找到类似的证明)并使用它证明作为算法的基础。
由于图灵机可以模拟 NDFA,我认为生成 NDFA 的算法就足够好了。
Michael Sipser, in Introduction to the Theory of Computation, proves in chapter 1 that regular expressions are equivalent to finite automata in their descriptive power. Part of the proof involves constructing a nondeterministic finite automaton (NDFA) that recognizes the language described by a specific regular expression. I'm not about to copy half that chapter, which would be quite hard due to the notation used, so I suggest you borrow or purchase the book (or perhaps a Google search using these terms will turn up a similar proof) and use that proof as the basis for your algorithm.
As Turing machines can simulate an NDFA, I assume an algorithm to produce an NDFA is good enough.
在乔姆斯基层次结构中,正则表达式是 Level3,而 TM 是 Level1。这意味着 TM 可以生成任何正则表达式,但反之则不然。
in the chomsky hierarchy a regex is Level3, whereas a TM is Level1. this means, that a TM can produce any regex, but not vice versa.