生成具有死或多余状态的 DFA 的正则表达式
我希望在我的词法分析器中实现 DFA 最小化器,但我似乎无法生成看起来不是表达式的最小 DFA 的 DFA。
我正在根据 NFA 构建 DFA,该 NFA 是使用后缀正则表达式的 thomson 构造构建的。这和龙书里描述的差不多。为了制作词法分析器,使用从起始状态开始的 epsilon 转换来组合多个 NFA。 DFA 算法正是应用在这个组合 NFA 上。
那么,是否有任何“已知”正则表达式可以生成 DFA,从而为死状态消除和状态最小化提供一个很好的测试平台?
我当然可以编写一个奇怪的 DFA 并在其上应用算法,但这并不是一个真正合适的测试用例,不是吗?如果我构建 DFA 的方法不容易出现死状态,那么该信息将同样有价值,因为这样我就可以完全跳过实现状态消除功能。
编辑:如果您需要实现详细信息才能准确回答,可以在 github 上找到代码,特别是 NFA.cs 和 DFA.cs 类。此外,我还写了一系列关于 关于我正在使用的构造算法的博客文章,如果有帮助的话。
I'm looking to implement a DFA minimizer in my lexer, but I can't seem to produce a DFA that doesn't look like it's already the minimal DFA for the expression.
I'm constructing the DFA from a NFA that is built using thomson construction from a postfix regular expression. It's pretty much exactly what is being described in the dragon book. To make the lexer several of the NFAs are combined using epsilon transitions from the start state. It's on this combined NFA that the DFA algorithm is applied.
So, is there any "known" regular expression that will generate a DFA which will make a nice test bed for dead state elimination and state minimization?
I could of course just hack up a weird DFA and apply the algorithms on it, but it would not really be a proper test case would it? If it's so that the method I'm constructing DFAs isn't prone to dead states, then that information would be just as valueable, since then I can skip implementing the state elimination feature altogether.
Edit: In case you need implementation details in order to accurately answer, the code is available on github, specifically the NFA.cs and DFA.cs classes. Additionally I wrote a series on blog posts on the construction algorithm I'm using, if that helps.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
好吧,我以一种完全迂回的方式发现了这一点。我制作了一个用于可视化正则表达式的工具,因为我从解析器中获得了相当不错的调试输出。这恰当地说明了这样一个表达式,即使用标准汤普森构造技术将为您提供一个非常愚蠢的自动机:
(a+b+c+)+|abc
在工具中显示:http://regexvisualizer.apphb.com/?Regex=%28a%2Bb%2Bc%2B%29%2B%7Cabc&NfaSize=300&DfaSize=250#
该工具当前执行直接的汤普森结构,没有任何优化。如果删除表达式中完全多余的
|abc
部分,则表达式应保持不变。事实并非如此。Ok, so I found this out in a totally roundabout way. I made a tool for visualizing regular expression since I got quite a nice debug output from my parser. This aptly illustrates such an expression that using standard thompson construction techniques will give you a pretty stupid automata:
(a+b+c+)+|abc
Shown in the tool: http://regexvisualizer.apphb.com/?Regex=%28a%2Bb%2Bc%2B%29%2B%7Cabc&NfaSize=300&DfaSize=250#
This tool currently performs a straight up thompson construction without any optimization. If you remove the
|abc
part of the expression which is entirely superfluous the expression should stay the same. It doesn't.