生成具有死或多余状态的 DFA 的正则表达式

发布于 2025-01-07 23:36:30 字数 864 浏览 1 评论 0原文

我希望在我的词法分析器中实现 DFA 最小化器,但我似乎无法生成看起来不是表达式的最小 DFA 的 DFA。

我正在根据 NFA 构建 DFA,该 NFA 是使用后缀正则表达式的 thomson 构造构建的。这和龙书里描述的差不多。为了制作词法分析器,使用从起始状态开始的 epsilon 转换来组合多个 NFA。 DFA 算法正是应用在这个组合 NFA 上。

那么,是否有任何“已知”正则表达式可以生成 DFA,从而为死状态消除和状态最小化提供一个很好的测试平台?

我当然可以编写一个奇怪的 DFA 并在其上应用算法,但这并不是一个真正合适的测试用例,不是吗?如果我构建 DFA 的方法不容易出现死状态,那么该信息将同样有价值,因为这样我就可以完全跳过实现状态消除功能。

编辑:如果您需要实现详细信息才能准确回答,可以在 github 上找到代码,特别是 NFA.csDFA.cs 类。此外,我还写了一系列关于 关于我正在使用的构造算法的博客文章,如果有帮助的话。

I'm looking to implement a DFA minimizer in my lexer, but I can't seem to produce a DFA that doesn't look like it's already the minimal DFA for the expression.

I'm constructing the DFA from a NFA that is built using thomson construction from a postfix regular expression. It's pretty much exactly what is being described in the dragon book. To make the lexer several of the NFAs are combined using epsilon transitions from the start state. It's on this combined NFA that the DFA algorithm is applied.

So, is there any "known" regular expression that will generate a DFA which will make a nice test bed for dead state elimination and state minimization?

I could of course just hack up a weird DFA and apply the algorithms on it, but it would not really be a proper test case would it? If it's so that the method I'm constructing DFAs isn't prone to dead states, then that information would be just as valueable, since then I can skip implementing the state elimination feature altogether.

Edit: In case you need implementation details in order to accurately answer, the code is available on github, specifically the NFA.cs and DFA.cs classes. Additionally I wrote a series on blog posts on the construction algorithm I'm using, if that helps.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

梦里°也失望 2025-01-14 23:36:30

好吧,我以一种完全迂回的方式发现了这一点。我制作了一个用于可视化正则表达式的工具,因为我从解析器中获得了相当不错的调试输出。这恰当地说明了这样一个表达式,即使用标准汤普森构造技术将为您提供一个非常愚蠢的自动机: (a+b+c+)+|abc

在工具中显示:http://regexvisualizer.apphb.com/?Regex=%28a%2Bb%2Bc%2B%29%2B%7Cabc&NfaSize=300&DfaSize=250#

该工具当前执行直接的汤普森结构,没有任何优化。如果删除表达式中完全多余的 |abc 部分,则表达式应保持不变。事实并非如此。

Ok, so I found this out in a totally roundabout way. I made a tool for visualizing regular expression since I got quite a nice debug output from my parser. This aptly illustrates such an expression that using standard thompson construction techniques will give you a pretty stupid automata: (a+b+c+)+|abc

Shown in the tool: http://regexvisualizer.apphb.com/?Regex=%28a%2Bb%2Bc%2B%29%2B%7Cabc&NfaSize=300&DfaSize=250#

This tool currently performs a straight up thompson construction without any optimization. If you remove the |abc part of the expression which is entirely superfluous the expression should stay the same. It doesn't.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文