软件开发中的非确定性有限状态机?
最近我一直在思考有限状态机(FSM),以及如何在软件中实现它们(编程语言并不重要)。
我的理解是,确定性状态机被广泛使用(解析/词法分析器、编译器等),但是非确定性状态机有什么问题吗?
我知道可以将所有非确定性状态机转换为确定性状态机(甚至以编程方式)。 这不是我的重点。 我还认为非确定性状态机的实现要复杂得多。
无论如何,实现非确定性状态机是否有意义? 有什么我不知道的特殊应用吗? 这样做的原因可能是什么? 也许优化和专门的非确定性状态机更快?
Recently I've been thinking about finite state machines (FSMs), and how I would implement them in software (programming language doesn't matter).
My understanding is that deterministic state machines are in widespread use (parses/lexers, compilers and so on), but what's the matter with non-deterministic state machines?
I know that is possible to convert all non-deterministic state machines to deterministic ones (even programmatically). That's not my point. I also imagine that non-deterministic state machines are much more complicated to implement.
Anyway, does it make any sense to implement a non-deterministic state machine? Are there any special applications I don't know about?
What could be the reasons to do that? Maybe optimized and specialized non-deterministic state machines are faster?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
大多数正则表达式引擎使用非确定性自动机,因为它们提供了更大的灵活性。 DFA 受到的限制要多得多。 看看一些实现,你就会明白这一点。 Microsoft 甚至在 .NET 正则表达式 类:
匹配行为(第一段)- 本文还提供了使用 NFA 而不是效率更高的 DFA 的理由。
Most regular expression engines use non-deterministic automata since they offer much greater flexibility. DFAs are much more restricted. Have a look at some implementations and you'll see this. Microsoft even underlines this fact in their documentation of the .NET Regex class:
Matching behavior (first paragraph) – this article also offers a rationale for the employment of an NFA rather than the more efficient DFA.
如您所知,NFA 和 DFA 在计算上是等效的。 这是自动机理论中最早的定理之一。 有一些算法可以将一种算法转换为另一种算法(与下推或图灵机不同)。
所以。 为什么一个比另一个? 因为用 NFA 表示给定问题比等效的 DFA 容易得多。
编辑:就机器的实际计算而言,DFA 将会运行得更快,因为它们不必回溯。 但它们需要更多的内存来表示。 (Mem 与 CPU 权衡)
As you know, NFAs and DFAs are computationally equivalent. It's one of the first theorems in automata theory. There are algorithms to convert one to another(unlike Pushdown or turing machines).
So. Why one over the other? Because representation of a given problem with a NFA is far easier than the equivalent DFA.
edit: in terms of actually computing the machine, DFAs are going to go faster because they don't have to backtrack. But they will take more memory to represent. (Mem vs CPU tradeoff)
我的建议=看看Adrian Thurstons Ragel的手册。
有一些简单的方法可以直接生成 DFA,但我相信它们只支持有限范围的运算符 - 基本上是 EBNF 通常的嫌疑人。 Ragel 使用非确定性方法将较简单的自动机组成复杂的自动机,然后使用 epsilon 消除和最小化来创建高效的确定性自动机。 无论您需要多少个奇怪的运算符,到最小确定性自动机的转换始终是相同的,并且通过使用非确定性方法使每个运算符实现保持简单。
My advice = take a look at the manual for Adrian Thurstons Ragel.
There are simple, ways to generate a DFA directly, but I believe they only support a limited range of operators - basically the EBNF usual suspects. Ragel uses non-deterministic methods to compose complex automata from simpler ones, then uses epsilon elimination and minimisation to create efficient deterministic automata. No matter how many wierd operators you need, the conversion to a minimal deterministic automata is always the same, and each operator implementation is kept simple by using nondeterministic methods.
Cayuga 在底层利用非确定性有限状态机进行复杂事件处理。 嗯,看起来他们称之为“用于事件监控的状态发布/订阅”,但我相信它是 CEP。
我相信他们的一些论文甚至讨论了他们为什么使用自动机模型。 您可能想浏览一下他们的网站。
Cayuga utilizes non-deterministic finite state machines under the hood for complex event processing. Well, it looks like they call it "Stateful Publish/Subscribe for Event Monitoring", but I believe it is CEP.
I believe some of their papers even discuss why they are using an automata model. You might want to poke around their site.
维特比算法运行于隐藏马尔可夫模型,将它们视为 NFA。 不完全相同,但肯定相似。
它们在语音和文本识别等应用中很有用。
The viterbi algorithm operates on Hidden Markov Models by treating them much like an NFA. Not entirely identical, but certainly analogous.
They're useful in applications like speech and text recognition.
通常,创建 NFA 然后使用它要容易得多(唯一的区别是您持有一组状态而不是一个状态)。 如果你想快速完成,你可以制作 DFA,但不要忘记,完成它的时间是指数级的(因为生成的自动机可能呈指数级增长!)。
另一方面,如果你想制作一种补语语言,你别无选择,你需要一个 det。 变体。
这就是为什么否定不在正则表达式引擎中,而仅在类 ([^...]) 中的原因,在类 ([^...]) 中,您可以确定自动机是确定性的。
Very often it is much more easier to create a NFA and then work with it (the only difference is that you hold a set of states instead of one state). If you want to have it fast, you can make DFA, but don't forget, that the time to do it is exponential (because of the resultant automaton can be exponentially bigger!).
On the other hand, if you want to make a complement language, you have no choice, you need a det. variant.
It is the reason why the negation is in none of regular-expression-engine, only in classes ([^...]), where you can be sure that the automaton is deterministic.
如果我错了,请纠正我,但从我的编译器课程中,我记得有时您根本无法使用 DFA,因为它会导致状态“爆炸”。
Correct me if I'm wrong but from my compilers class I remember that sometimes you simply can't use DFA as it would lead to an "explosion" of states.
我认为选择非确定性有限自动机的主要原因是要真正恢复所选的匹配。 使用确定性版本可能要困难得多。
如果您只想知道它们是否匹配,而没有其他细节,我认为编译为有限自动机会更好。
I think the main reason for choosing a non-deterministic finite automaton would be to actually get the chosen match back. It's likely a lot harder to do it with a deterministic version.
If all you want to know is IF they match or not, and no other details, I would think compiling down to a finite automaton would be better.