存储有效的方法,用大字母存储DFA过渡表?

发布于 2025-01-21 14:09:46 字数 494 浏览 4 评论 0原文

我正在尝试为DFA实现一个非常简单的状态过渡表,但是我遇到了一些效率问题。

如果我以每行作为源状态排列一个过渡表,将每个列作为可能的输入,将相交作为下一个状态,则过渡很简单:

next_state = table[from_state, input]

此方法将给出O(1)的过渡时间。当有很多可能的输入时,就会出现问题。如果我有一个说所有128个ASCII字符的字母,那么该表将不受控制,并通过记忆咀嚼,甚至只有几个状态(状态 *输入)。

我考虑的另一个选项是使每个列成为目标状态,然后将交点为输入符号。

input = table[from_state, to_state]

这种方法减少了记忆消耗,但是执行过渡将需要测试所有列,直到以O(n)方式找到正确的输入为止。

我真的陷入了时间空间的权衡,还是有更好的实施方法?

PS我知道压缩算法存在,但是可以使用它们在构造过程中保持压缩吗?

I am trying to implement a very simple state transition table for a DFA, but I'm running into some efficiency problems.

If I arrange a transition table with each row as the source state, each column as a possible input, and the intersection as the next state, then transitioning is as simple as:

next_state = table[from_state, input]

This method will give O(1) transition time. The problem arises when there are lots of possible inputs. If I have an alphabet of say all 128 ASCII characters, then the table will grow out of control and chew through memory for even just a few states (states * inputs).

Another option I considered was to make each column the destination state, and the intersections the input symbol.

input = table[from_state, to_state]

This approach cuts down on memory consumption but performing a transition would require testing all of the columns until one is found with the correct input in O(N) fashion.

Am I really stuck in a time-space tradeoff, or is there some sort of better way to implement this?

P.S. I am aware that compression algorithms exist, but can they be used to keep the table compressed during it's construction?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

一个人练习一个人 2025-01-28 14:09:46

我认为我已经决定实施似乎适合我的需求。它使用与第一种方法相同的方法,但可以保留一个单独的元组,该元组跟踪最小,最大的使用Codepoints。该表将仅将其列仅扩展到此范围内的字符数。由于大多数字符可能永远不会使用,除了紧密的字符外,这可能就足够了。

索引然后从索引0开始就变为(输入-Lowest_codepoint)。

I think I've decided on an implementation that seems to fit my needs. It uses the same method as the first approach but keeps a separate tuple that tracks the smallest and largest used codepoints. The table will expand its columns only to the number of characters in this range. Since most characters probably won't ever be used except for a tight range of characters, this is probably sufficient.

Indexing then just becomes (input - lowest_codepoint) starting from index 0.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文