使用有限自动机作为容器的键

发布于 2024-08-13 14:49:01 字数 699 浏览 1 评论 0原文

我有一个问题,我确实需要能够使用有限自动机作为关联容器的键。每个键实际上应该代表一个等价的自动机,这样当我搜索时,我会找到一个等效的自动机(如果存在这样的键),即使该自动机在结构上并不相同。

一个明显的最后手段当然是使用线性搜索,并对每个检查的键进行等价测试。我希望可以做得比这更好。

我一直在考虑尝试施加任意但一致的排序,并导出有序的比较算法。第一原理涉及自动机表示的字符串集。评估每个自动机可能的第一个标记的集合,并根据这两个集合应用排序。如果有必要,继续处理可能的第二个标记、第三个标记等的集合。天真地这样做的明显问题是,在证明等价性之前需要检查无数个标记集。

我一直在考虑一些模糊的想法 - 首先最小化输入自动机并使用某种闭包算法,或者转换回常规语法,一些涉及生成树的想法。我得出的结论是,我需要放弃标记集的词汇顺序,但迄今为止我得出的最重要的结论是,这并不是微不足道的,我可能最好阅读别人的解决方案。

我从 CiteSeerX 下载了一篇论文 - 子组的总排序和陪集 - 但我的抽象代数还不够好,无法知道这是否相关。

我还想到可能有某种方法可以从自动机中导出哈希值,但我还没有考虑太多。

谁能推荐一篇值得阅读的好论文? - 或者至少让我知道我下载的那个是否是转移注意力的?

I have a problem where I really need to be able to use finite automata as the keys to an associative container. Each key should actually represent an equivalence class of automata, so that when I search, I will find an equivalent automaton (if such a key exists), even if that automaton isn't structurally identical.

An obvious last-resort approach is of course to use linear search with an equivalence test for each key checked. I'm hoping it's possible to do a lot better than this.

I've been thinking in terms of trying to impose an arbitrary but consistent ordering, and deriving an ordered comparison algorithm. First principles involve the sets of strings that the automata represent. Evaluate the set of possible first tokens for each automaton, and apply an ordering based on those two sets. If necessary, continue to the sets of possible second tokens, third tokens etc. The obvious problem with doing this naively is that there's an infinite number of token-sets to check before you can prove equivalence.

I've been considering a few vague ideas - minimising the input automata first and using some kind of closure algorithm, or converting back to a regular grammar, some ideas involving spanning trees. I've come to the conclusion that I need to abandon the set-of-tokens lexical ordering, but the most significant conclusion I've reached so far is that this isn't trivial, and I'm probably better off reading up on someone elses solution.

I've downloaded a paper from CiteSeerX - Total Ordering on Subgroups and Cosets - but my abstract algebra isn't even good enough to know if this is relevant yet.

It also occurred to me that there might be some way to derive a hash from an automaton, but I haven't given this much thought yet.

Can anyone suggest a good paper to read? - or at least let me know if the one I've downloaded is a red herring or not?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

厌倦 2024-08-20 14:49:01

我相信您可以从最小化自动机获得规范形式。对于任何两个等价的自动机,它们的最小化形式是同构的(我相信这是从 Myhill-Nerode 定理得出的)。这种同构尊重边标签,当然还有节点类(开始、接受、不接受)。这比未标记的图同构更容易。

我认为,如果您从开始状态开始构建最小化自动机的生成树,并按标签对输出边进行排序,那么您将获得自动机的规范形式,然后可以对其进行散列。

编辑:也应该考虑非树边缘,但它们也可以按标签规范排序。

I believe that you can obtain a canonical form from minimized automata. For any two equivalent automatons, their minimized forms are isomorphic (I believe this follows from Myhill-Nerode theorem). This isomorphism respects edge labels and of course node classes (start, accepting, non-accepting). This makes it easier than unlabeled graph isomorphism.

I think that if you build a spanning tree of the minimized automaton starting from the start state and ordering output edges by their labels, then you'll get a canonical form for the automaton which can then be hashed.

Edit: Non-tree edges should be taken into account too, but they can also be ordered canonically by their labels.

深居我梦 2024-08-20 14:49:01

这是 1992 年的论文表格,其中他们生成了规范的最小化自动机:非确定性有限自动机的最小化一旦

你有了规范,你就可以轻松地对它进行散列,例如通过对状态和转换执行深度优先枚举,并对通过编码状态数获得的字符串进行散列(按照它们第一次出现的顺序对它们进行计数)对于状态和转换作为三元组

<from_state, symbol, to_state, is_accepting_final_state>

这应该可以解决问题。

here is a thesis form 1992 where they produce canonical minimized automata: Minimization of Nondeterministic Finite Automata

Once you have the canonical, form you can easily hash it for example by performing a depth first enumeration of the states and transitions, and hashing a string obtained by encoding state numbers (count them in the order of their first appearance) for states and transitions as triples

<from_state, symbol, to_state, is_accepting_final_state>

This should solve the problem.

逆流 2024-08-20 14:49:01

当问题似乎无法克服时,解决方案通常是公开宣布您认为问题有多困难。然后,你会立即意识到问题是微不足道的,你只是让自己看起来像个白痴 - 这基本上就是我现在的处境;-)

正如问题中所建议的,为了在词汇上对两个自动机进行排序,我需要考虑两件事。两组可能的第一个标记,以及两组可能的一切其他尾部。尾部可以表示为有限自动机,并且可以从原始自动机导出。

因此,比较算法是递归的 - 比较头部,如果不同,则得到结果,如果相同,则递归比较尾部。

问题是证明一般正则语法的等价性所需的无限序列。如果在比较过程中出现一对自动机,相当于您之前检查过的一对自动机,则您已经证明了等价性,并且可以停止检查。根据有限自动机的本质,这必须在有限数量的步骤中发生。

问题是我仍然有同样形式的问题。为了确定我的终止标准,我需要将我的当前自动机对与到目前为止在比较过程中发生的所有过去的自动机对进行比较。这就是一直让我头疼的事情。

事实证明,那篇论文是相关的,但可能只带我到目前为止。常规语言可以使用连接运算符形成一个组,左陪集与我一直在考虑的 head:tail 相关。

我之所以是个白痴,是因为我施加了一个过于严格的终止条件,而且我应该知道这一点,因为这对于 WRT 自动机算法来说并不是一个不寻常的问题。

我不需要在自动机对第一次出现时停止。我可以继续下去,直到找到一个更容易检测到的重复——具有某种结构等价性和逻辑等价性的重复。只要我的导出尾部自动机算法是健全的(特别是如果我在每个步骤中最小化并进行其他清理),我就不会在比较过程中生成无限序列的等效但外观不同的自动机对。结构变化的唯一来源是原始的两个自动机和尾部自动机算法,两者都是有限的。

关键是,如果我比较太多的词汇术语,那并不重要——我仍然会得到正确的结果,虽然我会稍后终止,但我仍然会在有限的时间内终止。

这应该意味着我可以使用对自动机结构敏感的散列或有序比较来使用不可靠的重复检测(允许一些误报)。这是一个比结构不敏感的比较更简单的问题,我认为这是我需要的关键。

当然,仍然存在性能问题。根据此处涉及的问题,使用标准等价算法的线性搜索可能是一种更快的方法。当然,我希望这种比较比现有算法效率更低的等价测试,因为它做了更多的工作——非等价情况的词汇排序。真正的问题是基于键的搜索的整体效率,这可能需要一些令人头疼的分析。我希望非等价自动机倾向于快速比较(在前几个步骤中检测差异,就像传统的字符串比较)这一事实将使这成为一种实用的方法。

另外,如果我怀疑是否存在等价性,我可以使用标准等价算法来检查。如果该检查失败,我只需继续比较我停止的顺序,而不需要检查重复出现的尾部语言 - 我知道我会在有限数量的步骤中发现差异。

When a problem seems insurmountable, the solution is often to publicly announce how difficult you think the problem is. Then, you will immediately realise that the problem is trivial and that you've just made yourself look an idiot - and that's basically where I am now ;-)

As suggested in the question, to lexically order the two automata, I need to consider two things. The two sets of possible first tokens, and the two sets of possible everything-else tails. The tails can be represented as finite automata, and can be derived from the original automata.

So the comparison algorithm is recursive - compare the head, if different you have your result, if the same then recursively compare the tail.

The problem is the infinite sequence needed to prove equivalence for regular grammars in general. If, during a comparison, a pair of automata recur, equivalent to a pair that you checked previously, you have proven equivalence and you can stop checking. It is in the nature of finite automata that this must happen in a finite number of steps.

The problem is that I still have a problem in the same form. To spot my termination criteria, I need to compare my pair of current automata with all the past automata pairs that occurred during the comparison so far. That's what has been giving me a headache.

It also turns out that that paper is relevant, but probably only takes me this far. Regular languages can form a group using the concatenation operator, and the left coset is related to the head:tail things I've been considering.

The reason I'm an idiot is because I've been imposing a far too strict termination condition, and I should have known it, because it's not that unusual an issue WRT automata algorithms.

I don't need to stop at the first recurrence of an automata pair. I can continue until I find a more easily detected recurrence - one that has some structural equivalence as well as logical equivalence. So long as my derive-a-tail-automaton algorithm is sane (and especially if I minimise and do other cleanups at each step) I will not generate an infinite sequence of equivalent-but-different-looking automata pairs during the comparison. The only sources of variation in structure are the original two automata and the tail automaton algorithm, both of which are finite.

The point is that it doesn't matter that much if I compare too many lexical terms - I will still get the correct result, and while I will terminate a little later, I will still terminate in finite time.

This should mean that I can use an unreliable recurrence detection (allowing some false negatives) using a hash or ordered comparison that is sensitive to the structure of the automata. That's a simpler problem than the structure-insensitive comparison, and I think it's the key that I need.

Of course there's still the issue of performance. A linear search using a standard equivalence algorithm might be a faster approach, based on the issues involved here. Certainly I would expect this comparison to be a less efficient equivalence test than existing algorithms, as it is doing more work - lexical ordering of the non-equivalent cases. The real issue is the overall efficiency of a key-based search, and that is likely to need some headache-inducing analysis. I'm hoping that the fact that non-equivalent automata will tend to compare quickly (detecting a difference in the first few steps, like traditional string comparisons) will make this a practical approach.

Also, if I reach a point where I suspect equivalence, I could use a standard equivalence algorithm to check. If that check fails, I just continue comparing for the ordering where I left off, without needing to check for the tail language recurring - I know that I will find a difference in a finite number of steps.

信愁 2024-08-20 14:49:01

如果您所能做的就是 == 或 !=,那么我认为您必须在添加另一个成员之前检查每个集合成员。这很慢。 (编辑:鉴于问题的标题,我想您已经知道这一点,即使您继续使用比较函数来直接比较两个有限自动机。)

我尝试使用系统发育树来做到这一点,但它很快就遇到了性能问题。如果您想构建没有重复的大型集合,则需要一种转换为规范形式的方法。然后您可以检查哈希值,或将字符串表示形式作为键插入二叉树中。

另一位研究人员确实想出了一种将树转换为规范代表的方法,他们使用 Patricia 树来存储唯一的树以进行重复检查。

If all you can do is == or !=, then I think you have to check every set member before adding another one. This is slow. (Edit: I guess you already know this, given the title of your question, even though you go on about comparison functions to directly compare two finite automata.)

I tried to do that with phylogenetic trees, and it quickly runs into performance problems. If you want to build large sets without duplicates, you need a way to transform to a canonical form. Then you can check a hash, or insert into a binary tree with the string representation as a key.

Another researcher who did come up with a way to transform a tree to a canonical rep used Patricia trees to store unique trees for duplicate-checking.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文