在树的节点上构建等价类的良好数据结构是什么?
我正在寻找一个好的数据结构来在树的节点上构建等价类。 在理想的结构中,以下操作应该快速(适当时为 O(1)/O(n))且简单(没有神秘代码段):
- (A) 从根开始遍历树; 在每个节点上 --> 子转换枚举子节点的所有等效版本
- (B) 合并两个等价类
- (C) 从现有节点(子节点)和其他数据的列表中创建新节点
- (D) 查找结构上与节点等效的任何节点(即它们)具有相同数量的子节点,相应的子节点属于相同的等价类,并且它们的“其他数据”相等),因此新的(或新修改的)节点可以放入正确的等价类中(通过合并)
到目前为止我已经考虑过(其中一些可以组合使用):
- 冻糕,其中子项是对节点集合而不是节点的引用。 (A) 速度快,(B) 需要遍历树并更新节点以指向合并的集合,(C) 需要找到包含新节点的每个子节点的集合,(D) 需要遍历树
- 维护节点的哈希值通过他们的特点。 这使得 (D) 更快,但 (B) 更慢(因为当合并等价类时必须更新哈希)
- 将节点串在一起形成循环链表。 (A) 很快,(B) 会很快,但事实上,循环列表的“合并”部分与其自身实际上会分割列表 (C) 会很快,(D) 需要遍历树
- 像上面一样,但每个节点中都有一个附加的“向上”指针,可用于查找循环列表的规范成员。
我错过了一个甜蜜的选择吗?
I'm looking for a good data structure to build equivalence classes on nodes of a tree. In an ideal structure, the following operations should be fast (O(1)/O(n) as appropriate) and easy (no paragraphs of mystery code):
- (A) Walk the tree from the root; on each node --> child transition enumerate all the equivalent versions of the child node
- (B) Merge two equivalence classes
- (C) Create new nodes from a list of existing nodes (the children) and other data
- (D) Find any nodes structurally equivalent to node (i.e. they have the same number of children, corresponding children belong to the same equivalence class, and their "other data" is equal) so that new (or newly modified) nodes may be put in the right equivalence class (via a merge)
So far I've considered (some of these could be used in combination):
- A parfait, where the children are references to collections of nodes instead of to nodes. (A) is fast, (B) requires walking the tree and updating nodes to point to the merged collection, (C) requires finding the collection containing each child of the new node, (D) requires walking the tree
- Maintaining a hash of nodes by their characteristics. This makes (D) much faster but (B) slower (since the hash would have to be updated when equivalence classes were merged)
- String the nodes together into a circular linked list. (A) is fast, (B) would be fast but for the fact that that "merging" part of a circular list with itself actually splits the list (C) would be fast, (D) would require walking the tree
- Like above, but with an additional "up" pointer in each node, which could be used to find a canonical member of the circular list.
Am I missing a sweet alternative?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您似乎有两种形式的等价需要处理。 简单等价 (A),跟踪为保持最新的等价类;以及结构等价 (D),您偶尔会构建一个等价类,然后将其丢弃。
在我看来,如果您维护简单等效和结构等效的等效类,那么问题在概念上会更简单。 如果这会给结构等价带来太多干扰,您可以为结构等价的某些方面维护等价类。 然后,您可以找到一个平衡点,您可以负担得起这些等价类的维护费用,但在构建结构等效节点列表时仍然大大减少要检查的节点数量。
You seem to have two forms of equivalence to deal with. Plain equivalence (A), tracked as equivalence classes which are kept up to date and structural equivalence (D), for which you occasionally go build a single equivalence class and then throw it away.
It sounds to me like the problem would be conceptually simpler if you maintain equivalence classes for both plain and structural equivalence. If that introduces too much churn for the structural equivalence, you could maintain equivalence classes for some aspects of structural equivalence. Then you could find a balance where you can afford the maintenance of those equivalence classes but still greatly reduce the number of nodes to examine when building a list of structurally equivalent nodes.
我不认为任何一种结构都能解决您的问题,但您可以看看 不相交集数据结构。 毕竟,等价类与集合的划分是一样的。 它应该能够快速处理其中一些操作。
I don't think any one structure is going to solve your problems, but you might take a look at the Disjoint-set data structure. An equivalence class, after all, is the same thing as a partitioning of a set. It should be able to handle some of those operations speedily.
退一步来说,我建议根本不要使用一棵树。 上次我不得不面对类似的问题,我从一棵树开始,但后来转向了一个数组。
原因有很多,但第一个原因是性能,我的类有多达 100 个左右的子级,实际上,将它们作为数组进行操作比通过树的节点操作会表现更好,主要是因为硬件局部性、CPU 预取逻辑和 CPU管道化。
因此,尽管在算法上,数组结构比树需要更多的操作,但执行这几十个操作可能比在内存中追逐指针更快。
Stepping back for a moment I'd suggest against using a tree at all. Last time I had to confront a similar problem, I began with a tree, but later moved onto an array.
Reasons being multiple but number one reason was performance, my classes with up to 100 or so children would actually perform better while manipulating them as array than through the the nodes of a tree, mostly because of hardware locality, and CPU prefetch logic, and CPU pipelining.
So although algorithmically an array structure requires a larger N of operations than a tree, performing these dozens of operations is likely faster than chasing pointers across memory.