Improving the scalability of the modified preorder tree traversal algorithm

Posted on 2024-07-25 08:48:27

I've been thinking about the modified preorder tree traversal algorithm for storing trees within a flat table (such as SQL).

One property I dislike about the standard approach is that inserting a node touches, on average, N/2 of the nodes (everything with a left or right value higher than the insertion point).
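To make that cost concrete, here is a minimal sketch of the classic insert on a densely numbered tree. The in-memory dict and the name `insert_last_child` are assumptions for illustration only; a real table would do the same thing with two UPDATE statements.

```python
def insert_last_child(nodes, parent_id, new_id):
    """Classic nested-set insert; returns how many existing rows were rewritten.

    `nodes` maps id -> [left, right] and stands in for the SQL table.
    """
    pivot = nodes[parent_id][1]  # the new node takes over the parent's right value
    touched = 0
    for bounds in nodes.values():
        before = list(bounds)
        if bounds[0] > pivot:    # left values past the insertion point shift by 2
            bounds[0] += 2
        if bounds[1] >= pivot:   # right values at or past it shift by 2
            bounds[1] += 2
        touched += bounds != before
    nodes[new_id] = [pivot, pivot + 1]
    return touched


tree = {"root": [1, 10], "a": [2, 5], "a1": [3, 4], "b": [6, 9], "b1": [7, 8]}
touched = insert_last_child(tree, "a1", "a1x")
print(touched)  # 5: every existing row changes in this small tree
```

Inserting under the rightmost leaf `b1` instead would rewrite only 3 rows (`b1`, `b`, `root`), which is where the N/2 average comes from.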

The implementations I've seen rely on sequentially numbered values. This leaves no room for updates.

This seems bad for concurrency and scaling. Imagine a tree rooted at the world that contains user groups for every account in a large system, so large that subsets of the tree must be stored on different servers. Touching half of all the nodes to add one node at the bottom of the tree is bad.

Here is the idea I was considering. Basically leave room for inserts by partitioning the keyspace and dividing at each level.

Here's an example with Nmax = 64 (this would normally be the MAX_INT of your DB):

                     0:64
              ________|________
             /                 \
         1:31                   32:63
        /    \                 /     \
    2:14    15:30         33:47       48:62

Here, a node is added to the left half of the tree.

                     0:64  
              ________|________
             /                 \
         1:31                  32:63
      /   |   \               /     \
  2:10  11:20  21:30     33:47       48:62

The algorithm must be extended so that insertion and removal recursively renumber the left/right indexes of the affected subtree. Since querying for the immediate children of a node is complicated, I think it makes sense to also store the parent id in the table. The algorithm can then select the subtree (using left > p.left && right < p.right), then use node.id and node.parent to work through the list, subdividing the indexes.

This is more complex than just incrementing all the indexes to make room for the insert (or decrementing them for a removal), but it has the potential to affect far fewer nodes (only descendants of the parent of the inserted/removed node).
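A rough sketch of that renumbering, under stated assumptions: the parent's interior is split evenly among its children, the tree is held in plain dicts rather than a table, and the names `subdivide`, `renumber`, and `insert_child` are hypothetical. The exact boundaries depend on the rounding rule chosen, so they can differ slightly from hand-drawn figures.

```python
def subdivide(left, right, k):
    """Split the interior of (left, right) into k disjoint child intervals."""
    lo, hi = left + 1, right - 1
    width = hi - lo + 1
    if k < 1 or width // k < 2:
        # Each child needs at least two distinct values (its own left and right).
        raise ValueError("not enough key space under this parent")
    return [(lo + i * width // k, lo + (i + 1) * width // k - 1) for i in range(k)]


def renumber(children, bounds, node_id, left, right):
    """Assign (left, right) to node_id and re-split it among its descendants.

    Returns the number of rows written, including node_id itself.
    """
    bounds[node_id] = (left, right)
    kids = children.get(node_id, [])
    written = 1
    if kids:
        for kid, (l, r) in zip(kids, subdivide(left, right, len(kids))):
            written += renumber(children, bounds, kid, l, r)
    return written


def insert_child(children, bounds, parent_id, new_id):
    """Add new_id under parent_id; only the parent's descendants are renumbered."""
    children.setdefault(parent_id, []).append(new_id)
    left, right = bounds[parent_id]
    return renumber(children, bounds, parent_id, left, right) - 1  # parent keeps its values


children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
bounds = {}
renumber(children, bounds, "root", 0, 64)
rewritten = insert_child(children, bounds, "a", "a3")
print(rewritten, bounds["a3"])  # 3 (21, 30)
```

In this run the `b` side of the tree is never written, which is the point of the scheme. The trade-off is that a deep or very wide subtree can exhaust its slice of the key space, at which point a wider renumbering (or a larger key type) is needed.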

My questions are basically:

  1. Has this idea been formalized or implemented?

  2. Is this the same as nested intervals?

Comments (3)

迷爱 2024-08-01 08:48:27

I have heard of people doing this before, for the same reasons, yes.

Note that you do lose a couple of small advantages of the algorithm by doing this:

  • Normally, you can tell the number of descendants of a node by ((right - left - 1) div 2). This can occasionally be useful, e.g. if you're displaying a count in a treeview that should include the children found further down in the tree.
  • Following from the above, it's easy to select all leaf nodes: WHERE (right = left + 1).
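For reference, both shortcuts can be tried on a small densely numbered table. This is only a sketch using Python's `sqlite3`; the table name `node` and the columns `lft`/`rgt` are assumptions (chosen because LEFT and RIGHT are SQL keywords). The count formula below is written so that a leaf has zero descendants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [("root", 1, 10), ("a", 2, 5), ("a1", 3, 4), ("b", 6, 9), ("b1", 7, 8)],
)

# Shortcut 1: descendant count straight from the two indexes (dense numbering only).
descendants = dict(conn.execute("SELECT id, (rgt - lft - 1) / 2 FROM node"))

# Shortcut 2: a leaf is any node whose indexes are adjacent (dense numbering only).
leaves = [row[0] for row in conn.execute(
    "SELECT id FROM node WHERE rgt = lft + 1 ORDER BY lft")]


def count_descendants(conn, node_id):
    """Range-query fallback that still works when the numbering has gaps."""
    row = conn.execute(
        "SELECT COUNT(*) FROM node c JOIN node p "
        "ON c.lft > p.lft AND c.rgt < p.rgt WHERE p.id = ?",
        (node_id,),
    ).fetchone()
    return row[0]


print(descendants["root"], leaves, count_descendants(conn, "root"))
```

With gapped numbering the first two queries stop being meaningful, so counts and leaf checks fall back to a range query like `count_descendants`, which costs a scan of the subtree instead of simple arithmetic.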

These are fairly minor advantages and may not be useful to you anyway, though for some usage patterns they're obviously handy.

That said, it does sound like materialized paths may be more useful to you, as another answer suggests.

毁我热情 2024-08-01 08:48:27

I think you're better off looking at a different way of storing trees. If your tree is broad but not terribly deep (which seems likely for the case you suggested), you can store the complete list of ancestors up to the root against each node. That way, modifying a node doesn't require touching any nodes other than the node being modified.
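A minimal sketch of that layout, assuming each row carries a tuple of ancestor ids ordered from the root down. The dict `paths` and the helpers `add_node` and `descendants` are hypothetical names standing in for a table and its queries.

```python
def add_node(paths, node_id, parent_id=None):
    """Store the full ancestor list (root first) against the new node only."""
    paths[node_id] = () if parent_id is None else paths[parent_id] + (parent_id,)


def descendants(paths, node_id):
    """Every node whose ancestor list starts with node_id's own path."""
    prefix = paths[node_id] + (node_id,)
    return sorted(n for n, p in paths.items() if p[:len(prefix)] == prefix)


paths = {}
add_node(paths, "world")
add_node(paths, "eu", "world")
add_node(paths, "fr", "eu")
add_node(paths, "us", "world")

print(paths["fr"])                  # ('world', 'eu')
print(descendants(paths, "world"))  # ['eu', 'fr', 'us']
```

Adding a node writes exactly one row. The cost moves elsewhere: relocating a whole subtree means rewriting the stored path of every node in it.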

葬﹪忆之殇 2024-08-01 08:48:27

You can split your table into two: the first is (node ID, node value), the second (node ID, child ID), which stores all the edges of the tree. Insertion and deletion then become O(tree depth) (you have to navigate to the element and fix what is below it).
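As a sketch of the two-table layout, assuming SQLite via Python's `sqlite3`: the table names `node` and `edge`, and the helpers `add_node` and `subtree_ids`, are illustrative choices rather than anything prescribed above. Reading a subtree back uses a recursive common table expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE edge (node_id INTEGER REFERENCES node(id),
                   child_id INTEGER REFERENCES node(id));
""")


def add_node(conn, node_id, value, parent_id=None):
    """Insert one node row plus, for non-root nodes, one edge row."""
    conn.execute("INSERT INTO node VALUES (?, ?)", (node_id, value))
    if parent_id is not None:
        conn.execute("INSERT INTO edge VALUES (?, ?)", (parent_id, node_id))


def subtree_ids(conn, root_id):
    """Walk the edge table downward from root_id with a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION ALL
            SELECT edge.child_id FROM edge JOIN sub ON edge.node_id = sub.id
        )
        SELECT id FROM sub ORDER BY id""", (root_id,))
    return [r[0] for r in rows]


add_node(conn, 1, "world")
add_node(conn, 2, "eu", parent_id=1)
add_node(conn, 3, "fr", parent_id=2)
add_node(conn, 4, "us", parent_id=1)
print(subtree_ids(conn, 2))  # [2, 3]
```

Here an insert never touches existing rows; the price is that subtree reads need the recursive walk rather than a single range comparison.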

The solution you propose looks like a B-tree. If you can estimate the total number of nodes in your tree, then you can choose the depth of the tree beforehand.
