Improving the scalability of the modified preorder tree traversal algorithm

Posted on 2024-07-25 08:48:27

I've been thinking about the modified preorder tree traversal (MPTT) algorithm for storing trees in a flat table (such as a SQL table).

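One property I dislike about the standard approach is that inserting a node touches, on average, N/2 of the nodes: every node whose left or right value is higher than the insert point.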
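To make that cost concrete, here is a minimal in-memory sketch of the classic insert. The dictionary layout and the function name `insert_as_last_child` are my own for illustration and are not taken from any particular library.

```python
def insert_as_last_child(nodes, parent_name, new_name):
    """Classic nested-set insert; returns how many existing rows were modified."""
    parent = next(n for n in nodes if n["name"] == parent_name)
    pivot = parent["right"]              # the new node takes pivot and pivot + 1
    touched = 0
    for n in nodes:
        changed = False
        if n["left"] > pivot:            # everything to the right shifts by 2
            n["left"] += 2
            changed = True
        if n["right"] >= pivot:          # ancestors also grow by 2
            n["right"] += 2
            changed = True
        touched += changed
    nodes.append({"name": new_name, "left": pivot, "right": pivot + 1})
    return touched


# Sequentially numbered tree: root -> a -> a1, and root -> b -> b1
tree = [
    {"name": "root", "left": 1, "right": 10},
    {"name": "a", "left": 2, "right": 5},
    {"name": "a1", "left": 3, "right": 4},
    {"name": "b", "left": 6, "right": 9},
    {"name": "b1", "left": 7, "right": 8},
]
touched = insert_as_last_child(tree, "a", "a2")
print(touched)  # 4 of the 5 existing rows are rewritten
```

Only `a1` keeps its values here; every node at or to the right of the insert point is rewritten, even though the new node is a leaf.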

The implementations I've seen rely on sequentially numbered values, which leaves no gaps for new nodes.

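This seems bad for concurrency and scaling. Imagine a tree rooted at the world that contains the user groups for every account in a large system. It is so large that subsets of the tree must be stored on different servers. Touching half of all the nodes to add a node at the bottom of the tree is bad.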

Here is the idea I was considering: leave room for inserts by partitioning the keyspace and subdividing it at each level.

Here's an example with N_max = 64 (this would normally be the MAX_INT of your DB).

                     0:64
              ________|________
             /                 \
         1:31                   32:63
        /    \                 /     \
    2:14    15:30         33:47       48:62
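One way the partitioning could be computed is sketched here. The helper name `subdivide` and the rule of giving leftover keys to the last children are assumptions of mine; the exact boundaries depend on that rounding choice, so they will not always match a hand-drawn diagram digit for digit.

```python
def subdivide(left, right, k):
    """Split the keys strictly between left and right into k disjoint intervals.

    Returns a list of (child_left, child_right) pairs. Any remainder from the
    integer division is given to the last children.
    """
    size = right - left - 1              # integers available to the children
    if k <= 0 or size < 2 * k:           # each child needs two distinct keys
        raise ValueError("not enough key space for %d children" % k)
    base, extra = divmod(size, k)
    slots = []
    start = left + 1
    for i in range(k):
        width = base + (1 if i >= k - extra else 0)
        slots.append((start, start + width - 1))
        start += width
    return slots


print(subdivide(0, 64, 2))   # [(1, 31), (32, 63)]
print(subdivide(32, 63, 2))  # [(33, 47), (48, 62)]
```

Because every child only uses keys inside its parent's interval, the usual containment test for descendants keeps working, and each interval still has unused keys for later inserts.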

Here, a node is added to the left half of the tree.

                     0:64  
              ________|________
             /                 \
         1:31                  32:63
      /   |   \               /     \
  2:10  11:20  21:30     33:47       48:62

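The algorithm must be extended so that insert and removal recursively renumber the left/right indexes of the affected subtree. Since querying for the immediate children of a node is complicated, I think it makes sense to also store the parent id in the table. The algorithm can then select the subtree (using left > p.left && right < p.right) and use node.id and node.parent to work through the list, subdividing the indexes.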
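The two lookups described above might look like this with Python's built-in `sqlite3` module. The table name `node` and the column names `lft` and `rgt` are hypothetical (chosen because `LEFT` and `RIGHT` are SQL keywords); the rows are the seven nodes from the first diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, lft INTEGER, rgt INTEGER)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [
        (1, None, 0, 64),
        (2, 1, 1, 31),
        (3, 1, 32, 63),
        (4, 2, 2, 14),
        (5, 2, 15, 30),
        (6, 3, 33, 47),
        (7, 3, 48, 62),
    ],
)


def subtree_ids(conn, node_id):
    """All descendants of node_id: rows strictly inside its interval."""
    cur = conn.execute(
        "SELECT c.id FROM node AS c JOIN node AS p ON p.id = ? "
        "WHERE c.lft > p.lft AND c.rgt < p.rgt ORDER BY c.lft",
        (node_id,),
    )
    return [row[0] for row in cur]


def child_ids(conn, node_id):
    """Immediate children only, found through the stored parent id."""
    cur = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY lft", (node_id,)
    )
    return [row[0] for row in cur]


print(subtree_ids(conn, 2))  # [4, 5]
print(child_ids(conn, 1))    # [2, 3]
```

Storing `parent` duplicates information already implied by the intervals, but it turns the immediate-children lookup into a plain equality filter.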

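This is more complex than just incrementing all the indexes to make room for the insert (or decrementing them on removal), but it has the potential to affect far fewer nodes (only the descendants of the parent of the inserted/removed node).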
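A rough in-memory sketch of the renumbering step follows, assuming the new node is appended as the last child and that the parent's interval is split evenly, with leftover keys going to the last children. The names `renumber` and `insert_child` are illustrative only, and this is one possible reading of the idea rather than a finished design.

```python
def renumber(nodes, node_id):
    """Re-split node_id's interval among its children, recursively.

    nodes maps id -> {"parent": ..., "left": ..., "right": ...}.
    Returns how many descendant rows had their left/right rewritten.
    """
    me = nodes[node_id]
    children = sorted(
        (i for i, n in nodes.items() if n["parent"] == node_id),
        key=lambda i: nodes[i]["left"],
    )
    if not children:
        return 0
    base, extra = divmod(me["right"] - me["left"] - 1, len(children))
    if base < 2:
        raise ValueError("key space exhausted under node %r" % node_id)
    touched = 0
    start = me["left"] + 1
    for pos, cid in enumerate(children):
        width = base + (1 if pos >= len(children) - extra else 0)
        new = (start, start + width - 1)
        if (nodes[cid]["left"], nodes[cid]["right"]) != new:
            nodes[cid]["left"], nodes[cid]["right"] = new
            touched += 1
        touched += renumber(nodes, cid)   # descendants follow their parent
        start += width
    return touched


def insert_child(nodes, parent_id, new_id):
    """Add new_id as the last child of parent_id; returns rows written."""
    edge = nodes[parent_id]["right"]      # placeholder so the new node sorts last
    nodes[new_id] = {"parent": parent_id, "left": edge, "right": edge}
    return renumber(nodes, parent_id)


nodes = {
    1: {"parent": None, "left": 0, "right": 64},
    2: {"parent": 1, "left": 1, "right": 31},
    3: {"parent": 1, "left": 32, "right": 63},
    4: {"parent": 2, "left": 2, "right": 14},
    5: {"parent": 2, "left": 15, "right": 30},
    6: {"parent": 3, "left": 33, "right": 47},
    7: {"parent": 3, "left": 48, "right": 62},
}
written = insert_child(nodes, 2, 8)
print(written)  # 3: the two existing children of node 2 plus the new node
```

Nodes 1, 2, 3, 6 and 7 are never written, which is the point of the scheme. The trade-off is that a deep subtree under the parent is renumbered in full, and a crowded interval can run out of keys.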

My questions are basically:

Has this idea already been formalized or implemented?
Is this the same as nested intervals?
