在 SQL 中管理层次结构:MPTT/嵌套集 vs 邻接表 vs 存储路径

发布于 2024-12-17 04:12:11 字数 1144 浏览 0 评论 0原文

一段时间以来,我一直在思考如何最好地处理 SQL 中的层次结构。由于对邻接列表的限制和 MPTT/嵌套集的复杂性感到沮丧,我开始考虑简单地将关键路径存储为简单的 node_key/node_key/... 字符串。我决定编译这三种技术的优缺点:

创建/删除/移动节点所需的调用次数:

  • 邻接 = 1
  • MPTT = 3
  • 路径 = 1 (将包含该节点的所有节点的旧节点路径替换为新节点路径) path)

获取树所需的调用次数:

  • Adjacency = [子级别数]
  • MPTT = 1
  • Path = 1

获取节点/祖先的路径所需的调用次数:

  • Adjacency = [number上级数]
  • MPTT = 1
  • Path = 0

获取子节点数所需的调用次数:

  • Adjacency = [子级数]
  • MPTT = 0 (可以根据右/左值计算)
  • Path = 1

所需的调用数获取节点深度:

  • Adjacency = [超级级别数]
  • MPTT = 1
  • Path = 0

所需的数据库字段:

  • Adjacency = 1(父)
  • MPTT = 3(父、右、左)
  • Path = 1(路径)

结论

在除一个用例之外的每个用例中,存储路径技术都使用与其他技术相同或更少的调用。通过此分析,存储路径显然是赢家。更不用说,它实现起来更简单,人类可读等等。

所以问题是,存储路径不应该被认为是比 MPTT 更强的技术吗?为什么存储路径不是更常用的技术,以及为什么在给定实例中不通过 MPTT 使用它们?

另外,如果您认为此分析不完整,请告诉我。

更新:

以下是 MPTT 至少可以开箱即用执行存储路径解决方案无法执行的两件事:

  1. 允许计算每个节点的子节点计数,而无需任何其他查询(如上所述)。
  2. 对给定级别的节点施加命令。其他解决方案是无序的。

For a while now I've been wrestling with how best to handle hierarchies in SQL. Frustrated by the limitations of adjacency lists and the complexity of MPTT/nested sets, I began thinking about simply storing key paths instead, as a simple node_key/node_key/... string. I decided to compile the pros and cons of the three techniques:

Number of calls required to create/delete/move a node:

  • Adjacency = 1
  • MPTT = 3
  • Path = 1 (Replace old node path with new node path across all nodes that contain that path)

Number of calls required to get a tree:

  • Adjacency = [number of sub-levels]
  • MPTT = 1
  • Path = 1

Number of calls required to get path to a node / ancestry:

  • Adjacency = [number of super-levels]
  • MPTT = 1
  • Path = 0

Number of calls required to get number of subnodes:

  • Adjacency = [number of sub-levels]
  • MPTT = 0 (Can be calculated from right/left values)
  • Path = 1

Number of calls required to get depth of node:

  • Adjacency = [number of super-levels]
  • MPTT = 1
  • Path = 0

DB fields required:

  • Adjacency = 1 (parent)
  • MPTT = 3 (parent,right,left)
  • Path = 1 (path)

Conclusion

The stored path technique uses the same or less calls than the other techniques in every use case except one. By this analysis, storing paths is a clear winner. Not to mention, it's a lot simpler to implement, human readable, etc.

So the question is, shouldn't stored paths be considered a stronger technique than MPTT? Why are stored paths not a more commonly used technique, and why would you not use them over MPTT in a given instance?

Also, if you think this analysis is incomplete please let me know.

UPDATE:

Here are at least 2 things MPTT can do out of the box that a stored path solution won't:

  1. Allows calculation of subnode count for each node without any additional queries (mentioned above).
  2. Imposes an order on nodes at a given level. The other solutions are unordered.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

离去的眼神 2024-12-24 04:12:11

您还可以考虑我在对 将平面表解析为树的最有效/优雅的方法是什么?

需要调用创建/删除/移动节点:

  • Closure = 1

获取树所需的调用:

  • Closure = 1

获取节点/祖先的路径所需的调用:

  • Closure = 1

获取子节点数量所需的调用:

  • Closure = 1

获取所需的调用节点深度:

  • 闭合 = 1

所需的数据库字段:

  • 邻接 = 1 个以上字段/行
  • 路径 = 1 个以上字段/行
  • MPTT = 2 个或 3 个以上字段/行
  • 闭合 = 2或额外表中的 3 个字段。该表在最坏情况下有 O(n^2) 行,但比大多数实际情况要少得多。

还有其他一些考虑因素:

支持无限深度:

  • 邻接 = yes
  • MPTT = yes
  • 路径 = no
  • Closure = yes

支持引用完整性:

  • 邻接 = yes
  • MPTT = no
  • Path = no
  • Closure = yes

我还在演示中介绍了 Closure Table 使用 SQL 和的分层数据模型PHP,以及我的书 SQL 反模式卷 1:避免数据库编程的陷阱

You might also consider the Closure Table design I describe in my answer to What is the most efficient/elegant way to parse a flat table into a tree?

Calls required to create/delete/move a node:

  • Closure = 1

Calls required to get a tree:

  • Closure = 1

Calls required to get path to a node / ancestry:

  • Closure = 1

Calls required to get number of subnodes:

  • Closure = 1

Calls required to get depth of node:

  • Closure = 1

DB fields required:

  • Adjancency = 1 more field / row
  • Path = 1 more field / row
  • MPTT = 2 or 3 more fields / row
  • Closure = 2 or 3 fields in extra table. This table has O(n^2) rows worst case but far fewer than that in most practical cases.

There are a couple of other considerations:

Supports unlimited depth:

  • Adjacency = yes
  • MPTT = yes
  • Path = no
  • Closure = yes

Supports referential integrity:

  • Adjacency = yes
  • MPTT = no
  • Path = no
  • Closure = yes

I also cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP, and my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

离鸿 2024-12-24 04:12:11

您的结论的问题在于它忽略了与树木相关的大多数问题。

通过将技术的有效性降低到“调用次数”,您实际上忽略了众所周知的数据结构和算法试图解决的所有问题;也就是说,最快的执行速度以及较低的内存和资源占用量。

对 SQL 服务器的“调用次数”可能看起来是一个很好的度量标准(“看起来代码更少”),但是如果结果是一个永远不会完成、运行缓慢或占用太多空间的程序,那么它就是实际上是一个无用的指标。

通过存储每个节点的路径,您并没有创建树数据结构。相反,您正在创建一个列表。树旨在优化的任何操作都会丢失。

对于小日期集,这可能很难看到(在许多小树的情况下,列表更好),在大小为 500、1000、10k 的数据集上尝试一些示例——您很快就会明白为什么不存储整个路径好主意。

It problem with your conclusion is that it ignores most of the issues involved in working with trees.

By reducing the validity of a technique to the "number of calls" you effectively ignore all of the issues which well understood data structures and algorithms attempt to solve; that is, fastest execution and low memory and resource foot print.

The "number of calls" to an SQL server may seem like a good metric to use ("look ma less code"), but if the result is a program which never finishes, runs slowly, or takes up to much space, it is in fact a useless metric.

By storing the path with every node you are not creating a tree data structure. Instead you are creating a list. Any operation which a tree is designed to optimize is lost.

This might be hard to see with small date sets (and in many cases of small trees a list is better), try some examples on data sets of size 500, 1000, 10k -- You will quickly see why storing the whole path is not a good idea.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文