What are the options for storing hierarchical data in a relational database?

Posted on 2024-09-30 01:40:19

Good Overviews

Generally speaking, you're choosing between fast read times (for example, nested set) and fast write times (adjacency list). Usually, you end up with a combination of the options below that best fits your needs. The following provide some in-depth reading:

One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set, and Nested Interval I've found.
Models for hierarchical data: slides with good explanations of tradeoffs and example usage
Representing hierarchies in MySQL: very good overview of Nested Set in particular
Hierarchical data in RDBMSs: the most comprehensive and well-organized set of links I've seen, but not much in the way of explanation

Options

The ones I am aware of, with their general features:

Adjacency List:

Columns: ID, ParentID
Easy to implement.
Cheap node moves, inserts, and deletes.
Expensive to find the level, ancestry & descendants, path
Avoid N+1 via Common Table Expressions in databases that support them
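
As a rough illustration of the last point, here is a minimal sketch of a recursive Common Table Expression over an adjacency list. It runs the SQL through Python's built-in sqlite3 module only so that it is self-contained; the `category` table, its columns and the sample rows are assumptions made for this example, and the same `WITH RECURSIVE` shape should carry over to other databases that support CTEs.

```python
import sqlite3

# Hypothetical table and data, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO category VALUES
        (1, NULL, 'ELECTRONICS'),
        (2, 1, 'TELEVISIONS'),
        (3, 2, 'LCD'),
        (4, 1, 'PORTABLE ELECTRONICS'),
        (5, 4, 'MP3 PLAYERS');
""")

# One recursive query returns the whole subtree with its level and path,
# instead of issuing one query per node (the N+1 pattern).
subtree_sql = """
    WITH RECURSIVE subtree(id, name, level, path) AS (
        SELECT id, name, 0, '/' || name FROM category WHERE id = ?
        UNION ALL
        SELECT c.id, c.name, s.level + 1, s.path || '/' || c.name
        FROM category AS c
        JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT id, name, level, path FROM subtree ORDER BY path
"""
adjacency_rows = conn.execute(subtree_sql, (1,)).fetchall()
for row in adjacency_rows:
    print(row)
```

The level and path are computed on the fly here, which is exactly the cost the adjacency list pays on reads in exchange for cheap writes.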

Nested Set (a.k.a. Modified Preorder Tree Traversal (MPTT))

Columns: Left, Right
Cheap ancestry, descendants
Very expensive O(n/2) moves, inserts, deletes due to volatile encoding

Bridge Table (a.k.a. Closure Table w/ triggers)

Uses separate join table with ancestor, descendant, depth (optional)
Cheap ancestry and descendants
Write cost O(log n) (size of the subtree) for inserts, updates, deletes
Normalized encoding: good for RDBMS statistics & query planner in joins
Requires multiple rows per node

Lineage Column (a.k.a. Materialized Path, Path Enumeration)

Column: lineage (e.g. /parent/child/grandchild/etc...)
Cheap descendants via prefix query (e.g. LEFT(lineage, #) = '/enumerated/path')
Write cost O(log n) (size of the subtree) for inserts, updates, deletes
Non-relational: relies on Array datatype or serialized string format
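
The prefix query can be sketched as follows. This is only an illustration under assumptions: the `node` table, the id-based lineage strings with a trailing slash, and the helper names are all made up for the example, and SQLite's `substr` stands in for `LEFT`, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, lineage TEXT NOT NULL UNIQUE);
    INSERT INTO node VALUES
        (1,  '/1/'),
        (2,  '/1/2/'),
        (3,  '/1/2/3/'),
        (12, '/12/'),
        (13, '/12/13/');
""")

def descendant_ids(lineage):
    # Prefix match on the serialized path. The trailing '/' stops '/1/'
    # from also matching the unrelated branch '/12/'.
    rows = conn.execute(
        "SELECT id FROM node "
        "WHERE substr(lineage, 1, length(?)) = ? AND lineage <> ? "
        "ORDER BY lineage",
        (lineage, lineage, lineage),
    ).fetchall()
    return [r[0] for r in rows]

def ancestor_ids(lineage):
    # Ancestors are read straight out of the path, without touching the table.
    ids = [int(part) for part in lineage.strip("/").split("/")]
    return ids[:-1]

print(descendant_ids("/1/"))    # descendants of node 1
print(ancestor_ids("/1/2/3/"))  # ancestors of node 3
```

Moving a subtree means rewriting the lineage of every row under it, which is where the write cost noted above comes from.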

Nested Intervals

Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
Has real/float/decimal representation/precision issues
Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with the added trickiness of linear algebra.

Flat Table

A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
Cheap to iterate/paginate over
Expensive move and delete
Good Use: threaded discussion - forums / blog comments
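
A small sketch of why this layout pages cheaply, using a hypothetical `comment` table (the column `sort_rank` plays the role of the Rank column; all names and rows are assumptions for illustration). Because the rank already stores the display order, a page of a threaded discussion is a plain `ORDER BY ... LIMIT ... OFFSET` with no recursion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER,
        level     INTEGER NOT NULL,         -- depth in the thread
        sort_rank INTEGER NOT NULL UNIQUE,  -- position in display order
        body      TEXT
    );
    INSERT INTO comment VALUES
        (1, NULL, 0, 1, 'First post'),
        (2, 1,    1, 2, 'Reply to first'),
        (3, 2,    2, 3, 'Nested reply'),
        (4, 1,    1, 4, 'Second reply'),
        (5, NULL, 0, 5, 'Another thread');
""")

def page(limit, offset):
    rows = conn.execute(
        "SELECT level, body FROM comment ORDER BY sort_rank LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    # Indent each comment by its stored level to render the thread.
    return ["  " * level + body for level, body in rows]

first_page = page(3, 0)
second_page = page(3, 3)
print("\n".join(first_page + second_page))
```

Inserting a reply in the middle of a thread forces the ranks after it to shift, which is the expensive side of this design.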

Multiple lineage columns

Columns: one for each lineage level, referring to all the parents up to the root; levels below the item's own level are set to NULL
Cheap ancestors, descendants, level
Cheap insert, delete, move of the leaves
Expensive insert, delete, move of the internal nodes
Hard limit to how deep the hierarchy can be

Database Specific Notes

MySQL/MariaDB

~~Use session variables for Adjacency List~~
Use CTEs in MySQL 8.0 or MariaDB 10.2

Oracle

Use CONNECT BY to traverse Adjacency Lists

PostgreSQL

ltree datatype for Materialized Path

SQL Server

General summary
SQL Server 2008 offers the HierarchyId data type, which appears to help with the Lineage Column approach and expands the depth that can be represented.

心房的律动 2024-10-07 01:40:19

My favorite answer is what the first sentence in this thread suggests: use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.

The problem up until now has been that converting an Adjacency List to Nested Sets is frightfully slow, because most people use the extreme RBAR (row-by-agonizing-row) method known as a "Push Stack" to do the conversion. It has been considered way too expensive to reach the Nirvana of the simple maintenance of the Adjacency List combined with the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other, especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million-node hierarchy.

I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.

Duration for     1,000 Nodes = 00:00:00:870 
Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
Duration for 1,000,000 Nodes = 'Didn't even try this'

And here's the duration for the new method (with the push stack method in parentheses).

Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)

Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.
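
For readers who want to see what such a conversion has to compute, here is a minimal sketch in Python. It is neither the push stack T-SQL nor the set-based method from the article linked below; it is a plain depth-first numbering, and the function name and input shape (a dict mapping each node to its parent) are assumptions made for this example. The sample data mirrors the electronics category table used in another answer in this thread.

```python
from collections import defaultdict

def adjacency_to_nested_set(parent_of):
    """Return {node: (lft, rgt)} for a forest given as {node: parent or None}."""
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None:
            roots.append(node)
        else:
            children[parent].append(node)

    lft = {}
    bounds = {}
    counter = 1
    # Explicit stack of (node, finished) pairs avoids Python's recursion
    # limit on deep trees. Siblings are visited in ascending id order.
    stack = [(root, False) for root in sorted(roots, reverse=True)]
    while stack:
        node, finished = stack.pop()
        if finished:
            bounds[node] = (lft[node], counter)  # leaving the node: assign rgt
        else:
            lft[node] = counter                  # entering the node: assign lft
            stack.append((node, True))
            for child in sorted(children[node], reverse=True):
                stack.append((child, False))
        counter += 1
    return bounds

parents = {1: None, 2: 1, 3: 2, 4: 2, 5: 2, 6: 1, 7: 6, 8: 7, 9: 6, 10: 6}
nested = adjacency_to_nested_set(parents)
for node in sorted(nested):
    print(node, nested[node])
```

Every node is pushed and popped twice, so the numbering itself is linear in the number of nodes; the hard part in a database is doing the equivalent without row-by-row processing.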

You can read about the new method and get a copy of the code at the following URL.
http://www.sqlservercentral.com/articles/Hierarchy/94040/

I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article.
http://www.sqlservercentral.com/articles/T-SQL/94570/

If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.

魔 2024-10-07 01:40:19

Adjacency Model + Nested Sets Model

I went for it because I could insert new items to the tree easily (you just need a branch's id to insert a new item to it) and also query it quite fast.

+-------------+----------------------+--------+-----+-----+
| category_id | name                 | parent | lft | rgt |
+-------------+----------------------+--------+-----+-----+
|           1 | ELECTRONICS          |   NULL |   1 |  20 |
|           2 | TELEVISIONS          |      1 |   2 |   9 |
|           3 | TUBE                 |      2 |   3 |   4 |
|           4 | LCD                  |      2 |   5 |   6 |
|           5 | PLASMA               |      2 |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |      1 |  10 |  19 |
|           7 | MP3 PLAYERS          |      6 |  11 |  14 |
|           8 | FLASH                |      7 |  12 |  13 |
|           9 | CD PLAYERS           |      6 |  15 |  16 |
|          10 | 2 WAY RADIOS         |      6 |  17 |  18 |
+-------------+----------------------+--------+-----+-----+

Every time you need all children of any parent, you just query the parent column.
If you need all descendants of any parent, you query for items whose lft lies between the parent's lft and rgt.
If you need all parents of any node up to the root of the tree, you query for items having lft lower than the node's lft and rgt bigger than the node's rgt, and sort them by parent.
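
The three lookups described above can be sketched against the sample table like this. The SQL is run through Python's sqlite3 module purely to keep the example self-contained; the helper name `names` is an assumption, and the ancestor query orders by lft, which yields root-to-node order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT, "
    "parent INTEGER, lft INTEGER, rgt INTEGER)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ELECTRONICS", None, 1, 20),
        (2, "TELEVISIONS", 1, 2, 9),
        (3, "TUBE", 2, 3, 4),
        (4, "LCD", 2, 5, 6),
        (5, "PLASMA", 2, 7, 8),
        (6, "PORTABLE ELECTRONICS", 1, 10, 19),
        (7, "MP3 PLAYERS", 6, 11, 14),
        (8, "FLASH", 7, 12, 13),
        (9, "CD PLAYERS", 6, 15, 16),
        (10, "2 WAY RADIOS", 6, 17, 18),
    ],
)

def names(sql, *params):
    return [row[0] for row in conn.execute(sql, params)]

# Direct children: adjacency column only.
children = names("SELECT name FROM category WHERE parent = ? ORDER BY lft", 6)

# All descendants: lft strictly inside the parent's (lft, rgt) interval.
descendants = names(
    "SELECT c.name FROM category AS c "
    "JOIN category AS p ON c.lft > p.lft AND c.lft < p.rgt "
    "WHERE p.category_id = ? ORDER BY c.lft", 6)

# All ancestors: intervals that enclose the node's interval.
ancestors = names(
    "SELECT a.name FROM category AS a "
    "JOIN category AS n ON a.lft < n.lft AND a.rgt > n.rgt "
    "WHERE n.category_id = ? ORDER BY a.lft", 8)

print(children, descendants, ancestors, sep="\n")
```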

I needed accessing and querying the tree to be faster than inserts; that's why I chose this.

The only problem is fixing the left and right columns when inserting new items. I created a stored procedure for it and call it every time I insert a new item, which is rare in my case, and it is really fast.
I got the idea from Joe Celko's book; the stored procedure and how I came up with it are explained here on DBA SE:
https://dba.stackexchange.com/q/89051/41481

Although this solution allows for rapid searches to locate descendants, it is not ideal for large datasets that require frequent inserts or deletes, because those operations are slow. Therefore, it is best suited for tables that won't change frequently.

冰葑 2024-10-07 01:40:19

This design was not mentioned yet:

Multiple lineage columns

Though it has limitations, if you can bear them, it's very simple and very efficient. Features:

Columns: one for each lineage level, referring to all the parents up to the root; levels below the current item's level are set to 0 (or NULL)
There is a fixed limit to how deep the hierarchy can be
Cheap ancestors, descendants, level
Cheap insert, delete, move of the leaves
Expensive insert, delete, move of the internal nodes

Here follows an example - taxonomic tree of birds so the hierarchy is Class/Order/Family/Genus/Species - species is the lowest level, 1 row = 1 taxon (which corresponds to species in the case of the leaf nodes):

CREATE TABLE `taxons` (
  `TaxonId` smallint(6) NOT NULL default '0',
  `ClassId` smallint(6) default NULL,
  `OrderId` smallint(6) default NULL,
  `FamilyId` smallint(6) default NULL,
  `GenusId` smallint(6) default NULL,
  `Name` varchar(150) NOT NULL default ''
);

and the example of the data:

+---------+---------+---------+----------+---------+-------------------------------+
| TaxonId | ClassId | OrderId | FamilyId | GenusId | Name                          |
+---------+---------+---------+----------+---------+-------------------------------+
|     254 |       0 |       0 |        0 |       0 | Aves                          |
|     255 |     254 |       0 |        0 |       0 | Gaviiformes                   |
|     256 |     254 |     255 |        0 |       0 | Gaviidae                      |
|     257 |     254 |     255 |      256 |       0 | Gavia                         |
|     258 |     254 |     255 |      256 |     257 | Gavia stellata                |
|     259 |     254 |     255 |      256 |     257 | Gavia arctica                 |
|     260 |     254 |     255 |      256 |     257 | Gavia immer                   |
|     261 |     254 |     255 |      256 |     257 | Gavia adamsii                 |
|     262 |     254 |       0 |        0 |       0 | Podicipediformes              |
|     263 |     254 |     262 |        0 |       0 | Podicipedidae                 |
|     264 |     254 |     262 |      263 |       0 | Tachybaptus                   |
+---------+---------+---------+----------+---------+-------------------------------+

This is great because this way you accomplish all the needed operations in a very easy way, as long as the internal categories don't change their level in the tree.
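
To make "very easy" concrete, here is a minimal sketch of the typical lookups against the sample rows above. It uses Python's sqlite3 module instead of MySQL only so that it runs standalone, and it follows the sample data in using 0 for unused levels; the variable names are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxons (TaxonId INTEGER PRIMARY KEY, ClassId INTEGER, "
    "OrderId INTEGER, FamilyId INTEGER, GenusId INTEGER, Name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO taxons VALUES (?, ?, ?, ?, ?, ?)",
    [
        (254, 0, 0, 0, 0, "Aves"),
        (255, 254, 0, 0, 0, "Gaviiformes"),
        (256, 254, 255, 0, 0, "Gaviidae"),
        (257, 254, 255, 256, 0, "Gavia"),
        (258, 254, 255, 256, 257, "Gavia stellata"),
        (259, 254, 255, 256, 257, "Gavia arctica"),
        (260, 254, 255, 256, 257, "Gavia immer"),
        (261, 254, 255, 256, 257, "Gavia adamsii"),
        (262, 254, 0, 0, 0, "Podicipediformes"),
        (263, 254, 262, 0, 0, "Podicipedidae"),
        (264, 254, 262, 263, 0, "Tachybaptus"),
    ],
)

# Descendants of a family: one equality test on that level's column.
family_descendants = [r[0] for r in conn.execute(
    "SELECT Name FROM taxons WHERE FamilyId = ? ORDER BY TaxonId", (256,))]

# Ancestors of a species: they are already stored in its own row.
species_ancestors = conn.execute(
    "SELECT ClassId, OrderId, FamilyId, GenusId FROM taxons WHERE TaxonId = ?",
    (258,)).fetchone()

# Level of a taxon: number of filled lineage columns (comparisons yield 0 or 1).
tachybaptus_level = conn.execute(
    "SELECT (ClassId <> 0) + (OrderId <> 0) + (FamilyId <> 0) + (GenusId <> 0) "
    "FROM taxons WHERE TaxonId = ?", (264,)).fetchone()[0]

print(family_descendants, species_ancestors, tachybaptus_level, sep="\n")
```

Each level column can carry an ordinary index, so all three lookups stay simple equality searches.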

樱娆 2024-10-07 01:40:19

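This is a very partial answer to your question, but I hope it is still useful.

Microsoft SQL Server 2008 implements two features that are extremely useful for managing hierarchical data:

The HierarchyId data type.
Common table expressions, using the WITH keyword.

Have a look at "Model Your Data Hierarchies With SQL Server 2008" by Kent Tegels on MSDN. See also my own question: Recursive same-table query in SQL Server 2008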

风吹过旳痕迹 2024-10-07 01:40:19

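If your database supports arrays, you can also implement a lineage column or materialized path as an array of parent ids.

Specifically with Postgres, you can then use the set operators to query the hierarchy and get excellent performance with GIN indexes. This makes finding parents, children, and depth pretty trivial in a single query. Updates are pretty manageable as well.

I have a full write-up on using arrays for materialized paths if you're curious.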

笑叹一世浮沉 2024-10-07 01:40:19

This is really a square peg, round hole question.

If relational databases and SQL are the only hammer you have or are willing to use, then the answers that have been posted thus far are adequate. However, why not use a tool designed to handle hierarchical data? Graph databases are ideal for complex hierarchical data.

The inefficiencies of the relational model, along with the complexity of any code/query solution that maps a graph/hierarchical model onto a relational model, are just not worth the effort when compared to the ease with which a graph database solution can solve the same problem.

Consider a Bill of Materials as a common hierarchical data structure.

class Component extends Vertex {
    long assetId;
    long partNumber;
    long material;
    long amount;
};

class PartOf extends Edge {
};

class AdjacentTo extends Edge {
};

Shortest path between two sub-assemblies: Simple graph traversal algorithm. Acceptable paths can be qualified based on criteria.

Similarity: What is the degree of similarity between two assemblies? Perform a traversal on both sub-trees computing the intersection and union of the two sub-trees. The percent similar is the intersection divided by the union.

Transitive Closure: Walk the sub-tree and sum up the field(s) of interest, e.g. "How much aluminum is in a sub-assembly?"
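
The similarity and roll-up traversals above can be sketched without any particular graph database. The following Python uses a plain dict as the PartOf edges; the part names, the aluminum weights and the function names are all invented for illustration, and the roll-up counts each distinct part once (it ignores quantities).

```python
# Hypothetical bill of materials: assembly -> direct sub-components (PartOf edges).
part_of = {
    "bike_a": ["frame", "wheel", "bell"],
    "bike_b": ["frame", "wheel", "basket"],
    "wheel": ["rim", "spoke", "tire"],
    "frame": ["tube"],
}
# Hypothetical per-part field of interest, in grams of aluminum.
aluminum_grams = {"rim": 400, "tube": 1500, "bell": 50}

def subtree(root):
    """Collect root and everything reachable below it (iterative traversal)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(part_of.get(node, []))
    return seen

def similarity(a, b):
    """Intersection divided by union of the two sub-trees."""
    nodes_a, nodes_b = subtree(a), subtree(b)
    return len(nodes_a & nodes_b) / len(nodes_a | nodes_b)

def roll_up(root, field):
    """Walk the sub-tree and sum one field over its distinct parts."""
    return sum(field.get(node, 0) for node in subtree(root))

bike_similarity = similarity("bike_a", "bike_b")
bike_a_aluminum = roll_up("bike_a", aluminum_grams)
print(bike_similarity, bike_a_aluminum)
```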

Yes, you can solve the problem with SQL and a relational database. However, there are much better approaches if you are willing to use the right tool for the job.

小ぇ时光︴ 2024-10-07 01:40:19

I am using PostgreSQL with closure tables for my hierarchies.
I have one universal stored procedure for the whole database:

CREATE FUNCTION nomen_tree() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
DECLARE
  old_parent INTEGER;
  new_parent INTEGER;
  id_nom INTEGER;
  txt_name TEXT;
BEGIN
-- TG_ARGV[0] = name of table with entities with PARENT-CHILD relationships (TBL_ORIG)
-- TG_ARGV[1] = name of helper table with ANCESTOR, CHILD, DEPTH information (TBL_TREE)
-- TG_ARGV[2] = name of the field in TBL_ORIG which is used for the PARENT-CHILD relationship (FLD_PARENT)
    IF TG_OP = 'INSERT' THEN
    EXECUTE 'INSERT INTO ' || TG_ARGV[1] || ' (child_id,ancestor_id,depth) 
        SELECT $1.id,$1.id,0 UNION ALL
      SELECT $1.id,ancestor_id,depth+1 FROM ' || TG_ARGV[1] || ' WHERE child_id=$1.' || TG_ARGV[2] USING NEW;
    ELSE                                                           
    -- EXECUTE does not support conditional statements inside
    EXECUTE 'SELECT $1.' || TG_ARGV[2] || ',$2.' || TG_ARGV[2] INTO old_parent,new_parent USING OLD,NEW;
    IF COALESCE(old_parent,0) <> COALESCE(new_parent,0) THEN
      EXECUTE '
      -- prevent cycles in the tree
      UPDATE ' || TG_ARGV[0] || ' SET ' || TG_ARGV[2] || ' = $1.' || TG_ARGV[2]
        || ' WHERE id=$2.' || TG_ARGV[2] || ' AND EXISTS(SELECT 1 FROM '
        || TG_ARGV[1] || ' WHERE child_id=$2.' || TG_ARGV[2] || ' AND ancestor_id=$2.id);
      -- first remove edges between all old parents of node and its descendants
      DELETE FROM ' || TG_ARGV[1] || ' WHERE child_id IN
        (SELECT child_id FROM ' || TG_ARGV[1] || ' WHERE ancestor_id = $1.id)
        AND ancestor_id IN
        (SELECT ancestor_id FROM ' || TG_ARGV[1] || ' WHERE child_id = $1.id AND ancestor_id <> $1.id);
      -- then add edges for all new parents ...
      INSERT INTO ' || TG_ARGV[1] || ' (child_id,ancestor_id,depth) 
        SELECT child_id,ancestor_id,d_c+d_a FROM
        (SELECT child_id,depth AS d_c FROM ' || TG_ARGV[1] || ' WHERE ancestor_id=$2.id) AS child
        CROSS JOIN
        (SELECT ancestor_id,depth+1 AS d_a FROM ' || TG_ARGV[1] || ' WHERE child_id=$2.' 
        || TG_ARGV[2] || ') AS parent;' USING OLD, NEW;
    END IF;
  END IF;
  RETURN NULL;
END;
$_$;

Then for each table where I have a hierarchy, I create a trigger

CREATE TRIGGER nomenclature_tree_tr AFTER INSERT OR UPDATE ON nomenclature FOR EACH ROW EXECUTE PROCEDURE nomen_tree('my_db.nomenclature', 'my_db.nom_helper', 'parent_id');

For populating a closure table from existing hierarchy I use this stored procedure:

CREATE FUNCTION rebuild_tree(tbl_base text, tbl_closure text, fld_parent text) RETURNS void
    LANGUAGE plpgsql
    AS $$
BEGIN
    -- Empty the closure table, then rebuild it from the parent column of the base table
    EXECUTE 'TRUNCATE ' || tbl_closure || ';
    INSERT INTO ' || tbl_closure || ' (child_id,ancestor_id,depth) 
        WITH RECURSIVE tree AS
      (
        SELECT id AS child_id,id AS ancestor_id,0 AS depth FROM ' || tbl_base || '
        UNION ALL 
        SELECT t.id,ancestor_id,depth+1 FROM ' || tbl_base || ' AS t
        JOIN tree ON child_id = ' || fld_parent || '
      )
      SELECT * FROM tree;';
END;
$$;

Closure tables are defined with 3 columns: ANCESTOR_ID, DESCENDANT_ID, DEPTH (the functions above name the descendant column child_id). It is possible (and I even advise it) to store records with the same value for ANCESTOR and DESCENDANT, and a value of zero for DEPTH. This simplifies the queries for retrieving the hierarchy, and they are very simple indeed:

-- get all descendants
SELECT tbl_orig.*,depth FROM tbl_closure LEFT JOIN tbl_orig ON descendant_id = tbl_orig.id WHERE ancestor_id = XXX AND depth <> 0;
-- get only direct descendants
SELECT tbl_orig.* FROM tbl_closure LEFT JOIN tbl_orig ON descendant_id = tbl_orig.id WHERE ancestor_id = XXX AND depth = 1;
-- get all ancestors
SELECT tbl_orig.* FROM tbl_closure LEFT JOIN tbl_orig ON ancestor_id = tbl_orig.id WHERE descendant_id = XXX AND depth <> 0;
-- find the deepest level of children
SELECT MAX(depth) FROM tbl_closure WHERE ancestor_id = XXX;
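
To try these queries without a PostgreSQL instance, here is a minimal sketch using Python's sqlite3 module. It is not the trigger above: the `add_node` helper is an assumption that reproduces only the INSERT branch (a self row at depth 0 plus one row per ancestor of the parent), and the table and column names follow the queries just shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_orig (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE tbl_closure (
        ancestor_id   INTEGER NOT NULL,
        descendant_id INTEGER NOT NULL,
        depth         INTEGER NOT NULL,
        PRIMARY KEY (ancestor_id, descendant_id)
    );
""")

def add_node(node_id, parent_id, name):
    conn.execute("INSERT INTO tbl_orig VALUES (?, ?, ?)", (node_id, parent_id, name))
    # Self row at depth 0, then copy every closure row of the parent one level
    # deeper. For a root (parent_id is None) the second SELECT matches nothing.
    conn.execute(
        "INSERT INTO tbl_closure (ancestor_id, descendant_id, depth) "
        "SELECT ?, ?, 0 "
        "UNION ALL "
        "SELECT ancestor_id, ?, depth + 1 FROM tbl_closure WHERE descendant_id = ?",
        (node_id, node_id, node_id, parent_id),
    )

add_node(1, None, "root")
add_node(2, 1, "a")
add_node(3, 2, "b")
add_node(4, 1, "c")

def names(sql, *params):
    return [row[0] for row in conn.execute(sql, params)]

join = "FROM tbl_closure JOIN tbl_orig ON {col} = tbl_orig.id "
all_descendants = names(
    "SELECT name " + join.format(col="descendant_id")
    + "WHERE ancestor_id = ? AND depth <> 0 ORDER BY depth, id", 1)
direct_descendants = names(
    "SELECT name " + join.format(col="descendant_id")
    + "WHERE ancestor_id = ? AND depth = 1 ORDER BY id", 1)
all_ancestors = names(
    "SELECT name " + join.format(col="ancestor_id")
    + "WHERE descendant_id = ? AND depth <> 0 ORDER BY depth DESC", 3)
deepest = conn.execute(
    "SELECT MAX(depth) FROM tbl_closure WHERE ancestor_id = ?", (1,)).fetchone()[0]

print(all_descendants, direct_descendants, all_ancestors, deepest, sep="\n")
```

Each new node adds one row per ancestor plus its self row, which is the "multiple rows per node" cost of this design.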
