当前位置：文江博客话题详情

在关系数据库中存储层次数据的选项是什么？

发布于 2025-02-08 19:22:33 字数 8 浏览 3 评论 0 原文

continue

Good Overviews

Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:

One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set, and Nested Interval I've found.
Models for hierarchical data: slides with good explanations of tradeoffs and example usage
Representing hierarchies in MySQL: very good overview of Nested Set in particular
Hierarchical data in RDBMSs: a most comprehensive and well-organized set of links I've seen, but not much in the way of explanation

Options

Ones I am aware of and general features:

Adjacency List:

Columns: ID, ParentID
Easy to implement.
Cheap node moves, inserts, and deletes.
Expensive to find the level, ancestry & descendants, path
Avoid N+1 via Common Table Expressions in databases that support them

Nested Set (a.k.a Modified Preorder Tree Traversal (MPTT))

Columns: Left, Right
Cheap ancestry, descendants
Very expensive O(n/2) moves, inserts, deletes due to volatile encoding

Bridge Table (a.k.a. Closure Table /w triggers)

Uses separate join table with ancestor, descendant, depth (optional)
Cheap ancestry and descendants
Writes costs O(log n) (size of the subtree) for insert, updates, deletes
Normalized encoding: good for RDBMS statistics & query planner in joins
Requires multiple rows per node

Lineage Column (a.k.a. Materialized Path, Path Enumeration)

Column: lineage (e.g. /parent/child/grandchild/etc...)
Cheap descendants via prefix query (e.g. LEFT(lineage, #) = '/enumerated/path')
Writes costs O(log n) (size of the subtree) for insert, updates, deletes
Non-relational: relies on Array datatype or serialized string format

Nested Intervals

Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
Has real/float/decimal representation/precision issues
Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with the added trickiness of linear algebra.

Flat Table

A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
Cheap to iterate/paginate over
Expensive move and delete
Good Use: threaded discussion - forums / blog comments

Multiple lineage columns

Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
Cheap ancestors, descendants, level
Cheap insert, delete, move of the leaves
Expensive insert, delete, move of the internal nodes
Hard limit to how deep the hierarchy can be

Database Specific Notes

MySQL/MariaDB

~~Use session variables for Adjacency List~~
Use CTEs in MySQL 8.0 or MariaDB 10.2

Oracle

Use CONNECT BY to traverse Adjacency Lists

PostgreSQL

ltree datatype for Materialized Path

SQL Server

General summary
2008 offers HierarchyId data type that appears to help with the Lineage Column approach and expand the depth that can be represented.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

清醇 2025-02-15 19:22:33

我最喜欢的答案是该线程中的第一句话是什么。使用邻接列表来维护层次结构并使用嵌套集来查询层次结构。

到目前为止，问题一直是，从邻接列表到嵌套集的封面方法一直很慢，因为大多数人都使用被称为“ push stack”的极端RBAR方法来进行转换，并且被认为是昂贵的方式通过邻接列表和嵌套集的出色表现，达到维护简单性的必要性。结果，大多数人最终不得不安顿下一个或另一个人，尤其是如果有一个糟糕的节点左右。使用推动堆栈方法可能需要一整天的时间来进行有关MLM'S认为是小型节点层次结构的转换。

我以为我会通过提出一种将邻接列表转换为嵌套集的方法，以似乎是不可能的。这是我的i5笔记本电脑上推动堆栈方法的性能。

Duration for     1,000 Nodes = 00:00:00:870 
Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
Duration for 1,000,000 Nodes = 'Didn't even try this'

这是新方法的持续时间（使用括号中的推动堆栈方法）。

Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)

是的，这是正确的。 100万个节点在不到一分钟的时间内转换了100,000个节点，在4秒钟内。

您可以阅读有关新方法的信息，并在以下URL中获取代码的副本。
http://www.sqlservercentral.com/articles/articles/hierarchy/hierarchy/94040/

使用类似方法汇总的“层次结构。 MLM'ER和制作材料清单的人将对本文特别感兴趣。

http://www.sqlservercentral.com/articles/articles/t-sql/94570/ 停下来看看这两篇文章，跳入“加入讨论”链接，让我知道您的想法。

My favorite answer is as what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.

The problem up until now has been that the coversion method from an Adjacecy List to Nested Sets has been frightfully slow because most people use the extreme RBAR method known as a "Push Stack" to do the conversion and has been considered to be way to expensive to reach the Nirvana of the simplicity of maintenance by the Adjacency List and the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million node hierarchy.

I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.

Duration for     1,000 Nodes = 00:00:00:870 
Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
Duration for 1,000,000 Nodes = 'Didn't even try this'

And here's the duration for the new method (with the push stack method in parenthesis).

Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)

Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.

You can read about the new method and get a copy of the code at the following URL.
http://www.sqlservercentral.com/articles/Hierarchy/94040/

I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article.
http://www.sqlservercentral.com/articles/T-SQL/94570/

If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.

回复收藏 0 原文

时光磨忆 2025-02-15 19:22:33

我选择的邻接模型 +嵌套集型模型

是因为我可以轻松地将新项目插入树（您只需要一个分支的ID即可插入新项目），并且还可以很快地查询它。

+-------------+----------------------+--------+-----+-----+
| category_id | name                 | parent | lft | rgt |
+-------------+----------------------+--------+-----+-----+
|           1 | ELECTRONICS          |   NULL |   1 |  20 |
|           2 | TELEVISIONS          |      1 |   2 |   9 |
|           3 | TUBE                 |      2 |   3 |   4 |
|           4 | LCD                  |      2 |   5 |   6 |
|           5 | PLASMA               |      2 |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |      1 |  10 |  19 |
|           7 | MP3 PLAYERS          |      6 |  11 |  14 |
|           8 | FLASH                |      7 |  12 |  13 |
|           9 | CD PLAYERS           |      6 |  15 |  16 |
|          10 | 2 WAY RADIOS         |      6 |  17 |  18 |
+-------------+----------------------+--------+-----+-----+

每次您需要任何父母的孩子时，您只需查询 parent 列即可。
如果您需要所有父母的所有后代，则您查询对 lft> lft 和 rgt of parent的项目的 lft 。
如果您需要所有节点的所有父母，请询问其 lft 的项目，低于节点的 lft 和 rgt 大于节点的 rgt ，然后按 parent 对。

我需要使访问和查询树比插入更快，这就是为什么我选择此

唯一的问题是修复左和 right 列插入新项目时。好吧，我为此创建了一个存储过程，并每次插入一个新项目时都会称其为罕见，但这确实很快。
我从乔·塞尔科（Joe Celko）的书以及存储程序以及如何提出的想法中得到了这个想法。
https://dba.stackexchange.com/q/q/q/89051/41481

尽管此解决方案可以快速搜索来定位后代，由于其在这些操作中的性能缓慢而需要频繁插入或删除的大型数据集并不理想。因此，它最适合不经常造成的桌子。

Adjacency Model + Nested Sets Model

I went for it because I could insert new items to the tree easily (you just need a branch's id to insert a new item to it) and also query it quite fast.

+-------------+----------------------+--------+-----+-----+
| category_id | name                 | parent | lft | rgt |
+-------------+----------------------+--------+-----+-----+
|           1 | ELECTRONICS          |   NULL |   1 |  20 |
|           2 | TELEVISIONS          |      1 |   2 |   9 |
|           3 | TUBE                 |      2 |   3 |   4 |
|           4 | LCD                  |      2 |   5 |   6 |
|           5 | PLASMA               |      2 |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |      1 |  10 |  19 |
|           7 | MP3 PLAYERS          |      6 |  11 |  14 |
|           8 | FLASH                |      7 |  12 |  13 |
|           9 | CD PLAYERS           |      6 |  15 |  16 |
|          10 | 2 WAY RADIOS         |      6 |  17 |  18 |
+-------------+----------------------+--------+-----+-----+

Every time you need all children of any parent you just query the parent column.
If you needed all descendants of any parent you query for items which have their lft between lft and rgt of parent.
If you needed all parents of any node up to the root of the tree, you query for items having lft lower than the node's lft and rgt bigger than the node's rgt and sort the by parent.

I needed to make accessing and querying the tree faster than inserts, that's why I chose this

The only problem is to fix the left and right columns when inserting new items. well I created a stored procedure for it and called it every time I inserted a new item which was rare in my case but it is really fast.
I got the idea from the Joe Celko's book, and the stored procedure and how I came up with it is explained here in DBA SE
https://dba.stackexchange.com/q/89051/41481

Although this solution allows for rapid searches to locate descendants, it is not ideal for handling large datasets that require frequent inserts or deletes due to its slow performance in these operations. Therefore, it is best suited for tables that won't chnage frequently.

回复收藏 0 原文

落花随流水 2025-02-15 19:22:33

该设计尚未提及：

多个谱系列

尽管有限制，如果您可以忍受，它非常简单，非常非常简单，非常非常非常高效的。功能：

列：每个谱系级别一个，指的是所有父母，直到根部，低于当前项目的级别的级别设置为0（或null），
对层次结构的深度如何为
便宜的祖先有固定的限制，后代，水平
廉价插入，删除，叶子的移动
昂贵的插入物，删除，内部节点的移动

在这里遵循一个例子 - 鸟类的分类树，因此层次结构是类/订单/订单/家庭/族/属/物种 - 物种是最低水平单元（在叶子节点的情况下对应于物种）

CREATE TABLE `taxons` (
  `TaxonId` smallint(6) NOT NULL default '0',
  `ClassId` smallint(6) default NULL,
  `OrderId` smallint(6) default NULL,
  `FamilyId` smallint(6) default NULL,
  `GenusId` smallint(6) default NULL,
  `Name` varchar(150) NOT NULL default ''
);

，1行= 1分类

+---------+---------+---------+----------+---------+-------------------------------+
| TaxonId | ClassId | OrderId | FamilyId | GenusId | Name                          |
+---------+---------+---------+----------+---------+-------------------------------+
|     254 |       0 |       0 |        0 |       0 | Aves                          |
|     255 |     254 |       0 |        0 |       0 | Gaviiformes                   |
|     256 |     254 |     255 |        0 |       0 | Gaviidae                      |
|     257 |     254 |     255 |      256 |       0 | Gavia                         |
|     258 |     254 |     255 |      256 |     257 | Gavia stellata                |
|     259 |     254 |     255 |      256 |     257 | Gavia arctica                 |
|     260 |     254 |     255 |      256 |     257 | Gavia immer                   |
|     261 |     254 |     255 |      256 |     257 | Gavia adamsii                 |
|     262 |     254 |       0 |        0 |       0 | Podicipediformes              |
|     263 |     254 |     262 |        0 |       0 | Podicipedidae                 |
|     264 |     254 |     262 |      263 |       0 | Tachybaptus                   |

。类别不会改变树上的水平。

This design was not mentioned yet:

Multiple lineage columns

Though it has limitations, if you can bear them, it's very simple and very efficient. Features:

Columns: one for each lineage level, refers to all the parents up to the root, levels below the current items' level are set to 0 (or NULL)
There is a fixed limit to how deep the hierarchy can be
Cheap ancestors, descendants, level
Cheap insert, delete, move of the leaves
Expensive insert, delete, move of the internal nodes

Here follows an example - taxonomic tree of birds so the hierarchy is Class/Order/Family/Genus/Species - species is the lowest level, 1 row = 1 taxon (which corresponds to species in the case of the leaf nodes):

CREATE TABLE `taxons` (
  `TaxonId` smallint(6) NOT NULL default '0',
  `ClassId` smallint(6) default NULL,
  `OrderId` smallint(6) default NULL,
  `FamilyId` smallint(6) default NULL,
  `GenusId` smallint(6) default NULL,
  `Name` varchar(150) NOT NULL default ''
);

and the example of the data:

+---------+---------+---------+----------+---------+-------------------------------+
| TaxonId | ClassId | OrderId | FamilyId | GenusId | Name                          |
+---------+---------+---------+----------+---------+-------------------------------+
|     254 |       0 |       0 |        0 |       0 | Aves                          |
|     255 |     254 |       0 |        0 |       0 | Gaviiformes                   |
|     256 |     254 |     255 |        0 |       0 | Gaviidae                      |
|     257 |     254 |     255 |      256 |       0 | Gavia                         |
|     258 |     254 |     255 |      256 |     257 | Gavia stellata                |
|     259 |     254 |     255 |      256 |     257 | Gavia arctica                 |
|     260 |     254 |     255 |      256 |     257 | Gavia immer                   |
|     261 |     254 |     255 |      256 |     257 | Gavia adamsii                 |
|     262 |     254 |       0 |        0 |       0 | Podicipediformes              |
|     263 |     254 |     262 |        0 |       0 | Podicipedidae                 |
|     264 |     254 |     262 |      263 |       0 | Tachybaptus                   |

This is great because this way you accomplish all the needed operations in a very easy way, as long as the internal categories don't change their level in the tree.

回复收藏 0 原文

冬天旳寂寞 2025-02-15 19:22:33

这是您问题的部分答案，但我希望仍然有用。

Microsoft SQL Server 2008实现了两个对于管理层次数据极为有用的功能：

数据类型。
通用表表达式，使用带有关键字。

看看 kent tegels在MSDN上的开始。另请参阅我自己的问题： sql Server 2008中的递归same-table查询

回复收藏 0 原文

御守 2025-02-15 19:22:33

如果您的数据库支持数组，则您还可以将谱系列或物有的路径作为父ID数组来实现。

特别是在Postgres中，您可以使用集合操作员查询层次结构，并通过杜松子酒指数获得出色的性能。这使得在一个查询中找到父母，孩子和深度相当微不足道。更新也非常易于管理。

我有完整的文字使用“ nofollow noreferrer”>用于实现路径的阵列如果您很好奇。

回复收藏 0 原文

太傻旳人生 2025-02-15 19:22:33

这确实是一个方形的圆孔问题。

如果关系数据库和SQL是您唯一或愿意使用的锤子，那么到目前为止发布的答案是足够的。但是，为什么不使用旨在处理层次数据的工具呢？ Graph Database 是复杂的层次数据的理想选择。

与图形数据库解决方案解决相同的问题相比，关系模型的效率低下以及将图形/分层模型映射到关系模型的任何代码/查询解决方案的复杂性以及将其映射到关系模型的复杂性是不值得的。

将材料清单视为常见的分层数据结构。

class Component extends Vertex {
    long assetId;
    long partNumber;
    long material;
    long amount;
};

class PartOf extends Edge {
};

class AdjacentTo extends Edge {
};

两个子组件之间的最短路径：简单的图形遍历算法。可接受的路径可以根据标准获得资格。

相似性：两个组件之间的相似程度是多少？在计算两个子树的交点和结合的两个子树上进行遍历。相似的百分比是交叉路口除以联盟。

传递闭合：走下子树并总结了感兴趣的领域，例如“子组装中有多少铝？”

是的，您可以使用SQL和关系数据库解决问题。但是，如果您愿意使用合适的工具来工作，可以采用更好的方法。

This is really a square peg, round hole question.

If relational databases and SQL are the only hammer you have or are willing to use, then the answers that have been posted thus far are adequate. However, why not use a tool designed to handle hierarchical data? Graph database are ideal for complex hierarchical data.

The inefficiencies of the relational model along with the complexities of any code/query solution to map a graph/hierarchical model onto a relational model is just not worth the effort when compared to the ease with which a graph database solution can solve the same problem.

Consider a Bill of Materials as a common hierarchical data structure.

class Component extends Vertex {
    long assetId;
    long partNumber;
    long material;
    long amount;
};

class PartOf extends Edge {
};

class AdjacentTo extends Edge {
};

Shortest path between two sub-assemblies: Simple graph traversal algorithm. Acceptable paths can be qualified based on criteria.

Similarity: What is the degree of similarity between two assemblies? Perform a traversal on both sub-trees computing the intersection and union of the two sub-trees. The percent similar is the intersection divided by the union.

Transitive Closure: Walk the sub-tree and sum up the field(s) of interest, e.g. "How much aluminum is in a sub-assembly?"

Yes, you can solve the problem with SQL and a relational database. However, there are much better approaches if you are willing to use the right tool for the job.

回复收藏 0 原文

庆幸我还是我 2025-02-15 19:22:33

我正在为层次结构使用PostgreSQL带有封闭表。
我为整个数据库都有一个通用存储的过程：

CREATE FUNCTION nomen_tree() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
DECLARE
  old_parent INTEGER;
  new_parent INTEGER;
  id_nom INTEGER;
  txt_name TEXT;
BEGIN
-- TG_ARGV[0] = name of table with entities with PARENT-CHILD relationships (TBL_ORIG)
-- TG_ARGV[1] = name of helper table with ANCESTOR, CHILD, DEPTH information (TBL_TREE)
-- TG_ARGV[2] = name of the field in TBL_ORIG which is used for the PARENT-CHILD relationship (FLD_PARENT)
    IF TG_OP = 'INSERT' THEN
    EXECUTE 'INSERT INTO ' || TG_ARGV[1] || ' (child_id,ancestor_id,depth) 
        SELECT $1.id,$1.id,0 UNION ALL
      SELECT $1.id,ancestor_id,depth+1 FROM ' || TG_ARGV[1] || ' WHERE child_id=$1.' || TG_ARGV[2] USING NEW;
    ELSE                                                           
    -- EXECUTE does not support conditional statements inside
    EXECUTE 'SELECT $1.' || TG_ARGV[2] || ',$2.' || TG_ARGV[2] INTO old_parent,new_parent USING OLD,NEW;
    IF COALESCE(old_parent,0) <> COALESCE(new_parent,0) THEN
      EXECUTE '
      -- prevent cycles in the tree
      UPDATE ' || TG_ARGV[0] || ' SET ' || TG_ARGV[2] || ' = $1.' || TG_ARGV[2]
        || ' WHERE id=$2.' || TG_ARGV[2] || ' AND EXISTS(SELECT 1 FROM '
        || TG_ARGV[1] || ' WHERE child_id=$2.' || TG_ARGV[2] || ' AND ancestor_id=$2.id);
      -- first remove edges between all old parents of node and its descendants
      DELETE FROM ' || TG_ARGV[1] || ' WHERE child_id IN
        (SELECT child_id FROM ' || TG_ARGV[1] || ' WHERE ancestor_id = $1.id)
        AND ancestor_id IN
        (SELECT ancestor_id FROM ' || TG_ARGV[1] || ' WHERE child_id = $1.id AND ancestor_id <> $1.id);
      -- then add edges for all new parents ...
      INSERT INTO ' || TG_ARGV[1] || ' (child_id,ancestor_id,depth) 
        SELECT child_id,ancestor_id,d_c+d_a FROM
        (SELECT child_id,depth AS d_c FROM ' || TG_ARGV[1] || ' WHERE ancestor_id=$2.id) AS child
        CROSS JOIN
        (SELECT ancestor_id,depth+1 AS d_a FROM ' || TG_ARGV[1] || ' WHERE child_id=$2.' 
        || TG_ARGV[2] || ') AS parent;' USING OLD, NEW;
    END IF;
  END IF;
  RETURN NULL;
END;
$_$;

然后，对于每个具有层次结构的表，我创建了一个触发器，

CREATE TRIGGER nomenclature_tree_tr AFTER INSERT OR UPDATE ON nomenclature FOR EACH ROW EXECUTE PROCEDURE nomen_tree('my_db.nomenclature', 'my_db.nom_helper', 'parent_id');

用于从现有层次结构中填充封闭

CREATE FUNCTION rebuild_tree(tbl_base text, tbl_closure text, fld_parent text) RETURNS void
    LANGUAGE plpgsql
    AS $
BEGIN
    EXECUTE 'TRUNCATE ' || tbl_closure || ';
    INSERT INTO ' || tbl_closure || ' (child_id,ancestor_id,depth) 
        WITH RECURSIVE tree AS
      (
        SELECT id AS child_id,id AS ancestor_id,0 AS depth FROM ' || tbl_base || '
        UNION ALL 
        SELECT t.id,ancestor_id,depth+1 FROM ' || tbl_base || ' AS t
        JOIN tree ON child_id = ' || fld_parent || '
      )
      SELECT * FROM tree;';
END;
$;

表，深度。有可能（甚至我建议）存储具有相同价值的祖先和后代的记录，而深度为零的值。这将简化检索层次结构的查询。它们确实非常简单：

-- get all descendants
SELECT tbl_orig.*,depth FROM tbl_closure LEFT JOIN tbl_orig ON descendant_id = tbl_orig.id WHERE ancestor_id = XXX AND depth <> 0;
-- get only direct descendants
SELECT tbl_orig.* FROM tbl_closure LEFT JOIN tbl_orig ON descendant_id = tbl_orig.id WHERE ancestor_id = XXX AND depth = 1;
-- get all ancestors
SELECT tbl_orig.* FROM tbl_closure LEFT JOIN tbl_orig ON ancestor_id = tbl_orig.id WHERE descendant_id = XXX AND depth <> 0;
-- find the deepest level of children
SELECT MAX(depth) FROM tbl_closure WHERE ancestor_id = XXX;

I am using PostgreSQL with closure tables for my hierarchies.
I have one universal stored procedure for the whole database:

CREATE FUNCTION nomen_tree() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
DECLARE
  old_parent INTEGER;
  new_parent INTEGER;
  id_nom INTEGER;
  txt_name TEXT;
BEGIN
-- TG_ARGV[0] = name of table with entities with PARENT-CHILD relationships (TBL_ORIG)
-- TG_ARGV[1] = name of helper table with ANCESTOR, CHILD, DEPTH information (TBL_TREE)
-- TG_ARGV[2] = name of the field in TBL_ORIG which is used for the PARENT-CHILD relationship (FLD_PARENT)
    IF TG_OP = 'INSERT' THEN
    EXECUTE 'INSERT INTO ' || TG_ARGV[1] || ' (child_id,ancestor_id,depth) 
        SELECT $1.id,$1.id,0 UNION ALL
      SELECT $1.id,ancestor_id,depth+1 FROM ' || TG_ARGV[1] || ' WHERE child_id=$1.' || TG_ARGV[2] USING NEW;
    ELSE                                                           
    -- EXECUTE does not support conditional statements inside
    EXECUTE 'SELECT $1.' || TG_ARGV[2] || ',$2.' || TG_ARGV[2] INTO old_parent,new_parent USING OLD,NEW;
    IF COALESCE(old_parent,0) <> COALESCE(new_parent,0) THEN
      EXECUTE '
      -- prevent cycles in the tree
      UPDATE ' || TG_ARGV[0] || ' SET ' || TG_ARGV[2] || ' = $1.' || TG_ARGV[2]
        || ' WHERE id=$2.' || TG_ARGV[2] || ' AND EXISTS(SELECT 1 FROM '
        || TG_ARGV[1] || ' WHERE child_id=$2.' || TG_ARGV[2] || ' AND ancestor_id=$2.id);
      -- first remove edges between all old parents of node and its descendants
      DELETE FROM ' || TG_ARGV[1] || ' WHERE child_id IN
        (SELECT child_id FROM ' || TG_ARGV[1] || ' WHERE ancestor_id = $1.id)
        AND ancestor_id IN
        (SELECT ancestor_id FROM ' || TG_ARGV[1] || ' WHERE child_id = $1.id AND ancestor_id <> $1.id);
      -- then add edges for all new parents ...
      INSERT INTO ' || TG_ARGV[1] || ' (child_id,ancestor_id,depth) 
        SELECT child_id,ancestor_id,d_c+d_a FROM
        (SELECT child_id,depth AS d_c FROM ' || TG_ARGV[1] || ' WHERE ancestor_id=$2.id) AS child
        CROSS JOIN
        (SELECT ancestor_id,depth+1 AS d_a FROM ' || TG_ARGV[1] || ' WHERE child_id=$2.' 
        || TG_ARGV[2] || ') AS parent;' USING OLD, NEW;
    END IF;
  END IF;
  RETURN NULL;
END;
$_$;

Then for each table where I have a hierarchy, I create a trigger

CREATE TRIGGER nomenclature_tree_tr AFTER INSERT OR UPDATE ON nomenclature FOR EACH ROW EXECUTE PROCEDURE nomen_tree('my_db.nomenclature', 'my_db.nom_helper', 'parent_id');

For populating a closure table from existing hierarchy I use this stored procedure:

CREATE FUNCTION rebuild_tree(tbl_base text, tbl_closure text, fld_parent text) RETURNS void
    LANGUAGE plpgsql
    AS $
BEGIN
    EXECUTE 'TRUNCATE ' || tbl_closure || ';
    INSERT INTO ' || tbl_closure || ' (child_id,ancestor_id,depth) 
        WITH RECURSIVE tree AS
      (
        SELECT id AS child_id,id AS ancestor_id,0 AS depth FROM ' || tbl_base || '
        UNION ALL 
        SELECT t.id,ancestor_id,depth+1 FROM ' || tbl_base || ' AS t
        JOIN tree ON child_id = ' || fld_parent || '
      )
      SELECT * FROM tree;';
END;
$;

Closure tables are defined with 3 columns - ANCESTOR_ID, DESCENDANT_ID, DEPTH. It is possible (and I even advice) to store records with same value for ANCESTOR and DESCENDANT, and a value of zero for DEPTH. This will simplify the queries for retrieval of the hierarchy. And they are very simple indeed:

-- get all descendants
SELECT tbl_orig.*,depth FROM tbl_closure LEFT JOIN tbl_orig ON descendant_id = tbl_orig.id WHERE ancestor_id = XXX AND depth <> 0;
-- get only direct descendants
SELECT tbl_orig.* FROM tbl_closure LEFT JOIN tbl_orig ON descendant_id = tbl_orig.id WHERE ancestor_id = XXX AND depth = 1;
-- get all ancestors
SELECT tbl_orig.* FROM tbl_closure LEFT JOIN tbl_orig ON ancestor_id = tbl_orig.id WHERE descendant_id = XXX AND depth <> 0;
-- find the deepest level of children
SELECT MAX(depth) FROM tbl_closure WHERE ancestor_id = XXX;

回复收藏 0 原文