如何用SQL表示数据树?

发布于 2024-08-20 00:28:02 字数 299 浏览 4 评论 0原文

我正在编写一个由 Tree 和 TreeNode 组合而成的数据树结构。树将包含数据的根和顶层操作。 我正在使用 UI 库以窗口形式呈现树,在其中我可以将树绑定到 TreeView。

我需要将这棵树和节点保存在数据库中。 保存树并获得以下功能的最佳方法是什么:

  1. 直观的实现。
  2. 轻松装订。从树到数据库结构和返回(如果有的话)很容易

我有两个想法。第一个是将数据序列化到表中的一个行中。 第二种是保存在表中,但是当移动到数据实体时,我将丢失更改节点上表上的行状态。

有什么想法吗?

I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data.
I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.

I will need to save this tree and nodes in the DB.
What will be the best way to save the tree and to get the following features:

  1. Intuitive implementation.
  2. Easy binding. Will be easy to move from the tree to the DB structure and back (if any)

I had 2 ideas. The first is to serialize the data into a one liner in a table.
The second is to save in tables but then, when moving to data entities I will loose the row states on the table on changed nodes.

Any ideas?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

秉烛思 2024-08-27 00:28:02

我已经为这个关于 SQL-Antipatterns 的 slidshare 添加了书签,其中讨论了几种替代方案: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed

那里的建议是使用闭包表(幻灯片中对此进行了解释)。

摘要如下(幻灯片 77):

                  | Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List    |    Easy     |     Hard      |    Easy     |      Yes
Path Enumeration  |    Easy     |     Easy      |    Hard     |      No
Nested Sets       |    Hard     |     Easy      |    Hard     |      No
Closure Table     |    Easy     |     Easy      |    Easy     |      Yes

I've bookmarked this slidshare about SQL-Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed

The recommendation from there is to use a Closure Table (it's explained in the slides).

Here is the summary (slide 77):

                  | Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List    |    Easy     |     Hard      |    Easy     |      Yes
Path Enumeration  |    Easy     |     Easy      |    Hard     |      No
Nested Sets       |    Hard     |     Easy      |    Hard     |      No
Closure Table     |    Easy     |     Easy      |    Easy     |      Yes
记忆で 2024-08-27 00:28:02

最简单的实现是邻接表结构:

id  parent_id  data

但是,某些数据库,特别是MySQL,在处理此模型时存在一些问题,因为它需要运行递归查询的能力,>MySQL 缺乏。

另一种模型是嵌套集

id lft rgt data

其中 lftrgt 是定义层次结构的任意值(任何子级的 lftrgt 应该在任何父级的 lft 内,rgt

这不需要递归查询,但它更慢且更难以维护。

然而,在MySQL中,这可以使用SPATIAL能力来改进。

请参阅我的博客中的这些文章:

了解更详细的说明。

The easiest implementation is adjacency list structure:

id  parent_id  data

However, some databases, particularly MySQL, have some issues in handling this model, because it requires an ability to run recursive queries which MySQL lacks.

Another model is nested sets:

id lft rgt data

where lft and rgt are arbitrary values that define the hierarchy (any child's lft, rgt should be within any parent's lft, rgt)

This does not require recursive queries, but it slower and harder to maintain.

However, in MySQL this can be improved using SPATIAL abitilies.

See these articles in my blog:

for more detailed explanations.

把人绕傻吧 2024-08-27 00:28:02

我很惊讶没有人提到物化路径解决方案,这可能是在标准 SQL 中处理树的最快方法。

在这种方法中,树中的每个节点都有一个列path,其中存储从根到节点的完整路径。这涉及非常简单且快速的查询。

看一下示例表 node

+---------+-------+
| node_id | path  |
+---------+-------+
| 0       |       |
| 1       | 1     |
| 2       | 2     |
| 3       | 3     |
| 4       | 1.4   |
| 5       | 2.5   |
| 6       | 2.6   |
| 7       | 2.6.7 |
| 8       | 2.6.8 |
| 9       | 2.6.9 |
+---------+-------+

为了获取节点 x 的子节点,您可以编写以下查询:

SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')

请记住,列 路径应该被索引,以便使用LIKE子句快速执行。

I'm suprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.

In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.

Have a look at the example table node:

+---------+-------+
| node_id | path  |
+---------+-------+
| 0       |       |
| 1       | 1     |
| 2       | 2     |
| 3       | 3     |
| 4       | 1.4   |
| 5       | 2.5   |
| 6       | 2.6   |
| 7       | 2.6.7 |
| 8       | 2.6.8 |
| 9       | 2.6.9 |
+---------+-------+

In order to get the children of node x, you can write the following query:

SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')

Keep in mind, that the column path should be indexed, in order to perform fast with the LIKE clause.

汹涌人海 2024-08-27 00:28:02

如果您使用 PostgreSQL,则可以使用 ltree,它是 contrib 扩展中的一个包(默认情况下),它实现了树数据结构。

docs 中:

CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);

您可以执行如下查询:

ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
                path
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(4 rows)

If you are using PostgreSQL you can use ltree, a package in the contrib extension (comes by default) which implements the tree data structure.

From the docs:

CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);

You can do queries like:

ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
                path
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(4 rows)
恏ㄋ傷疤忘ㄋ疼 2024-08-27 00:28:02

这取决于您将如何查询和更新数据。如果将所有数据存储在一行中,那么它基本上是一个单元,如果不重写所有数据,则无法查询或部分更新。

如果要将每个元素存储为一行,您应该首先阅读 管理层次结构MySQL 中的数据(特定于 MySQL,但该建议也适用于许多其他数据库)。

如果您只访问整棵树,则邻接列表模型使得在不使用递归查询的情况下检索根下的所有节点变得困难。如果您添加链接回头部的额外列,那么您可以执行 SELECT * WHERE head_id = @id 并在一个非递归查询中获取整个树,但这会使数据库非规范化。

某些数据库具有自定义扩展,可以更轻松地存储和检索层次结构数据,例如 Oracle 有 连接方式

It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.

If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL (MySQL specific, but the advice holds for many other databases too).

If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head then you can do SELECT * WHERE head_id = @id and get the whole tree in one non-recursive query, but it denormalizes the database.

Some databases have custom extensions that make storing and retrieving heirarchical data easier, for example Oracle has CONNECT BY.

末蓝 2024-08-27 00:28:02

由于这是在谷歌搜索中询问“sql trees”时的最佳答案,我将尝试从今天(2018 年 12 月)的角度更新这一点。

大多数答案都暗示使用邻接列表既简单又缓慢,因此建议使用其他方法。

自版本 8(2018 年 4 月发布)起,MySQL 支持递归公用表表达式 (CTE)。 MySQL 的出现有点晚了,但这开辟了一个新的选择。

此处有一个教程,解释了如何使用递归查询来管理邻接关系列表。

由于递归现在完全在数据库引擎中运行,因此它比过去(必须在脚本引擎中运行时)快得多。

博客此处给出了一些测量结果(其中都是有偏见的,而且是针对 postgres 而不是 MySQL),但尽管如此,它表明邻接表不必很慢。

所以我今天的结论是:

  • 如果数据库引擎支持递归,简单的邻接表可能足够快。
  • 使用您自己的数据和您自己的引擎进行基准测试。
  • 不要相信过时的建议来指出“最佳”方法。

As this is the top answer when asking "sql trees" in a google search, I will try to update this from the perspective of today (december 2018).

Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.

Since version 8 (published april 2018) MySQL supports recursive common table expressions (CTE). MySQL is a bit late to the show but this opens up a new option.

There is a tutorial here that explains the use of recursive queries to manage an adjacency list.

As the recursion now runs completely within the database engine, it is way much faster than in the past (when it had to run in the script engine).

The blog here gives some measurements (which are both biased and for postgres instead of MySQL) but nevertheless it shows that adjacency lists do not have to be slow.

So my conclusion today is:

  • The simple adjacency list may be fast enough if the database engine supports recursion.
  • Do a benchmark with your own data and your own engine.
  • Do not trust outdated recommendations to point out the "best" method.
萌面超妹 2024-08-27 00:28:02

PGSQL 树关系

你好,我刚刚为我正在从事的一个项目处理了这个问题,并想分享我的文章
希望这有帮助。让我们从一些先决条件开始,

这本质上是上面提到的使用递归调用的闭包表解决方案。感谢这些幻灯片,它们非常有用,我希望在写这篇文章之前看到它们:)

先决条件

递归函数

这些是自称为 ie 的函数,

function factorial(n) {
    if (n = 0) return 1; //base case
    return n * factorial(n - 1); // recursive call
}

这很酷,幸运的是 pgsql 也有递归函数,但它可能有点多。我更喜欢功能性的东西
cte with pgsql

WITH RECURSIVE t(n) AS (
    VALUES (1) -- nonrecusive term 
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100 -- recusive term
    --continues until union adds nothing
)
SELECT sum(n) FROM t;

递归WITH查询的一般形式始终是非递归术语,然后是 UNION(或 UNION ALL),然后是递归术语,其中只有递归术语可以包含对查询自身输出的引用。这样的查询执行如下:

递归查询评估

  1. 评估非递归项。对于 UNION(但不是 UNION ALL),丢弃重复的行。将所有剩余行包含在递归查询的结果中,并将它们放入临时工作表中。

  2. 只要工作表不为空,就重复这些步骤:

    a.评估递归项,用工作表的当前内容替换递归自引用。对于 UNION(但不是 UNION ALL),丢弃重复行以及与任何先前结果行重复的行。将所有剩余行包含在递归查询的结果中,并将它们放入临时中间表中。

    b.将工作表的内容替换为中间表的内容,然后清空中间表。

要在sql中执行阶乘之类的操作,您需要执行类似 这篇文章

ALTER FUNCTION dbo.fnGetFactorial (@num int)
RETURNS INT
AS
BEGIN
    DECLARE @n  int

    IF @num <= 1 SET @n = 1
    ELSE SET @n = @num * dbo.fnGetFactorial(@num - 1)

    RETURN @n
END
GO

树数据结构(更像是森林:)

维基百科

需要注意的重要一点是,树是 graph,这可以简单地通过以下方式强制执行
每个节点只有一个父节点的关系。

在 PGSQL 中表示树

我认为在我们继续讨论 sql 之前,从理论上讲是最容易的。在

没有数据重复的情况下表示图形关系的简单方法是分离节点(id,数据) 从边缘。
然后,我们可以限制 edges(parent_id, child_id) 表来强制执行我们的约束。强制规定parent_id,child_id
以及子 ID 都是唯一的。

create table nodes (
    id uuid default uuid_generate_v4() not null unique ,
    name varchar(255) not null,
    json json default '{}'::json not null,
    remarks varchar(255),
);


create table edges (
    id uuid default uuid_generate_v4() not null,
    parent_id uuid not null,
    child_id uuid not null,
    meta json default '{}'::json,
    constraint group_group_id_key
        primary key (id),
    constraint group_group_unique_combo
        unique (parent_id, child_id),
    constraint group_group_unique_child
        unique (child_id),
    foreign key (parent_id) references nodes
        on update cascade on delete cascade,
    foreign key (child_id) references nodes
        on update cascade on delete cascade
);

请注意,理论上,只需将 Parent_id 放入节点表中,只需一张表即可完成这一切
然后,

CREATE VIEW v_edges as (SELECT id as child_id, parent_id FROM nodes)

但是为了灵活性的提议,以便我们可以将其他图形结构合并到此中
框架我将使用常见的多对多关系结构。理想情况下,这将使这项研究能够
扩展到其他图算法。

让我们从一个示例数据结构开始,

INSERT (id, my_data) VALUES ('alpha', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('bravo', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('charly', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('berry', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('zeta', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('yank', 'my big data') INTO nodes

INSERT (parent_id, child_id) VALUES ('alpha', 'bravo') INTO edges
INSERT (parent_id, child_id) VALUES ('alpha', 'berry') INTO edges
INSERT (parent_id, child_id) VALUES ('bravo', 'charly') INTO edges
INSERT (parent_id, child_id) VALUES ('yank', 'zeta') INTO edges

-- rank0       Alpha      Yank
-- rank1    Bravo Berry     Zeta
-- rank2  Charly         

注意树的有趣属性(边数 e)=(节点数 n)-1
每个孩子只有一个父母。

然后我们可以简化方程

let n = node
let p = parent 
let c = child
let ns = nodes = groups
let es = edges = group_group // because this is a relationship of a group entity to another group entity

那么现在我们会问什么样的问题。

“给定一组任意组,假设节点继承其子节点,图的覆盖范围是多少?”

这是一个棘手的问题,它要求我们遍历图表并找到 s 中每个节点的所有子节点

这继续此 堆栈溢出帖子

        -- some DBMS (e.g. Postgres) require the word "recursive"
        -- some others (Oracle, SQL-Server) require omitting the "recursive"
        -- and some (e.g. SQLite) don't bother, i.e. they accept both
-- drop view v_group_descendant;
create view v_group_descendant as
with recursive descendants -- name for accumulating table
  (parent_id, descendant_id, lvl) -- output columns
as
  ( select parent_id, child_id, 1
    from group_group -- starting point, we start with each base group
  union all
    select d.parent_id, s.child_id, d.lvl + 1
    from descendants  d -- get the n-1 th level of descendants/ children
      join group_group  s -- and join it to find the nth level
        on d.descendant_id = s.parent_id -- the trick is that the output of this query becomes the input
        -- Im not sure when it stops but probably when there is no change
  )
select * from descendants;

comment on view v_group_descendant is 'This aggregates the children of each group RECURSIVELY WOO ALL THE WAY DOWN THE TREE :)';

在我们有了这个视图之后,我们可以加入我们的节点/组来获取数据,我不会为每个步骤提供这些样本,大多数情况下我们只会使用 ids。

select d.*, g1.group_name as parent, g2.group_name as decendent --then we join it with groups to add names
from v_group_descendant d, groups g1, groups g2
WHERE g1.id = d.parent_id and g2.id = d.descendant_id
order by parent_id, lvl, descendant_id;

示例输出

+------------------------------------+------------------------------------+---+----------+---------+
|parent_id                           |descendant_id                       |lvl|parent    |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1  |bravo     |charly   |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1  |alpha     |bravo    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1  |alpha     |berry    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2  |alpha     |charly   |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1  |yank      |zeta     |
+------------------------------------+------------------------------------+---+----------+---------+

请注意,这只是最小的节点后代关系,并且实际上丢失了具有 0 个子节点的所有节点,例如 charly。

为了解决这个问题,我们需要添加所有未出现在后代列表中的节点

 create view v_group_descendant_all as (
       select * from  v_group_descendant gd
       UNION ALL
       select  null::uuid as parent_id,id as descendant_id, 0 as lvl from groups g
       where not exists (select * from  v_group_descendant gd where gd.descendant_id = g.id )
);
comment on view v_group_descendant is 'complete list of descendants including rank 0 root nodes descendant - parent relationship is duplicated for all levels / ranks';
preview
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id                           |descendant_id                       |lvl|parent    |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1  |bravo     |charly   |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1  |alpha     |bravo    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1  |alpha     |berry    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2  |alpha     |charly   |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1  |yank      |zeta     |
|null                                |c1529e8a-75b0-4242-a51a-ac60a0e48868|0  |null      |alpha    |
|null                                |42529e8a-75b0-4242-a51a-ac60a0e48868|0  |null      |yank     |
+------------------------------------+------------------------------------+---+----------+---------+

举例来说,我们正在基于 users(id , data) 表获取组集具有 user_group(user_id, group_id) 关系,

然后我们可以将其连接到另一个表,删除重复项,因为我们的 user_group 关系集 s 可能会导致
如果一个用户被分配给两个 alpha 分配的 charly,则重复

+------+--------+
| user | group  |
+------+--------+
| jane | alpha  |
| jane | charly |
| kier | yank   |   
| kier | bravo  |
+------+--------+
--drop view v_user_group_recursive;
CREATE VIEW v_user_group_recursive AS (
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id 
    FROM v_group_descendant_all dd , user_group ug
    WHERE (ug.group_id = dd.descendant_id 
        OR ug.group_id = dd.parent_id)  -- should gic
);
SELECT * FROM v_user_group_recursive;
+------+--------+
| user | group  |
+------+--------+
| jane | alpha  |
| jane | bravo  |
| jane | berry  |
| jane | charly |
-- | jane | charly | Removed by DISTINCT
| kier | yank   |   
| kier | zeta   |   
| kier | bravo  |
| kier | charly |
+------+--------+

如果我们想要,我们现在可以按节点分组并加入,我们可以做一些像下面这样的事情,

CREATE VIEW v_user_groups_recursive AS (
    SELECT user_id, json_agg(json_build_object('id', id,'parent_id',parent_id, 'group_name', group_name, 'org_id', org_id, 'json', json, 'remarks', remarks)) as groups
    FROM v_user_group_recursive ug, v_groups_parent g
    WHERE ug.group_id = g.id GROUP BY user_id
);
comment on view v_user_group_recursive is 'This aggregates the groups for each user recursively ';
+------+-------------------------------+
| user | groups                        |
+------+-------------------------------+
| jane | [alpha, bravo, berry, charly] |
| kier | [yank, zeta, bravo, charly]   |   
+------+-------------------------------+

这太棒了,我们已经回答了这个问题。我们现在可以简单地询问这个使用继承了哪些组

SELECT * from v_user_groups_recursive where user_id = 'kier

在前端显示我们的辛苦工作

并且进一步我们可以使用像 jstree.com 这样的东西来显示
我们的结构

  async function getProjectTree(user_id) {
        let res = await table.query(format('SELECT * from v_user_groups_recursive ug WHERE ug.user_id = %L', user_id));
        if (res.success) {
            let rows = res.data[0].groups.map(r => {

                return {
                    id: r.id, // required
                    parent: r.parent_id==null?'#':r.parent_id,// required
                    text: r.group_name,// node text
                    icon: 'P', // string for custom
                    state: {
                        opened: true,  // is the node open
                        disabled: false,  // is the node disabled
                        selected: false,  // is the node selected
                    },
                    li_attr: {},  // attributes for the generated LI node
                    a_attr: {}  // attributes for the generated A node
                }
            })
           
            return {success: true, data: rows, msg: 'Got all projects'}
        } else return res;
    }
<div id="v_project_tree" class="row col-10 mx-auto" style="height: 25vh"></div>
<script>
   function buildTree() {
      bs.sendJson('get', "/api/projects/getProjectTree").then(res => {
         bs.resNotify(res);
         if (!res.success) {
            //:(
            console.error(':(');
            return
         }
         console.log(res.data);
         $('#v_project_tree').jstree({
            'core': {
               'data': res.data
            }
         });
      })
   }
   window.addEventListener('load', buildTree);
</script>

jstree 预览

博客

PGSQL Tree relations

Hello, I just got a handle on this for a project I'm working on and figured I'd share my write-up
Hope this helps. Let's get started with some prereqs

This is essentially the closure table solution mentioned above Using recursive calls. Thanks for those slides they are very useful I wish i saw them before this write up :)

pre-requisites

Recursive Functions

these are functions that call themselves ie

function factorial(n) {
    if (n = 0) return 1; //base case
    return n * factorial(n - 1); // recursive call
}

This is pretty cool luckily pgsql has recursive functions too but it can be a bit much. I prefer functional stuff
cte with pgsql

WITH RECURSIVE t(n) AS (
    VALUES (1) -- nonrecusive term 
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100 -- recusive term
    --continues until union adds nothing
)
SELECT sum(n) FROM t;

The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:

Recursive Query Evaluation

  1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

  2. So long as the working table is not empty, repeat these steps:

    a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

    b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

to do something like factorial in sql you need to do something more like this so post

ALTER FUNCTION dbo.fnGetFactorial (@num int)
RETURNS INT
AS
BEGIN
    DECLARE @n  int

    IF @num <= 1 SET @n = 1
    ELSE SET @n = @num * dbo.fnGetFactorial(@num - 1)

    RETURN @n
END
GO

Tree data structures (more of a forest :)

wikipedia

The import thing to note is that a tree is a subset of a graph, This can be simply enforced by
the relationship each node has only one parent.

Representing the Tree in PGSQL

I think it will be easiest to work it out a little more theoretically before we move on to the sql

The simple way of represent a graph relation without data duplication is by separating the nodes(id, data) from the edges.
We can then restrict the edges(parent_id, child_id) table to enforce our constraint. be mandating that parent_id,child_id
as well as just child id be unique

create table nodes (
    id uuid default uuid_generate_v4() not null unique ,
    name varchar(255) not null,
    json json default '{}'::json not null,
    remarks varchar(255),
);


create table edges (
    id uuid default uuid_generate_v4() not null,
    parent_id uuid not null,
    child_id uuid not null,
    meta json default '{}'::json,
    constraint group_group_id_key
        primary key (id),
    constraint group_group_unique_combo
        unique (parent_id, child_id),
    constraint group_group_unique_child
        unique (child_id),
    foreign key (parent_id) references nodes
        on update cascade on delete cascade,
    foreign key (child_id) references nodes
        on update cascade on delete cascade
);

Note that theoretical this can all be done with only one table by simply putting the parent_id in the nodes table
and then

CREATE VIEW v_edges as (SELECT id as child_id, parent_id FROM nodes)

but for the proposal of flexibility and so that we can incorporate other graph structures to this
framework I will use the common many-to-many relationship structure. This will ideally allow this research to be
expanded into other graph algorithms.

Let's start out with a sample data structure

INSERT (id, my_data) VALUES ('alpha', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('bravo', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('charly', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('berry', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('zeta', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('yank', 'my big data') INTO nodes

INSERT (parent_id, child_id) VALUES ('alpha', 'bravo') INTO edges
INSERT (parent_id, child_id) VALUES ('alpha', 'berry') INTO edges
INSERT (parent_id, child_id) VALUES ('bravo', 'charly') INTO edges
INSERT (parent_id, child_id) VALUES ('yank', 'zeta') INTO edges

-- rank0       Alpha      Yank
-- rank1    Bravo Berry     Zeta
-- rank2  Charly         

Note the interesting properties of a tree (number of edges e) =( number of nodes n)-1
each child has exactly one parent.

We can then simplify the equations

let n = node
let p = parent 
let c = child
let ns = nodes = groups
let es = edges = group_group // because this is a relationship of a group entity to another group entity

So now what sort of questions will we ask.

"Given an arbitrary set of groups 's' what is the coverage of the graph assuming nodes inherit their children?"

This is a tricky question, it requires us to traverse the graph and find all children of each node in s

This continues off of this stack overflow post

        -- some DBMS (e.g. Postgres) require the word "recursive"
        -- some others (Oracle, SQL-Server) require omitting the "recursive"
        -- and some (e.g. SQLite) don't bother, i.e. they accept both
-- drop view v_group_descendant;
create view v_group_descendant as
with recursive descendants -- name for accumulating table
  (parent_id, descendant_id, lvl) -- output columns
as
  ( select parent_id, child_id, 1
    from group_group -- starting point, we start with each base group
  union all
    select d.parent_id, s.child_id, d.lvl + 1
    from descendants  d -- get the n-1 th level of descendants/ children
      join group_group  s -- and join it to find the nth level
        on d.descendant_id = s.parent_id -- the trick is that the output of this query becomes the input
        -- Im not sure when it stops but probably when there is no change
  )
select * from descendants;

comment on view v_group_descendant is 'This aggregates the children of each group RECURSIVELY WOO ALL THE WAY DOWN THE TREE :)';

after we have this view we can join with our nodes/groups to get out data back i will not provide these samples for every single step for the most part we will just work with ids.

select d.*, g1.group_name as parent, g2.group_name as decendent --then we join it with groups to add names
from v_group_descendant d, groups g1, groups g2
WHERE g1.id = d.parent_id and g2.id = d.descendant_id
order by parent_id, lvl, descendant_id;

sample output

+------------------------------------+------------------------------------+---+----------+---------+
|parent_id                           |descendant_id                       |lvl|parent    |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1  |bravo     |charly   |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1  |alpha     |bravo    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1  |alpha     |berry    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2  |alpha     |charly   |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1  |yank      |zeta     |
+------------------------------------+------------------------------------+---+----------+---------+

Note that this is just the minimal node descendant relationship and has actual lost all nodes with 0 children such as charly.

In order to resolve this we need to add all nodes back which don't appear in the descendants list

 create view v_group_descendant_all as (
       select * from  v_group_descendant gd
       UNION ALL
       select  null::uuid as parent_id,id as descendant_id, 0 as lvl from groups g
       where not exists (select * from  v_group_descendant gd where gd.descendant_id = g.id )
);
comment on view v_group_descendant is 'complete list of descendants including rank 0 root nodes descendant - parent relationship is duplicated for all levels / ranks';
preview
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id                           |descendant_id                       |lvl|parent    |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1  |bravo     |charly   |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1  |alpha     |bravo    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1  |alpha     |berry    |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2  |alpha     |charly   |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1  |yank      |zeta     |
|null                                |c1529e8a-75b0-4242-a51a-ac60a0e48868|0  |null      |alpha    |
|null                                |42529e8a-75b0-4242-a51a-ac60a0e48868|0  |null      |yank     |
+------------------------------------+------------------------------------+---+----------+---------+

Lets say for example we are getting our set s of groups bases on a users(id , data) table with a user_group(user_id, group_id) relation

We can then join this to another table removing duplicates because our set s of user_group relations may cause
duplicates if a users is say assigned to both alpha assigned charly

+------+--------+
| user | group  |
+------+--------+
| jane | alpha  |
| jane | charly |
| kier | yank   |   
| kier | bravo  |
+------+--------+
--drop view v_user_group_recursive;
CREATE VIEW v_user_group_recursive AS (
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id 
    FROM v_group_descendant_all dd , user_group ug
    WHERE (ug.group_id = dd.descendant_id 
        OR ug.group_id = dd.parent_id)  -- should gic
);
SELECT * FROM v_user_group_recursive;
+------+--------+
| user | group  |
+------+--------+
| jane | alpha  |
| jane | bravo  |
| jane | berry  |
| jane | charly |
-- | jane | charly | Removed by DISTINCT
| kier | yank   |   
| kier | zeta   |   
| kier | bravo  |
| kier | charly |
+------+--------+

If we want we can now group by node and join we can do somthing k like the fallowing

CREATE VIEW v_user_groups_recursive AS (
    SELECT user_id, json_agg(json_build_object('id', id,'parent_id',parent_id, 'group_name', group_name, 'org_id', org_id, 'json', json, 'remarks', remarks)) as groups
    FROM v_user_group_recursive ug, v_groups_parent g
    WHERE ug.group_id = g.id GROUP BY user_id
);
comment on view v_user_group_recursive is 'This aggregates the groups for each user recursively ';
+------+-------------------------------+
| user | groups                        |
+------+-------------------------------+
| jane | [alpha, bravo, berry, charly] |
| kier | [yank, zeta, bravo, charly]   |   
+------+-------------------------------+

This is awesome we have answered the question. We now can simply ask which groups this use inherits

SELECT * from v_user_groups_recursive where user_id = 'kier

Displaying our hard work in the front end

And further we could use somthing like jstree.com to display
our structure

  async function getProjectTree(user_id) {
        let res = await table.query(format('SELECT * from v_user_groups_recursive ug WHERE ug.user_id = %L', user_id));
        if (res.success) {
            let rows = res.data[0].groups.map(r => {

                return {
                    id: r.id, // required
                    parent: r.parent_id==null?'#':r.parent_id,// required
                    text: r.group_name,// node text
                    icon: 'P', // string for custom
                    state: {
                        opened: true,  // is the node open
                        disabled: false,  // is the node disabled
                        selected: false,  // is the node selected
                    },
                    li_attr: {},  // attributes for the generated LI node
                    a_attr: {}  // attributes for the generated A node
                }
            })
           
            return {success: true, data: rows, msg: 'Got all projects'}
        } else return res;
    }
<div id="v_project_tree" class="row col-10 mx-auto" style="height: 25vh"></div>
<script>
   function buildTree() {
      bs.sendJson('get', "/api/projects/getProjectTree").then(res => {
         bs.resNotify(res);
         if (!res.success) {
            //:(
            console.error(':(');
            return
         }
         console.log(res.data);
         $('#v_project_tree').jstree({
            'core': {
               'data': res.data
            }
         });
      })
   }
   window.addEventListener('load', buildTree);
</script>

jstree preview

blog

安人多梦 2024-08-27 00:28:02

我认为最好的方法确实是给每个节点一个id和一个parent_id,其中parent id是父节点的id。这有几个好处:

  1. 当您想要更新节点时,只需重写该节点的数据即可。
  2. 当你只想查询某个节点时,你可以准确地得到你想要的信息,从而减少数据库连接的开销
  3. 很多编程语言都有将mysql数据转换为XML或json的功能,这将更容易打开使用 api 启动您的应用程序。

The best way, I think indeed is to give each node an id and a parent_id, where the parent id is the id of the parent node. This has a couple of benefits

  1. When you want to update a node, you only have to rewrite the data of that node.
  2. When you want to query only a certain node, you can get exactly the information you want, thus having less overhead on the database connection
  3. A lot of programming languages have functionality to transform mysql data into XML or json, which will make it easier to open up your application using an api.
維他命╮ 2024-08-27 00:28:02

类似于表“节点”,其中每个节点行都包含父 ID(除了普通节点数据之外)。对于 root,父级为 NULL。

当然,这使得查找子项变得更加耗时,但这样实际的数据库将非常简单。

Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.

Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文