如何用SQL表示数据树?
我正在编写一个由 Tree 和 TreeNode 组合而成的数据树结构。树将包含数据的根和顶层操作。 我正在使用 UI 库以窗口形式呈现树,在其中我可以将树绑定到 TreeView。
我需要将这棵树和节点保存在数据库中。 保存树并获得以下功能的最佳方法是什么:
- 直观的实现。
- 轻松装订。从树到数据库结构和返回(如果有的话)很容易
我有两个想法。第一个是将数据序列化到表中的一个行中。 第二种是保存在表中,但是当移动到数据实体时,我将丢失更改节点上表上的行状态。
有什么想法吗?
I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data.
I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.
I will need to save this tree and nodes in the DB.
What will be the best way to save the tree and to get the following features:
- Intuitive implementation.
- Easy binding. Will be easy to move from the tree to the DB structure and back (if any)
I had 2 ideas. The first is to serialize the data into a one liner in a table.
The second is to save in tables but then, when moving to data entities I will loose the row states on the table on changed nodes.
Any ideas?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
我已经为这个关于 SQL-Antipatterns 的 slidshare 添加了书签,其中讨论了几种替代方案: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
那里的建议是使用闭包表(幻灯片中对此进行了解释)。
摘要如下(幻灯片 77):
I've bookmarked this slidshare about SQL-Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
The recommendation from there is to use a Closure Table (it's explained in the slides).
Here is the summary (slide 77):
最简单的实现是邻接表结构:
但是,某些数据库,特别是
MySQL
,在处理此模型时存在一些问题,因为它需要运行递归查询的能力,>MySQL
缺乏。另一种模型是嵌套集:
其中
lft
和rgt
是定义层次结构的任意值(任何子级的lft
,rgt
应该在任何父级的lft
内,rgt
)这不需要递归查询,但它更慢且更难以维护。
然而,在
MySQL
中,这可以使用SPATIAL
能力来改进。请参阅我的博客中的这些文章:
了解更详细的说明。
The easiest implementation is adjacency list structure:
However, some databases, particularly
MySQL
, have some issues in handling this model, because it requires an ability to run recursive queries whichMySQL
lacks.Another model is nested sets:
where
lft
andrgt
are arbitrary values that define the hierarchy (any child'slft
,rgt
should be within any parent'slft
,rgt
)This does not require recursive queries, but it slower and harder to maintain.
However, in
MySQL
this can be improved usingSPATIAL
abitilies.See these articles in my blog:
for more detailed explanations.
我很惊讶没有人提到物化路径解决方案,这可能是在标准 SQL 中处理树的最快方法。
在这种方法中,树中的每个节点都有一个列path,其中存储从根到节点的完整路径。这涉及非常简单且快速的查询。
看一下示例表 node:
为了获取节点 x 的子节点,您可以编写以下查询:
请记住,列 路径应该被索引,以便使用LIKE子句快速执行。
I'm suprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.
In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.
Have a look at the example table node:
In order to get the children of node x, you can write the following query:
Keep in mind, that the column path should be indexed, in order to perform fast with the LIKE clause.
如果您使用 PostgreSQL,则可以使用
ltree
,它是 contrib 扩展中的一个包(默认情况下),它实现了树数据结构。从 docs 中:
您可以执行如下查询:
If you are using PostgreSQL you can use
ltree
, a package in the contrib extension (comes by default) which implements the tree data structure.From the docs:
You can do queries like:
这取决于您将如何查询和更新数据。如果将所有数据存储在一行中,那么它基本上是一个单元,如果不重写所有数据,则无法查询或部分更新。
如果要将每个元素存储为一行,您应该首先阅读 管理层次结构MySQL 中的数据(特定于 MySQL,但该建议也适用于许多其他数据库)。
如果您只访问整棵树,则邻接列表模型使得在不使用递归查询的情况下检索根下的所有节点变得困难。如果您添加链接回头部的额外列,那么您可以执行 SELECT * WHERE head_id = @id 并在一个非递归查询中获取整个树,但这会使数据库非规范化。
某些数据库具有自定义扩展,可以更轻松地存储和检索层次结构数据,例如 Oracle 有 连接方式。
It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.
If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL (MySQL specific, but the advice holds for many other databases too).
If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head then you can do
SELECT * WHERE head_id = @id
and get the whole tree in one non-recursive query, but it denormalizes the database.Some databases have custom extensions that make storing and retrieving heirarchical data easier, for example Oracle has CONNECT BY.
由于这是在谷歌搜索中询问“sql trees”时的最佳答案,我将尝试从今天(2018 年 12 月)的角度更新这一点。
大多数答案都暗示使用邻接列表既简单又缓慢,因此建议使用其他方法。
自版本 8(2018 年 4 月发布)起,MySQL 支持递归公用表表达式 (CTE)。 MySQL 的出现有点晚了,但这开辟了一个新的选择。
此处有一个教程,解释了如何使用递归查询来管理邻接关系列表。
由于递归现在完全在数据库引擎中运行,因此它比过去(必须在脚本引擎中运行时)快得多。
博客此处给出了一些测量结果(其中都是有偏见的,而且是针对 postgres 而不是 MySQL),但尽管如此,它表明邻接表不必很慢。
所以我今天的结论是:
As this is the top answer when asking "sql trees" in a google search, I will try to update this from the perspective of today (december 2018).
Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.
Since version 8 (published april 2018) MySQL supports recursive common table expressions (CTE). MySQL is a bit late to the show but this opens up a new option.
There is a tutorial here that explains the use of recursive queries to manage an adjacency list.
As the recursion now runs completely within the database engine, it is way much faster than in the past (when it had to run in the script engine).
The blog here gives some measurements (which are both biased and for postgres instead of MySQL) but nevertheless it shows that adjacency lists do not have to be slow.
So my conclusion today is:
PGSQL 树关系
你好,我刚刚为我正在从事的一个项目处理了这个问题,并想分享我的文章
希望这有帮助。让我们从一些先决条件开始,
这本质上是上面提到的使用递归调用的
闭包表
解决方案。感谢这些幻灯片,它们非常有用,我希望在写这篇文章之前看到它们:)先决条件
递归函数
这些是自称为 ie 的函数,
这很酷,幸运的是 pgsql 也有递归函数,但它可能有点多。我更喜欢功能性的东西
cte with pgsql
递归WITH查询的一般形式始终是非递归术语,然后是 UNION(或 UNION ALL),然后是递归术语,其中只有递归术语可以包含对查询自身输出的引用。这样的查询执行如下:
递归查询评估
评估非递归项。对于 UNION(但不是 UNION ALL),丢弃重复的行。将所有剩余行包含在递归查询的结果中,并将它们放入临时工作表中。
只要工作表不为空,就重复这些步骤:
a.评估递归项,用工作表的当前内容替换递归自引用。对于 UNION(但不是 UNION ALL),丢弃重复行以及与任何先前结果行重复的行。将所有剩余行包含在递归查询的结果中,并将它们放入临时中间表中。
b.将工作表的内容替换为中间表的内容,然后清空中间表。
要在sql中执行阶乘之类的操作,您需要执行类似 这篇文章
树数据结构(更像是森林:)
维基百科
需要注意的重要一点是,树是 graph,这可以简单地通过以下方式强制执行
每个节点只有一个父节点的关系。
在 PGSQL 中表示树
我认为在我们继续讨论 sql 之前,从理论上讲是最容易的。在
没有数据重复的情况下表示图形关系的简单方法是分离节点(id,数据) 从边缘。
然后,我们可以限制
edges(parent_id, child_id)
表来强制执行我们的约束。强制规定parent_id,child_id以及子 ID 都是唯一的。
请注意,理论上,只需将 Parent_id 放入节点表中,只需一张表即可完成这一切
然后,
但是为了灵活性的提议,以便我们可以将其他图形结构合并到此中
框架我将使用常见的多对多关系结构。理想情况下,这将使这项研究能够
扩展到其他图算法。
让我们从一个示例数据结构开始,
注意树的有趣属性
(边数 e)=(节点数 n)-1
每个孩子只有一个父母。
然后我们可以简化方程
那么现在我们会问什么样的问题。
“给定一组任意组,假设节点继承其子节点,图的覆盖范围是多少?”
这是一个棘手的问题,它要求我们遍历图表并找到 s 中每个节点的所有子节点
这继续此 堆栈溢出帖子
在我们有了这个视图之后,我们可以加入我们的节点/组来获取数据,我不会为每个步骤提供这些样本,大多数情况下我们只会使用 ids。
示例输出
请注意,这只是最小的节点后代关系,并且实际上丢失了具有 0 个子节点的所有节点,例如 charly。
为了解决这个问题,我们需要添加所有未出现在后代列表中的节点
举例来说,我们正在基于
users(id , data)
表获取组集具有user_group(user_id, group_id)
关系,然后我们可以将其连接到另一个表,删除重复项,因为我们的 user_group 关系集
s
可能会导致如果一个用户被分配给两个 alpha 分配的 charly,则重复
如果我们想要,我们现在可以按节点分组并加入,我们可以做一些像下面这样的事情,
这太棒了,我们已经回答了这个问题。我们现在可以简单地询问这个使用继承了哪些组
在前端显示我们的辛苦工作
并且进一步我们可以使用像 jstree.com 这样的东西来显示
我们的结构
jstree 预览
博客
PGSQL Tree relations
Hello, I just got a handle on this for a project I'm working on and figured I'd share my write-up
Hope this helps. Let's get started with some prereqs
This is essentially the
closure table
solution mentioned above Using recursive calls. Thanks for those slides they are very useful I wish i saw them before this write up :)pre-requisites
Recursive Functions
these are functions that call themselves ie
This is pretty cool luckily pgsql has recursive functions too but it can be a bit much. I prefer functional stuff
cte with pgsql
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
to do something like factorial in sql you need to do something more like this so post
Tree data structures (more of a forest :)
wikipedia
The import thing to note is that a tree is a subset of a graph, This can be simply enforced by
the relationship each node has only one parent.
Representing the Tree in PGSQL
I think it will be easiest to work it out a little more theoretically before we move on to the sql
The simple way of represent a graph relation without data duplication is by separating the
nodes(id, data)
from the edges.We can then restrict the
edges(parent_id, child_id)
table to enforce our constraint. be mandating that parent_id,child_idas well as just child id be unique
Note that theoretical this can all be done with only one table by simply putting the parent_id in the nodes table
and then
but for the proposal of flexibility and so that we can incorporate other graph structures to this
framework I will use the common many-to-many relationship structure. This will ideally allow this research to be
expanded into other graph algorithms.
Let's start out with a sample data structure
Note the interesting properties of a tree
(number of edges e) =( number of nodes n)-1
each child has exactly one parent.
We can then simplify the equations
So now what sort of questions will we ask.
"Given an arbitrary set of groups 's' what is the coverage of the graph assuming nodes inherit their children?"
This is a tricky question, it requires us to traverse the graph and find all children of each node in s
This continues off of this stack overflow post
after we have this view we can join with our nodes/groups to get out data back i will not provide these samples for every single step for the most part we will just work with ids.
sample output
Note that this is just the minimal node descendant relationship and has actual lost all nodes with 0 children such as charly.
In order to resolve this we need to add all nodes back which don't appear in the descendants list
Lets say for example we are getting our set s of groups bases on a
users(id , data)
table with auser_group(user_id, group_id)
relationWe can then join this to another table removing duplicates because our set
s
of user_group relations may causeduplicates if a users is say assigned to both alpha assigned charly
If we want we can now group by node and join we can do somthing k like the fallowing
This is awesome we have answered the question. We now can simply ask which groups this use inherits
Displaying our hard work in the front end
And further we could use somthing like jstree.com to display
our structure
jstree preview
blog
我认为最好的方法确实是给每个节点一个id和一个parent_id,其中parent id是父节点的id。这有几个好处:
The best way, I think indeed is to give each node an id and a parent_id, where the parent id is the id of the parent node. This has a couple of benefits
类似于表“节点”,其中每个节点行都包含父 ID(除了普通节点数据之外)。对于 root,父级为 NULL。
当然,这使得查找子项变得更加耗时,但这样实际的数据库将非常简单。
Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.
Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.