Good Overviews
Generally speaking, you're making a decision between fast read times (for example, nested set) and fast write times (adjacency list). Usually, you end up with a combination of the options below that best fits your needs. The following provides some in-depth reading:
Options
Ones I am aware of and general features:
- Adjacency List:
- Columns: ID, ParentID
- Easy to implement.
- Cheap node moves, inserts, and deletes.
- Expensive to find the level, ancestry & descendants, path
- Avoid N+1 via Common Table Expressions in databases that support them
- Nested Set (a.k.a Modified Preorder Tree Traversal (MPTT))
- Columns: Left, Right
- Cheap ancestry, descendants
- Very expensive O(n/2) moves, inserts, deletes due to volatile encoding
- Bridge Table (a.k.a. Closure Table w/ triggers)
- Uses separate join table with ancestor, descendant, depth (optional)
- Cheap ancestry and descendants
- Write cost is O(log n) (size of the subtree) for inserts, updates, and deletes
- Normalized encoding: good for RDBMS statistics & query planner in joins
- Requires multiple rows per node
- Lineage Column (a.k.a. Materialized Path, Path Enumeration)
- Column: lineage (e.g. /parent/child/grandchild/etc...)
- Cheap descendants via prefix query (e.g. LEFT(lineage, #) = '/enumerated/path')
- Write cost is O(log n) (size of the subtree) for inserts, updates, and deletes
- Non-relational: relies on Array datatype or serialized string format
- Nested Intervals
- Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
- Has real/float/decimal representation/precision issues
- Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with the added trickiness of linear algebra.
- Flat Table
- A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
- Cheap to iterate/paginate over
- Expensive move and delete
- Good Use: threaded discussion - forums / blog comments
- Multiple lineage columns
- Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
- Cheap ancestors, descendants, level
- Cheap insert, delete, move of the leaves
- Expensive insert, delete, move of the internal nodes
- Hard limit to how deep the hierarchy can be
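As a rough illustration of the Adjacency List option and the note about avoiding N+1 queries with Common Table Expressions, here is a minimal sketch using Python's built-in sqlite3 module. The table name `node`, its columns, and the sample rows are assumptions made up for this example; the same recursive CTE pattern should carry over to other databases that support `WITH RECURSIVE`.

```python
import sqlite3

# Hypothetical adjacency-list table: each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES node(id),"
    " name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "a"),
        (3, 1, "b"),
        (4, 2, "a1"),
        (5, 4, "a1x"),
    ],
)


def descendants(conn, node_id):
    """Return (id, name, depth) for every descendant of node_id.

    One recursive CTE replaces the query-per-level loop (the N+1 pattern).
    """
    sql = """
        WITH RECURSIVE subtree(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, s.depth + 1
            FROM node AS n
            JOIN subtree AS s ON n.parent_id = s.id
        )
        SELECT id, name, depth
        FROM subtree
        WHERE depth > 0
        ORDER BY depth, id
    """
    return conn.execute(sql, (node_id,)).fetchall()


print(descendants(conn, 2))
```

Inserting or moving a node stays a single-row write on `parent_id`, which is the cheap-write side of the trade-off described above.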
Database Specific Notes
MySQL/MariaDB
Oracle
PostgreSQL
SQL Server
- General summary
- 2008 offers HierarchyId data type that appears to help with the Lineage Column approach and expand the depth that can be represented.
My favorite answer is what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.
The problem up until now has been that the conversion from an Adjacency List to Nested Sets has been frightfully slow, because most people use the extreme RBAR method known as a "Push Stack" to do the conversion. That has been considered way too expensive a price for reaching the Nirvana of simple maintenance via the Adjacency List combined with the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other, especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million-node hierarchy.
I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.
And here's the duration for the new method (with the push stack method in parentheses).
Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.
You can read about the new method and get a copy of the code at the following URL.
http://www.sqlservercentral.com/articles/Hierarchy/94040/
I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article.
http://www.sqlservercentral.com/articles/T-SQL/94570/
If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.
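For readers who have not seen the conversion being discussed, here is a minimal sketch of turning an adjacency list into nested-set bounds. To be clear, this is not the fast method from the linked articles; it is a plain stack-based depth-first walk in Python, closer in spirit to the slow approach, and is shown only to make the target encoding concrete. The function name `to_nested_sets` and the `(node_id, parent_id)` input shape are assumptions for this example.

```python
from collections import defaultdict


def to_nested_sets(rows):
    """Convert (node_id, parent_id) pairs into {node_id: (lft, rgt)}.

    parent_id is None for a root. Children are numbered in input order.
    """
    children = defaultdict(list)
    roots = []
    for node_id, parent_id in rows:
        if parent_id is None:
            roots.append(node_id)
        else:
            children[parent_id].append(node_id)

    lft = {}
    bounds = {}
    counter = 1
    for root in roots:
        lft[root] = counter
        counter += 1
        # Explicit stack of (node, iterator over its children) so that deep
        # trees do not hit Python's recursion limit.
        stack = [(root, iter(children[root]))]
        while stack:
            node, remaining = stack[-1]
            child = next(remaining, None)
            if child is None:
                # All children visited: close the interval for this node.
                stack.pop()
                bounds[node] = (lft[node], counter)
                counter += 1
            else:
                # Entering a child: assign its left bound and descend.
                lft[child] = counter
                counter += 1
                stack.append((child, iter(children[child])))
    return bounds


rows = [(1, None), (2, 1), (3, 1), (4, 2)]
print(to_nested_sets(rows))
```

With these bounds, a node's descendants are exactly the nodes whose `lft` falls strictly between its own `lft` and `rgt`.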
Adjacency Model + Nested Sets Model
I went for it because I could insert new items to the tree easily (you just need a branch's id to insert a new item to it) and also query it quite fast.
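- To get all children of a node, query the parent column.
- To get all descendants of a node, query for items whose lft is between the lft and rgt of the parent.
- To get all ancestors of a node up to the root, query for items with lft lower than the node's lft and rgt bigger than the node's rgt, and sort them by parent.
I needed accessing and querying the tree to be faster than inserts, which is why I chose this.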
The only problem is fixing the left and right columns when inserting new items. I created a stored procedure for that and called it every time I inserted a new item, which was rare in my case, but it is really fast. I got the idea from Joe Celko's book, and the stored procedure and how I came up with it are explained here on DBA SE:
https://dba.stackexchange.com/q/89051/41481
Although this solution allows rapid searches for descendants, it is not ideal for large datasets that need frequent inserts or deletes, because those operations are slow. It is therefore best suited to tables that won't change frequently.
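A minimal sketch of the three lookups on such a combined table, using Python's built-in sqlite3 module. The table name `category`, the `category_id` and `name` columns, and the sample rows are assumptions for illustration; only `parent`, `lft`, and `rgt` come from the description above. Ancestors are ordered by `lft` here (root first), which is a choice made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    " category_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " parent INTEGER,"
    " lft INTEGER NOT NULL,"
    " rgt INTEGER NOT NULL)"
)
# Hypothetical sample tree with hand-assigned nested-set bounds.
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Electronics", None, 1, 8),
        (2, "Phones", 1, 2, 5),
        (3, "Laptops", 1, 6, 7),
        (4, "Smartphones", 2, 3, 4),
    ],
)


def children(conn, node_id):
    """Direct children only: the adjacency column answers this."""
    sql = "SELECT category_id FROM category WHERE parent = ? ORDER BY lft"
    return [row[0] for row in conn.execute(sql, (node_id,))]


def descendants(conn, node_id):
    """Whole subtree: lft strictly inside the node's (lft, rgt) interval."""
    sql = """
        SELECT c.category_id
        FROM category AS c
        JOIN category AS p ON p.category_id = ?
        WHERE c.lft > p.lft AND c.lft < p.rgt
        ORDER BY c.lft
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]


def ancestors(conn, node_id):
    """Path to the root: intervals that enclose the node's interval."""
    sql = """
        SELECT a.category_id
        FROM category AS a
        JOIN category AS n ON n.category_id = ?
        WHERE a.lft < n.lft AND a.rgt > n.rgt
        ORDER BY a.lft
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]


print(children(conn, 1), descendants(conn, 1), ancestors(conn, 4))
```

The strict comparisons exclude the node itself; a plain `BETWEEN` on `lft` would also return the parent row.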
This design was not mentioned yet:
Multiple lineage columns
Though it has limitations, if you can bear them, it's very simple and very efficient. Features:
Here follows an example - taxonomic tree of birds so the hierarchy is Class/Order/Family/Genus/Species - species is the lowest level, 1 row = 1 taxon (which corresponds to species in the case of the leaf nodes):
and the example of the data:
This is great because this way you accomplish all the needed operations in a very easy way, as long as the internal categories don't change their level in the tree.
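To make the idea concrete, here is a minimal sketch with Python's built-in sqlite3 module. The table name `taxon`, the column names, and the ids are assumptions invented for this example (they are not the schema from the original post); the taxon names are real loon taxa used purely as sample data.

```python
import sqlite3

# One lineage column per level above species, ordered root to leaf.
LEVEL_COLUMNS = ("class_id", "order_id", "family_id", "genus_id")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " class_id INTEGER, order_id INTEGER,"
    " family_id INTEGER, genus_id INTEGER,"
    " name TEXT NOT NULL)"
)
# Levels at or below a row's own level stay NULL.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, None, None, None, None, "Aves"),
        (2, 1, None, None, None, "Gaviiformes"),
        (3, 1, 2, None, None, "Gaviidae"),
        (4, 1, 2, 3, None, "Gavia"),
        (5, 1, 2, 3, 4, "Gavia stellata"),
        (6, 1, 2, 3, 4, "Gavia arctica"),
        (7, 1, None, None, None, "Podicipediformes"),
    ],
)


def ancestors(conn, taxon_id):
    """All ancestors, root first, read straight off the row itself."""
    row = conn.execute(
        "SELECT class_id, order_id, family_id, genus_id"
        " FROM taxon WHERE taxon_id = ?",
        (taxon_id,),
    ).fetchone()
    return [value for value in row if value is not None]


def level(conn, taxon_id):
    """Level in the tree: 0 for a class, 4 for a species."""
    return len(ancestors(conn, taxon_id))


def descendants(conn, taxon_id):
    """Everything below a taxon: one equality test on the matching column."""
    depth = level(conn, taxon_id)
    if depth >= len(LEVEL_COLUMNS):
        return []  # species are leaves
    column = LEVEL_COLUMNS[depth]  # taken from a fixed tuple, never user input
    sql = f"SELECT taxon_id FROM taxon WHERE {column} = ? ORDER BY taxon_id"
    return [row[0] for row in conn.execute(sql, (taxon_id,))]


print(ancestors(conn, 5), level(conn, 4), descendants(conn, 2))
```

The fixed tuple of columns is also where the hard depth limit comes from: a deeper hierarchy means altering the table.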
This is a very partial answer to your question, but I hope still useful.
Microsoft SQL Server 2008 implements two features that are extremely useful for managing hierarchical data:
Have a look at "Model Your Data Hierarchies With SQL Server 2008" by Kent Tegels on MSDN for a start. See also my own question: Recursive same-table query in SQL Server 2008
If your database supports arrays, you can also implement a lineage column or materialized path as an array of parent ids.
Specifically with Postgres you can then use the set operators to query the hierarchy, and get excellent performance with GIN indices. This makes finding parents, children, and depth pretty trivial in a single query. Updates are pretty manageable as well.
I have a full write up of using arrays for materialized paths if you're curious.
This is really a square peg, round hole question.
If relational databases and SQL are the only hammer you have or are willing to use, then the answers that have been posted thus far are adequate. However, why not use a tool designed to handle hierarchical data? Graph databases are ideal for complex hierarchical data.
The inefficiencies of the relational model, along with the complexity of any code/query solution that maps a graph/hierarchical model onto a relational model, are just not worth the effort when compared to the ease with which a graph database solution can solve the same problem.
Consider a Bill of Materials as a common hierarchical data structure.
Shortest path between two sub-assemblies: Simple graph traversal algorithm. Acceptable paths can be qualified based on criteria.
Similarity: What is the degree of similarity between two assemblies? Perform a traversal on both sub-trees computing the intersection and union of the two sub-trees. The percent similar is the intersection divided by the union.
Transitive Closure: Walk the sub-tree and sum up the field(s) of interest, e.g. "How much aluminum is in a sub-assembly?"
Yes, you can solve the problem with SQL and a relational database. However, there are much better approaches if you are willing to use the right tool for the job.
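As a small illustration of the similarity measure described above (intersection divided by union), here is a sketch in plain Python. It uses an in-memory dictionary as a stand-in for a graph database, and the bill-of-materials data and function names are assumptions made up for this example.

```python
def subtree_components(bom, root):
    """Collect every component reachable below root (root itself excluded)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in bom.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def similarity(bom, assembly_a, assembly_b):
    """Shared components divided by all components of the two sub-trees."""
    parts_a = subtree_components(bom, assembly_a)
    parts_b = subtree_components(bom, assembly_b)
    union = parts_a | parts_b
    if not union:
        return 0.0  # two bare leaves: nothing to compare
    return len(parts_a & parts_b) / len(union)


# Hypothetical bill of materials: assembly -> direct sub-components.
bom = {
    "bike_a": ["frame", "wheel"],
    "bike_b": ["frame", "wheel", "motor"],
    "wheel": ["rim", "spoke"],
    "motor": ["rotor"],
}

print(similarity(bom, "bike_a", "bike_b"))
```

Here `bike_a` contains 4 components and `bike_b` contains 6, with 4 in common, so the ratio is 4/6.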
I am using PostgreSQL with closure tables for my hierarchies.
I have one universal stored procedure for the whole database:
Then for each table where I have a hierarchy, I create a trigger
For populating a closure table from existing hierarchy I use this stored procedure:
Closure tables are defined with 3 columns: ANCESTOR_ID, DESCENDANT_ID, DEPTH. It is possible (and I even advise it) to store records with the same value for ANCESTOR and DESCENDANT and a value of zero for DEPTH. This simplifies the queries for retrieving the hierarchy. And they are very simple indeed:
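A minimal sketch of the three-column layout with the depth-zero self rows, using Python's built-in sqlite3 module instead of PostgreSQL triggers. The table name `node_closure` and the helper `add_node` are assumptions for this example and do not reproduce the stored procedures mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_closure ("
    " ancestor_id INTEGER NOT NULL,"
    " descendant_id INTEGER NOT NULL,"
    " depth INTEGER NOT NULL,"
    " PRIMARY KEY (ancestor_id, descendant_id))"
)


def add_node(conn, node_id, parent_id=None):
    """Register a new leaf: one self row plus one row per ancestor."""
    conn.execute(
        "INSERT INTO node_closure VALUES (?, ?, 0)", (node_id, node_id)
    )
    if parent_id is not None:
        # Copy every path that ends at the parent (including the parent's
        # own depth-0 row) and extend it by one step to the new node.
        conn.execute(
            "INSERT INTO node_closure (ancestor_id, descendant_id, depth)"
            " SELECT ancestor_id, ?, depth + 1"
            " FROM node_closure WHERE descendant_id = ?",
            (node_id, parent_id),
        )


def descendants(conn, node_id):
    """(descendant_id, depth) pairs below node_id, nearest first."""
    sql = (
        "SELECT descendant_id, depth FROM node_closure"
        " WHERE ancestor_id = ? AND depth > 0"
        " ORDER BY depth, descendant_id"
    )
    return conn.execute(sql, (node_id,)).fetchall()


def ancestors(conn, node_id):
    """Ancestor ids of node_id, root first."""
    sql = (
        "SELECT ancestor_id FROM node_closure"
        " WHERE descendant_id = ? AND depth > 0"
        " ORDER BY depth DESC"
    )
    return [row[0] for row in conn.execute(sql, (node_id,))]


add_node(conn, 1)
add_node(conn, 2, parent_id=1)
add_node(conn, 3, parent_id=1)
add_node(conn, 4, parent_id=2)
print(descendants(conn, 1), ancestors(conn, 4))
```

Because the parent's self row is stored, the insert needs no special case for the direct parent link; dropping the `depth > 0` filter returns the subtree including the node itself.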
MySQL now supports the JSON data type:
https://dev.mysql.com/doc/refman/8.0/en/json.html