MySQL - Best method to handle this hierarchical data?
This is a followup to:
MySQL - Is it possible to get all sub-items in a hierarchy?
I have an arbitrary-depth adjacency list model table (I am at the point where I can convert it into a nested set model).
I read the MySQL material on how to use a nested set model, though basic operations such as inserting, updating and deleting seemed to get increasingly complex.
Another blog shows how to use a trigger system with the adjacency list model to keep a table of ancestors that relates each object to its ancestors.
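To make the ancestor-table idea concrete, here is a minimal sketch. It is not the blog's implementation: the table names `node` and `node_ancestor` and the trigger names are made up for illustration, and SQLite (through Python's `sqlite3`) stands in for MySQL, whose trigger syntax differs. It also assumes parents are inserted before their children and that only leaf nodes are deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER);
-- One row per (node, ancestor) pair; table name is hypothetical.
CREATE TABLE node_ancestor (
    node_id INTEGER NOT NULL,
    ancestor_id INTEGER NOT NULL,
    PRIMARY KEY (node_id, ancestor_id)
);
-- On insert, the new node inherits its parent's ancestors plus the parent itself.
CREATE TRIGGER node_after_insert AFTER INSERT ON node
WHEN NEW.parent_id IS NOT NULL
BEGIN
    INSERT INTO node_ancestor (node_id, ancestor_id)
        SELECT NEW.id, ancestor_id FROM node_ancestor WHERE node_id = NEW.parent_id;
    INSERT INTO node_ancestor (node_id, ancestor_id) VALUES (NEW.id, NEW.parent_id);
END;
-- On delete, drop every pair that mentions the deleted node.
CREATE TRIGGER node_after_delete AFTER DELETE ON node
BEGIN
    DELETE FROM node_ancestor WHERE node_id = OLD.id OR ancestor_id = OLD.id;
END;
""")

def descendants(conn, node_id):
    """All descendants of node_id, read with a single non-recursive query."""
    rows = conn.execute(
        "SELECT node_id FROM node_ancestor WHERE ancestor_id = ? ORDER BY node_id",
        (node_id,),
    )
    return [r[0] for r in rows]

# Made-up sample tree: 1 -> (2 -> 3), 4
conn.executemany("INSERT INTO node (id, parent_id) VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 2), (4, 1)])
```

With this in place, `descendants(conn, 1)` reads the whole subtree in one query. The cost moves to writes, and moving a subtree to a new parent is not handled by this sketch.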
Right now I need to be able to return a list of all children of a given node, to change or delete them. The hierarchical structure won't change often once created, but there will be a large number of these structures.
The three methods I see are:
Create a stored procedure that runs a recursive query returning all children.
Convert to the nested set model, which would mean getting into its complexities and possibly creating stored procedures to add, edit and delete nodes.
Create the ancestor table described above, maintained by insert/delete triggers, to handle all of the data.
If there are other methods I'm not exploring, please let me know and I'll update this list.
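For the first option, the loop such a procedure would run can be sketched as below: one query per tree level, repeated until a level comes back empty. This is an illustration in Python with SQLite standing in for MySQL, not a stored procedure; the `item` table and the function name are hypothetical.

```python
import sqlite3

def children_by_level(conn, root_id):
    """Collect all descendants of root_id, issuing one query per tree level."""
    found = []
    frontier = [root_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT id FROM item WHERE parent_id IN ({marks}) ORDER BY id",
            frontier,
        ).fetchall()
        frontier = [r[0] for r in rows]   # ids found on this level
        found.extend(frontier)
    return found

# Made-up sample tree: 1 -> (2 -> 4 -> 5), 3
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, parent_id INTEGER)")
demo.executemany("INSERT INTO item VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)])
```

The number of round trips equals the depth of the subtree plus one, so this stays cheap for shallow trees and gets slower as trees get deeper.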
Comments (5)
Quassnoi has run some performance tests on the nested sets model and the adjacency list model and documented the results and recommendations in his blog post Adjacency list vs. nested sets: MySQL. The executive summary is:
Here is the conclusion from his article:
The rest of the article shows how to define the table, implement the queries and gives performance measurements. The use of the spatial index is a clever idea to improve the performance of the nested set model that might be new to you.
If you're also considering approaches without MySQL, then you might want to look at PostgreSQL, which is another free and open-source database. PostgreSQL supports recursive queries in the form of recursive common table expressions, which make querying hierarchical data easier than in MySQL and also give better performance. Quassnoi has also written an article, Adjacency list vs. nested sets: PostgreSQL, that shows the details.
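As a rough illustration of what a recursive common table expression looks like, here is a sketch run against SQLite from Python. The `category` table is hypothetical and this is not taken from Quassnoi's article. PostgreSQL accepts the same `WITH RECURSIVE` form, as does MySQL from 8.0 onwards, though the parameter placeholder style depends on the driver.

```python
import sqlite3

SUBTREE_SQL = """
WITH RECURSIVE subtree (id, depth) AS (
    SELECT id, 1 FROM category WHERE parent_id = ?   -- direct children
    UNION ALL
    SELECT c.id, s.depth + 1                         -- children of rows found so far
    FROM category AS c
    JOIN subtree AS s ON c.parent_id = s.id
)
SELECT id, depth FROM subtree ORDER BY depth, id
"""

# Made-up sample tree: 1 -> (2 -> 4 -> 5), 3
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER)")
db.executemany("INSERT INTO category VALUES (?, ?)",
               [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)])

rows = db.execute(SUBTREE_SQL, (1,)).fetchall()
# rows == [(2, 1), (3, 1), (4, 2), (5, 3)]
```

The table stays a plain adjacency list, so inserts and deletes remain single-row operations.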
While we are talking about looking at other approaches, Oracle's database is also worth a mention. Oracle also has a custom extension, CONNECT BY, which makes querying hierarchical data very easy and fast. Quassnoi's article Adjacency list vs. nested sets: Oracle again covers the performance details. The query you need to get all children is extremely simple in this case:
I would always go with the nested set model for sheer simplicity and convenience. I always suggest this article. It shows very well the queries needed to work with such hierarchical data. The only disadvantage I see is that inserting/updating records can get slower once the hierarchy reaches a certain level of complexity, but reading is faster than in many other solutions I have seen.
Just to give you an example from the article above:
SQL wise, I don't think it can get any prettier and simpler ;)
I have no idea about the stored procedure way. But since it involves recursion (in your case), I don't know if it will be fast with many levels in the hierarchy. I assume you can give it a try.
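To illustrate the trade-off described in this answer (this is a sketch, not the example from the article), here is a small nested set table with a subtree read and an insert. The `tree` table, the `lft`/`rgt` column names and the sample rows are assumptions, and SQLite stands in for MySQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)")
# Made-up sample: root(1,8) contains a(2,3) and b(4,7); b contains b1(5,6).
db.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)",
               [(1, "root", 1, 8), (2, "a", 2, 3), (3, "b", 4, 7), (4, "b1", 5, 6)])

def subtree(db, node_id):
    """All descendants of node_id in one non-recursive query."""
    rows = db.execute(
        "SELECT child.name FROM tree AS parent "
        "JOIN tree AS child ON child.lft > parent.lft AND child.rgt < parent.rgt "
        "WHERE parent.id = ? ORDER BY child.lft",
        (node_id,),
    )
    return [r[0] for r in rows]

def add_last_child(db, parent_id, name):
    """Insert a new last child; every boundary at or after the parent's rgt shifts by 2."""
    (rgt,) = db.execute("SELECT rgt FROM tree WHERE id = ?", (parent_id,)).fetchone()
    db.execute("UPDATE tree SET rgt = rgt + 2 WHERE rgt >= ?", (rgt,))
    db.execute("UPDATE tree SET lft = lft + 2 WHERE lft > ?", (rgt,))
    db.execute("INSERT INTO tree (name, lft, rgt) VALUES (?, ?, ?)", (name, rgt, rgt + 1))
```

The read touches no recursion at all, while `add_last_child` may rewrite a large share of the table, which matches the slower inserts mentioned above.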
Maybe you should consider using a document-oriented database like MongoDB. It could make your life a lot easier.
When dealing with hierarchical data sets I find it best to approach it with caching in mind. One of the main benefits of dealing with the issue this way is that it doesn't require de-normalizing your database into something that might be harder to mutate.
Since memory heap (memcache, redis, etc.) lookups are much faster than SQL for simple id -> data resolutions, I would use them to cache a list of the ids of direct children for each node. This way you can get decent performance via a recursive algorithm to build a complete list for any node. To add/delete a node, you only need to invalidate its direct parent's cache, O(1).
If that's not fast enough, you can add another layer of cache holding, for each node, a list of all of its children. In order for this to work with a decently mutable dataset, you should record the cache performance (ratio of fresh/cached hits) of each node and set a tolerance level for when to store the cache. This can also be stored in a memory heap since it's non-vital data.
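A minimal sketch of the first cache layer described above, assuming a plain Python dict in place of memcache or redis and a hypothetical `load_children` callback in place of the SQL lookup. The class and method names are made up for illustration.

```python
from typing import Callable, Dict, List

class ChildCache:
    """Caches the direct-child id list of each node; a dict stands in for memcache/redis."""

    def __init__(self, load_children: Callable[[int], List[int]]):
        self._load = load_children              # hypothetical loader, e.g. one SQL query
        self._direct: Dict[int, List[int]] = {}
        self.misses = 0                         # counts loads that bypassed the cache

    def direct(self, node_id: int) -> List[int]:
        if node_id not in self._direct:
            self.misses += 1
            self._direct[node_id] = list(self._load(node_id))
        return self._direct[node_id]

    def all_children(self, node_id: int) -> List[int]:
        """Recursively expand cached direct-child lists into the full subtree."""
        result: List[int] = []
        for child in self.direct(node_id):
            result.append(child)
            result.extend(self.all_children(child))
        return result

    def invalidate(self, parent_id: int) -> None:
        """Call after adding or deleting a child of parent_id."""
        self._direct.pop(parent_id, None)

edges = {1: [2, 3], 2: [4], 3: [], 4: []}       # made-up sample data
cache = ChildCache(lambda node_id: edges.get(node_id, []))
```

Only the parent's entry is dropped on a change, so the next `all_children` call reloads that one list and reuses everything else.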
If you use this more advanced caching model, note that these complete child lists will need to be invalidated when any of the node's children change, O(log n).
Once you have your list of child ids, you can use SQL's WHERE id IN (id1, id2, ...) syntax to query for what you want.
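One way to build that IN list safely is with one placeholder per id rather than string concatenation. The sketch below assumes a hypothetical `node` table with `id` and `name` columns and uses SQLite in place of MySQL; the `chunk_size` parameter is an assumption added because very long id lists can run into driver or server limits.

```python
import sqlite3

def fetch_by_ids(conn, ids, chunk_size=500):
    """Fetch rows for a list of ids using parameterised IN (...) clauses, in chunks."""
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        marks = ", ".join("?" for _ in chunk)    # one placeholder per id
        sql = f"SELECT id, name FROM node WHERE id IN ({marks})"
        rows.extend(conn.execute(sql, chunk))
    return sorted(rows)

# Made-up sample rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?)",
                 [(i, f"n{i}") for i in range(1, 6)])
```

An empty id list simply returns no rows, since the loop never issues a query.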
I once had to store a complex hierarchical arbitrary-depth bill-of-material system in a SQL-like database manager that wasn't really up to the task, and it ended up forcing messy and tricky indices, data definitions, queries, etc. After restarting from scratch, using the db manager to provide only an API for record reads and writes on simple indexed keys, and doing all of the actual input/manipulation/reporting in external code, the final result was quicker to implement, easier to understand, and simpler to maintain and enhance. The most complex query needed was essentially SELECT A FROM B.
So, instead of embedding logic and operations inside the restrictions of MySQL, consider banging out code to do what you want, and relying on MySQL only for the lowest-level gets/puts.
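As a sketch of what that could look like (not the original system), the database below is used only for a plain SELECT, and the tree walk happens in application code. The `part` table and both function names are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3
from collections import defaultdict

def load_tree(conn):
    """Read the whole table with one plain SELECT and index children by parent."""
    children = defaultdict(list)
    for node_id, parent_id in conn.execute("SELECT id, parent_id FROM part ORDER BY id"):
        children[parent_id].append(node_id)
    return children

def walk(children, node_id):
    """Depth-first list of every descendant, computed entirely in application code."""
    stack = list(reversed(children.get(node_id, [])))
    out = []
    while stack:
        current = stack.pop()
        out.append(current)
        stack.extend(reversed(children.get(current, [])))
    return out

# Made-up sample tree: 1 -> (2 -> 4 -> 5), 3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany("INSERT INTO part VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)])
tree = load_tree(conn)
```

This reads every row up front, so it suits the case where each hierarchy is small enough to hold in memory.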