Oracle 中什么更快?树形结构的小桌子 vs. 巨大的平板桌子
我正在设计一个将使用 Oracle 的应用程序,并且我们需要在数据库中映射这个部门层次结构。有些东西看起来像这样(我很确定你们都知道我在说什么,但为了以防万一,我将包含 ERD 的一部分):
因此,它将存储如下所示的数据:
[1 | 0]
[2 | 1]
[3 | 2]
[4 | 2]
换句话说:
Department 1
|__Department 2
|___Department 3
|___Department 4
依此类推...
这将提高表上所需的记录数量并且可以访问数据使用 CONNECT BY 命令,每个部门只有 1 个记录。我们通常采用这种树结构作为解决方案,但在这个新应用程序中,性能至关重要,所以我想知道如果我有一个像这样的扁平表格会怎样。
[1 | 0]
[2 | 1]
[3 | 1]
[3 | 2]
[4 | 1]
[4 | 2]
这使您可以拥有非常明显的关系,而无需知道给定子级的父部门即可知道其上层部门是谁。但这会增加所需的数据量,因为您需要为部门所在的每个级别提供一条记录,这意味着如果某个部门的级别比最高级别低 15 个级别,我们将需要 15 条记录。 该部门相当大,因此这可能最终会成为一个巨大的表(大约 200 万条记录)。
好了,简单介绍完之后,就是这个问题了;有没有人真正尝试过这个,可以告诉我这两个选项(巨大的平面表或小树表)之间的数据库更快/更便宜?
I'm designing an application that will use Oracle, and we have this department hierarchy that we need to map in our database. Some thing that looks like this (I'm pretty sure you all know what I'm talking about but I'll include a piece of the ERD just in case):
So it will have data stored that looks like this:
[1 | 0]
[2 | 1]
[3 | 2]
[4 | 2]
In other words:
Department 1
|__Department 2
|___Department 3
|___Department 4
And so on...
This will improve the number of records required on the table and the Data can be accessed using a CONNECT BY command, just having 1 recor per department. We usually go for this tree structure as solution, but in this new application performance is a critical, so I was wondering what if I have a flattened-out table that would look like this.
[1 | 0]
[2 | 1]
[3 | 1]
[3 | 2]
[4 | 1]
[4 | 2]
This allows you to have very obvious relationships without having to know the parent Department for a given child to know who their upper hierarchy Departments are. But this increases the amount of data required since you need a record for each level a Department is in, meaning that if a have a Department 15 levels below the top one we would require 15 records for it.
The Department is pretty big so this may end up being a huge table (about 2 million records).
Ok, so after the brief introduction, this is the question; Has someone actually tried this that could tell me what is faster/less expensive for the DB between this two options, the huge flat table or the small tree one?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
如果您需要读取性能,请尝试路径枚举。
变成
所以你可以通过做选择 2 的所有孩子
当然这是标准化和性能之间的折衷。
If you need read performance, try path enumeration.
becomes
So you can select ALL children of 2 by doing
Of course this is a compromise between normalization and performance.
对于像部门这样的东西,表中不可能有足够的记录来导致性能问题。甚至不用担心这个。
即使对于某些其他类型的层次结构数据,可能存在太多记录而导致性能受到影响,也总是有其他技术/方法来解决这些性能问题(当它们出现时),并且实现这些其他解决方案的成本是几乎总是小于尝试根据平面模式对系统进行编码而增加的开发和维护工作量。
With something like Departments, it is not possible to have enough records in the table to where performance would be an issue. Don't even worry about that.
Even with some other type of heirarchical data, where there might be so many records that performance could be affected, there are always other technologies/approaches to address those performance issues (when they pop up), and the cost of implementing these other solutions is almost always less than the increase in Development and maintenance effort that you would incur from attempting to code your system against a flat schema.
我肯定会选择第一个选项(分层方法)。我认为正确地对数据进行建模比仅仅使用错误的数据模型来获得性能更好。由于您在这里对层次结构进行建模,因此以这种方式将其存储在数据库中是有意义的。
如果您想要两全其美,我的建议是考虑使用 物化视图来“展平”分层数据,那么您仍然可以正确存储数据,但可以通过使用物化视图获得性能提升(如果有)。
几乎总有一种方法可以遵循良好的数据模型,同时仍然找到获得良好性能的方法。但是一个糟糕的数据模型会让你在未来的几年里付出代价,而且以后纠正它需要付出巨大的痛苦。
然而,即使使用扁平化方法,您也必须考虑到记录数量会急剧增加,尤其是当您到达树中的叶节点时,因此如果有一个扁平的层次结构表(您的第二种方法),我会感到惊讶)会提高性能,因为有更多记录需要处理。
I would definitely go for the first option (hierarchical approach). I think it's better to model the data correctly than to just use a bad data model to gain performance. Since you are modeling a hierarchy here, it makes sense to store it that way in the DB.
If you want the best of both worlds, my recommendation would be to look at using a materialized view to "flatten" the hierarchical data, then you are still storing the data properly, but you get the performance gains (if any) by using the materialized view.
There's almost always a way to follow a good data model and still find ways to get good performance. But a bad data model will cost you for years to come, and it takes great pain to correct it later.
However, even with the flattened approach, you have to consider that you are increasing the number of records dramatically, especially as you get to the leaf nodes in the tree, so I'd be surprised if having a flat hierarchy table (your second approach) would improve performance since there are many more records to process.