What impact does normalizing a database have on resources?
When taking a database from a relatively un-normalized form and normalizing it, what changes in resource utilization, if any, might one expect?
For example, normalization often means that more tables get created from fewer, so the database now has a higher number of tables, but many of them are quite small, allowing the frequently used ones to fit into memory better.
The higher number of tables also means that (potentially) more joins are needed to get at the data that was abstracted out, so one would expect some sort of impact from the higher number of joins the system needs to do.
So, what impact on resource usage (i.e. what will change) does normalizing an un-normalized database have?
Edit:
To add a bit of context, I have an existing (i.e. legacy) database with over 300 horrible tables. About half of the data is TEXT and the other half is either char fields or integers. There are no constraints of any kind. The reason I ask is primarily to get more information for convincing others that things need to change and that there won't be a decrease in performance or maintainability. Unfortunately, those I have to convince know just enough about the performance benefits of a de-normalized database to want to avoid normalization as much as possible.
8 Answers
This cannot really be answered in a general manner, as the impact will vary heavily depending on the specifics of the database in question and the apps using it.
So you basically stated the general expectations concerning the impact:
So the only real answer is the usual: it depends ;)
Note: This assumes that we are talking about cautious and intentional denormalization. If you are referring to the 'just throw some tables together as data comes along' approach all too common with inexperienced developers, I'd risk the statement that normalization will reduce resource needs on all levels ;)
Edit: Concerning the specific context added by cdeszaq, I'd say 'Good luck getting your point through' ;)
Obviously, with over 300 tables and no constraints (!), the answer to your question is definitely 'normalizing will reduce resource needs on all levels' (and probably very substantially), but:
Refactoring such a mess will be a major undertaking. If there is only one app using this database, it is already dreadful - if there are many, it might become a nightmare!
So even if normalizing would substantially reduce resource needs in the long run, it might not be worth the trouble, depending on circumstances. The main questions here are about long term scope - how important is this database, how long will it be used, will there be more apps using it in the future, is the current maintenance effort constant or increasing, etc. ...
Don't ignore that it is a running system - even if it's ugly and horrible, according to your description it is not (yet) broken ;-)
"Normalization" applies only and exclusively to the logical design of a database.
The logical design of a database and the physical design of a database are two completely distinct things. Database theory has always intended for things to be this way. The fact that the developers who overlook/disregard this distinction (out of ignorance or out of carelessness or out of laziness or out of whatever other so-called-but-invalid "reason") are the vast majority, does not make them right.
A logical design can be said to be normalized or not, but a logical design does not inherently carry any "performance characteristic" whatsoever. Just like 'c:=c+1;' does not inherently carry any performance characteristic.
A physical design does determine "performance characteristics", but then again a physical design simply does not have the quality of being "normalized or not".
This flawed perception of "normalization hurting performance" is really nothing other than concrete proof that all the DBMS engines that exist today are seriously lacking in physical design options.
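To make the logical/physical distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the index name are made up for illustration. The logical design (the table and the query) stays the same; only a physical choice (adding an index) changes, and with it the way the engine executes the query, while the result is identical.

```python
import sqlite3


def plan_and_rows(conn, sql, params):
    """Return (query plan descriptions, result rows) for one query."""
    # The human-readable description is the last column of each plan row.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
    rows = conn.execute(sql, params).fetchall()
    return plan, rows


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the logical design never changes below.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 50, i) for i in range(1000)],
    )
    sql = "SELECT id, amount FROM orders WHERE customer_id = ? ORDER BY id"

    before = plan_and_rows(conn, sql, (7,))
    # A purely physical change: same tables, same query, new access path.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = plan_and_rows(conn, sql, (7,))
    return before, after


if __name__ == "__main__":
    (plan_before, rows_before), (plan_after, rows_after) = demo()
    print("plan before:", plan_before)
    print("plan after: ", plan_after)
    print("same rows:  ", rows_before == rows_after)
```

The exact wording of the plan text depends on the SQLite version, so treat the printed plans as informational only.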
There's a very simple answer to your question: it depends.
Firstly, I'd re-phrase your question as 'what is the benefit of denormalization', because normalization is something that should be done by default (as the result of a pure logical model), and then denormalization can be applied to very specific tables where performance is critical. The main problem with denormalization is that it can complicate data integrity management, but in some cases the benefits outweigh the risks.
My advice for denormalization: do it only when it really hurts, and make sure you have all scenarios covered when it comes to maintaining data integrity after any inserts, updates or deletes.
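As a rough illustration of what "covering all scenarios" can involve, the sketch below keeps one denormalized column consistent with triggers, using Python's built-in `sqlite3` module. The `customers`/`orders` tables, the `order_count` column, and the trigger names are hypothetical; other engines would use their own trigger syntax or a different mechanism entirely.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    order_count INTEGER NOT NULL DEFAULT 0  -- denormalized: derived from orders
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);
-- One trigger per scenario that can change the derived value.
CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    UPDATE customers SET order_count = order_count + 1 WHERE id = NEW.customer_id;
END;
CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
    UPDATE customers SET order_count = order_count - 1 WHERE id = OLD.customer_id;
END;
CREATE TRIGGER orders_au AFTER UPDATE OF customer_id ON orders BEGIN
    UPDATE customers SET order_count = order_count - 1 WHERE id = OLD.customer_id;
    UPDATE customers SET order_count = order_count + 1 WHERE id = NEW.customer_id;
END;
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
    )
    return conn


def stored_vs_actual(conn):
    """Return (customer id, stored count, recomputed count) per customer."""
    return conn.execute(
        """
        SELECT c.id, c.order_count, COUNT(o.id)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id ORDER BY c.id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", [(1,), (1,), (2,)])
    conn.execute("DELETE FROM orders WHERE id = 2")
    print(stored_vs_actual(conn))
```

Even this tiny case needs three triggers (insert, delete, and moving an order to another customer), which is the extra integrity work the answer warns about.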
To underscore some points made by prior posters: Is your current schema really denormalized? The proper way (imho) to design a database is to:
(There may be other reasons to denormalize, but the only ones I can think of off-hand are political ones--have to match the existing code, the developers/managers don't like it, etc.)
My point is, if you never fully normalized, you don't have a denormalized database, you've got an unnormalized one. And I think you can think of more descriptive if less polite terms for those databases.
I've found that normalization, in some cases, will improve performance.
Small tables read more quickly. A badly denormalized database will often have (a) longer rows and (b) more rows than a normalized design.
Reading fewer shorter rows means less physical I/O.
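A small sketch of the "longer rows" effect, using Python's built-in `sqlite3` module. The table names, column names, and sample data are invented for illustration. It stores the same orders once in a flat table that repeats the customer's text on every row, and once in a normalized pair of tables, then compares how many characters of text each design stores.

```python
import sqlite3


def build(conn, n_customers=20, orders_each=50):
    """Load identical data into a flat design and a normalized design."""
    conn.executescript(
        """
        CREATE TABLE orders_flat (
            order_id INTEGER PRIMARY KEY,
            customer_name TEXT, customer_address TEXT, amount INTEGER
        );
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id), amount INTEGER
        );
        """
    )
    for c in range(n_customers):
        name = f"Customer {c}"
        address = f"{c} Example Street, Sample City"
        conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (c, name, address))
        for k in range(orders_each):
            conn.execute(
                "INSERT INTO orders_flat (customer_name, customer_address, amount) "
                "VALUES (?, ?, ?)",
                (name, address, k),
            )
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (c, k)
            )


def text_chars(conn, table, columns):
    """Total number of characters stored in the given text columns."""
    expr = " + ".join(f"length({col})" for col in columns)
    return conn.execute(f"SELECT COALESCE(SUM({expr}), 0) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    flat = text_chars(conn, "orders_flat", ["customer_name", "customer_address"])
    norm = text_chars(conn, "customers", ["name", "address"])
    print("flat design text chars:      ", flat)
    print("normalized design text chars:", norm)
```

Character counts are only a proxy for physical I/O; actual page reads depend on the engine, its page size, and caching.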
For one thing, you'll end up having to do resultset calculations. For example, if you have a Blog with a number of Posts, you could either calculate a per-blog figure (the number of posts, say) from the Post rows every time it is needed, which is more expensive than reading a value stored on the Blog row, and can lead to the SELECT N+1 problem (http://ayende.com/Blog/archive/2008/12/01/solving-the-select-n1-problem.aspx) if you're not careful. Of course, with the second option you have to deal with keeping the data consistent, but if the first option is painful enough, then you make it work.
Be careful you don't fall foul of premature optimisation. Do it in the normalised fashion, then measure performance against requirements, and only if it falls short should you look to denormalise.
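The queries from the original answer did not survive, so the following is only a sketch of the trade-off under assumed names, using Python's built-in `sqlite3` module. The `Blog`/`Post` columns, the stored `post_count` column, and the `CountingConnection` helper are all hypothetical. It contrasts three ways to get a post count per blog: one query per blog (the SELECT N+1 pattern), a single aggregated join, and reading a denormalized stored value.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_blogs=5, posts_each=3):
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Blog (
            id INTEGER PRIMARY KEY, title TEXT,
            post_count INTEGER NOT NULL DEFAULT 0  -- denormalized copy
        );
        CREATE TABLE Post (
            id INTEGER PRIMARY KEY, blog_id INTEGER REFERENCES Blog(id), body TEXT
        );
        """
    )
    for b in range(1, n_blogs + 1):
        conn.execute(
            "INSERT INTO Blog (id, title, post_count) VALUES (?, ?, ?)",
            (b, f"Blog {b}", posts_each),
        )
        conn.executemany(
            "INSERT INTO Post (blog_id, body) VALUES (?, ?)", [(b, "text")] * posts_each
        )
    return CountingConnection(conn)


def counts_n_plus_one(db):
    """One query for the blogs, then one more query per blog."""
    result = {}
    for (blog_id,) in db.query("SELECT id FROM Blog ORDER BY id"):
        rows = db.query("SELECT COUNT(*) FROM Post WHERE blog_id = ?", (blog_id,))
        result[blog_id] = rows[0][0]
    return result


def counts_single_join(db):
    """Normalized data, but aggregated in a single statement."""
    return dict(
        db.query(
            "SELECT b.id, COUNT(p.id) FROM Blog b "
            "LEFT JOIN Post p ON p.blog_id = b.id GROUP BY b.id"
        )
    )


def counts_stored(db):
    """Denormalized: read the stored value, which must be kept in sync elsewhere."""
    return dict(db.query("SELECT id, post_count FROM Blog"))


if __name__ == "__main__":
    db = make_db()
    for fn in (counts_n_plus_one, counts_single_join, counts_stored):
        start = db.queries
        fn(db)
        print(fn.__name__, "used", db.queries - start, "queries")
```

Note that the single aggregated join avoids the N+1 round trips without denormalizing anything, which fits the advice to measure first and denormalize only if the normalized form falls short.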
Normalized schemas tend to perform better for INSERT/UPDATE/DELETE because there are no "update anomalies" and the actual changes that need to be made are more localized.
SELECTs are mixed. Denormalization is essentially materializing a join. There's no doubt that materializing a join sometimes helps; however, materialization is often very pessimistic (probably more often than not), so don't assume that denormalization will help you. Also, normalized schemas are generally smaller and therefore might require less I/O. A join is not necessarily expensive, so don't automatically assume that it will be.
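A minimal sketch of how a change can be "more localized" in a normalized schema, using Python's built-in `sqlite3` module. The tables and the customer-rename scenario are invented for illustration. The same logical change, renaming one customer, is applied to a flat design that repeats the name on every order and to a normalized design that stores it once, and the number of rows each UPDATE touches is compared.

```python
import sqlite3


def rename_customer_rows_touched(orders_per_customer=100):
    """Return (rows updated in flat design, rows updated in normalized design)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_flat (
            id INTEGER PRIMARY KEY, customer_id INTEGER, customer_name TEXT
        );
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id)
        );
        """
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
    for _ in range(orders_per_customer):
        conn.execute(
            "INSERT INTO orders_flat (customer_id, customer_name) VALUES (1, 'Acme Ltd')"
        )
        conn.execute("INSERT INTO orders (customer_id) VALUES (1)")

    # Flat design: every order row carrying the old name must be rewritten.
    flat = conn.execute(
        "UPDATE orders_flat SET customer_name = 'Acme Inc' WHERE customer_id = 1"
    ).rowcount
    # Normalized design: the name lives in exactly one row.
    normalized = conn.execute(
        "UPDATE customers SET name = 'Acme Inc' WHERE id = 1"
    ).rowcount
    return flat, normalized


if __name__ == "__main__":
    flat, normalized = rename_customer_rows_touched()
    print("rows touched (flat):      ", flat)
    print("rows touched (normalized):", normalized)
```

If the flat UPDATE misses some rows (for example because of an inconsistent `customer_id`), the table ends up with two names for the same customer, which is the kind of update anomaly the answer refers to.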
I wanted to elaborate on Henrik Opel's #3 bullet point. Development costs might go up, but they don't have to. In fact, normalization of a database should simplify or enable the use of tools like ORMs, Code Generators, Report Writers, etc. These tools can significantly reduce the time spent on the data access layer of your applications and move development on through to adding business value.
There is also a good StackOverflow discussion about the development aspect of normalized databases, with many good answers, comments and things to think about.