Is it really better to use normalized tables?
I heard my team lead say that in some past projects they had to denormalize to make queries faster.
I think it may have had to do with table joins.
Is having many thin tables really less efficient than having a few fat tables?
7 Answers
It depends ... joining tables is inherently slower than having one big table that is 'pre-joined', i.e. de-normalised. However, by denormalising you're going to create data duplication and your tables are going to be larger. Normalisation is seen as a good thing, because it creates databases that can answer 'any' question; if it is properly done you can build a select to get to your data. This is not the case in some other forms of DB, and those are now (mostly) historic irrelevancies; the normalised/relational DB won that battle.
Back to your question, using de-normalisation to make things go faster is a well accepted technique. It's normally best to run your DB for a while so you know what to de-normalise and what to leave alone, and it's also common to leave the data in its 'correct' normalised form and pull data into a set of de-normalised reporting tables on a regular basis. If that process is done as part of the report run itself then the data is always up to date too.
As an example of over-normalisation, I've seen DBs in the past where the days of the week and the months of the year were pulled out into separate tables - the dates themselves were normalised - you can go too far.
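The trade-off this answer describes can be sketched with a tiny in-memory example. The schema here (`customers`/`orders` and a "pre-joined" `orders_denorm`) is hypothetical, invented just for illustration: the normalized design needs a join to produce a report, while the denormalized one answers the same question without a join but repeats the customer name on every order row.

```python
import sqlite3

# Hypothetical normalized schema vs a "pre-joined" (denormalized) one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    -- Denormalized equivalent: customer name duplicated on every order row.
    CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY,
                                customer_name TEXT,
                                total REAL);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob')")
conn.execute("INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0)")
conn.execute("""INSERT INTO orders_denorm
                VALUES (1, 'Alice', 10.0), (2, 'Alice', 20.0), (3, 'Bob', 5.0)""")

# Normalized: the report needs a join.
joined = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM orders o JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Denormalized: same answer, no join -- but 'Alice' is now stored twice.
flat = conn.execute("""
    SELECT customer_name, SUM(total)
    FROM orders_denorm GROUP BY customer_name ORDER BY customer_name
""").fetchall()

print(joined)          # [('Alice', 30.0), ('Bob', 5.0)]
print(flat == joined)  # True
```

Both designs answer the question; the difference only starts to matter at scale, and the duplicated names in `orders_denorm` are the update-anomaly risk that normalization exists to avoid.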
You should do some research on the differences between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) databases.
In short, database systems which are concerned primarily with recording transactions (OLTP) are usually structured in a more normalized fashion, reducing data duplication and easing the creation and updating of records at the expense of optimized data retrieval.
Database systems which are more concerned with data retrieval and analysis (OLAP) are usually structured in a less normalized fashion, sacrificing data storage optimization so as to maximize querying and analysis speed.
Database normalization and denormalization are at the heart of this trade-off.
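A minimal sketch of the OLTP/OLAP split described above, using hypothetical tables: the transactional side (`products`, `sales`) stays normalized, and a flat reporting table is periodically rebuilt from it, so analytical reads never pay for the join. The table and column names are invented for illustration.

```python
import sqlite3

# OLTP side stays normalized; an OLAP-style reporting table is rebuilt from it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (id INTEGER PRIMARY KEY,
                        product_id INTEGER REFERENCES products(id),
                        qty INTEGER);
    CREATE TABLE sales_report (product_name TEXT, total_qty INTEGER);
""")
conn.execute("INSERT INTO products VALUES (1, 'widget'), (2, 'gadget')")
conn.execute("INSERT INTO sales VALUES (1, 1, 3), (2, 1, 4), (3, 2, 2)")

def refresh_report(conn):
    """Rebuild the denormalized reporting table from the normalized source."""
    conn.execute("DELETE FROM sales_report")
    conn.execute("""
        INSERT INTO sales_report
        SELECT p.name, SUM(s.qty)
        FROM sales s JOIN products p ON p.id = s.product_id
        GROUP BY p.name
    """)

refresh_report(conn)
# Analytical queries now read the flat table with no joins.
rows = conn.execute("SELECT * FROM sales_report ORDER BY product_name").fetchall()
print(rows)  # [('gadget', 2), ('widget', 7)]
```

If `refresh_report` runs as part of the report itself (as the first answer suggests), the denormalized copy is always current; if it runs on a schedule, the report trades a bit of freshness for cheaper reads.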
In my long experience with Oracle OLTP databases, some of them very large and busy, I can honestly say I can't remember ever having come across a case where "denormalisation for performance" was truly required. I have, however, seen many cases where someone has decided in advance that denormalisation should be applied because of their fear, uncertainty and doubt about potential performance issues. This has usually been done without any benchmarking, and invariably I find that no performance improvement has been achieved in fact - but the data maintenance code has become far more complex than it would have been.
OLAP is a very different animal, and I'm not in a position to comment about that.
Jeff wrote about this, followed by a heated discussion.
It is also the subject of much discussion on SO, e.g. "whats the better database design more tables or more columns". As others have pointed out, use common sense and do not over-normalize.
This question recurs altogether too often. The primary reason is that SQL, the most popular database language by a huge margin, and all of its most popular implementations, conflate logical table design with physical table design.
The eternal answer is that you should always normalize your logical tables, but the pragmatic answer is complicated by the fact that the only way to implement certain optimizations under existing SQL implementations is to denormalize your physical table design (itself not a bad thing) which, in those implementations, requires denormalizing your logical table design.
In short, it depends. Sometimes denormalization is important for performance, but like everything else performance-related you should measure, measure, measure before you even consider going down this route.
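"Measure, measure, measure" can be done cheaply before committing to a schema change. The sketch below (hypothetical `authors`/`books` tables, invented for illustration) times the same report against a normalized schema and a flat copy, so the decision rests on numbers rather than on fear of joins.

```python
import sqlite3
import time

# Measure the same report against a normalized schema and a flat copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY,
                        author_id INTEGER REFERENCES authors(id));
    CREATE TABLE books_flat (id INTEGER PRIMARY KEY, author_name TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author{i}") for i in range(100)])
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [(i, i % 100) for i in range(10_000)])
conn.execute("""INSERT INTO books_flat
                SELECT b.id, a.name FROM books b
                JOIN authors a ON a.id = b.author_id""")

def timed(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

normalized, t1 = timed("""SELECT a.name, COUNT(*) FROM books b
                          JOIN authors a ON a.id = b.author_id
                          GROUP BY a.name ORDER BY a.name""")
flat, t2 = timed("""SELECT author_name, COUNT(*) FROM books_flat
                    GROUP BY author_name ORDER BY author_name""")

assert normalized == flat  # same answer either way
print(f"join: {t1:.4f}s  flat: {t2:.4f}s")  # measure, don't guess
```

At this size the difference is usually noise, which is exactly the answer's point: only a measurement on realistic data volumes can justify the extra maintenance cost of denormalizing.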
The reason why normalization has been known to hurt performance is because joins are fairly expensive. If there are N records in table X and M records in table Y, then a join of X and Y creates a temporary table with as many as N*M records. Though there are optimization tricks that the database uses to not generate the entire table if it's not needed, it nevertheless has to process all the records.
Denormalization is the process whereby you put data often used together in a single table to increase performance, at the expense of some database purity. Most find it to be an acceptable trade-off, even going so far as to design the schema intentionally denormalized to skip the intermediary step.
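The N*M figure in this answer is easy to verify directly: an unconstrained join of two tables (hypothetical `x` and `y` below, created just for this demonstration) logically pairs every row with every row, and the join condition is what filters that product down to the matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER);
    CREATE TABLE y (id INTEGER);
""")
conn.executemany("INSERT INTO x VALUES (?)", [(i,) for i in range(4)])  # N = 4
conn.executemany("INSERT INTO y VALUES (?)", [(i,) for i in range(3)])  # M = 3

# An unconstrained join pairs every row of x with every row of y: N*M rows.
(cross,) = conn.execute("SELECT COUNT(*) FROM x, y").fetchone()
print(cross)  # 12

# A join condition filters that logical N*M result down to the matches.
(matched,) = conn.execute(
    "SELECT COUNT(*) FROM x JOIN y ON x.id = y.id").fetchone()
print(matched)  # 3
```

As the answer notes, real query planners (index lookups, hash joins) avoid materializing the full N*M product, but the logical model is still the cross product, which is why join cost grows with table sizes.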
Performance is inversely related to the amount of normalization done in an RDBMS. That being said, the more normalized the tables are, the less likelihood there is for errors. There is also a point at which an RDBMS's performance can be hurt by denormalization: when all of the data is held in one table.