如何确定数据库规范化的程度?

发布于 2024-07-05 00:40:18 字数 105 浏览 9 评论 0原文

创建数据库结构时,需要遵循哪些好的准则或确定数据库应规范化的程度的好方法是什么? 您是否应该创建一个非标准化数据库并随着项目的进展将其拆分? 您是否应该创建完全标准化的表并根据性能需要组合表?

When creating a database structure, what are good guidelines to follow or good ways to determine how far a database should be normalized? Should you create an un-normalized database and split it apart as the project progresses? Should you create it fully normalized and combine tables as needed for performance?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(13

人间☆小暴躁 2024-07-12 00:40:18

通常,如果您在其他软件允许的范围内进行标准化,那么您就可以完成。

例如,当使用对象关系映射技术时,您将拥有一组丰富的用于各种多对一和多对多关系的语义。 在底层,它将提供具有有效 2 个主键的连接表。 虽然相对较少,但真正的规范化通常会为您提供与 3 个或更多主键的关系。 在这种情况下,我更愿意坚持使用 O/R 并编写自己的代码以避免各种数据库异常。

Often if you normalize as far as your other software will let you, you'll be done.

For example, when using Object-Relational mapping technology, you'll have a rich set of semantics for various many-to-one and many-to-many relationships. Under the hood that'll provide join tables with effectively 2 primary keys. While relatively rare, true normalization often gives you relations with 3 or more primary keys. In cases like this, I prefer to stick with the O/R and roll my own code to avoid the various DB anomalies.

无可置疑 2024-07-12 00:40:18

只需尝试使用常识即可。

还有一些人说 - 我必须同意他们的观点 - 如果您发现自己在大多数查询中将 6 个(神奇的数字)表连接在一起 - 不包括报告相关的表 - 那么您可能会考虑稍微反规范化。

Just try to use common sense.

Also some say - and I have to agree with them - that, if you're finding yourself joining 6 (the magic number) tables together in most of your queries - not including reporting related ones- , than you might consider denormalizing a bit.

初懵 2024-07-12 00:40:18

Jeff has a pretty good overview of his philosophy on his blog: Maybe normalization isn't normal. The main thing is: don't overdo normalization. But I think an even bigger point to take away is that it probably doesn't matter too much. Unless you're running the next Google, you probably won't notice much of a difference until your application grows.

一紙繁鸢 2024-07-12 00:40:18

我觉得数据库规范化是一种艺术形式。

您不想过度规范化数据库,因为您将拥有太多表,这将导致您对简单对象的查询花费比应有的时间更长的时间。

我遵循的一个好的经验法则是对一遍又一遍重复的相同信息进行标准化。

例如,如果您正在创建联系人管理应用程序,则将地址(街道、城市、州、邮政编码等)作为其自己的表是有意义的。

但是,如果您只有 2 种类型的联系人(企业或个人),并且您知道只有 2 种类型,您是否需要联系人类型表? 对我来说没有。

我首先要弄清楚您需要的数据类型。 使用 Visio 等建模程序来提供帮助。 您不想从非标准化数据库开始,因为您最终会标准化。 首先将对象放入逻辑分组中,当您看到重复的数据时,将该数据放入新表中。 我会跟上这个过程,直到你觉得你已经设计好了数据库。

让测试告诉您是否需要合并表。 编写良好的查询可以涵盖任何过度标准化的情况。

Database normizational I feel is an art form.

You don't want to over normalize your database because you will have too many tables and it will cause your queries of even simple objects take longer than they should.

A good rule of thumb I follow is to normalize the same information repeated over and over again.

For example if you are creating a contact management application it would make sense to have Address (Street, City, State, Zip, etc. . ) as its own table.

However if you have only 2 types of contacts, Business or personal, do you need a contact type table if you know you are only going to have 2? For me no.

I would start by first figuring out the datatypes you need. Use a modeling program to help like Visio. You don't want to start with a non-normalized database because you will eventually normalize. Start by putting objects in there logical groupings, as you see data repeated take that data into a new table. I would keep up with that process until you feel you have the database designed.

Let testing tell you if you need to combine tables. A well written query can cover any over normalization.

一江春梦 2024-07-12 00:40:18

我相信从非规范化数据库开始,随着进展逐渐转向规范化通常是最容易上手的。 对于正常化到什么程度的问题,我的哲学是正常化,直到开始受到伤害。 这听起来可能有点轻率,但它通常是衡量要走多远的好方法。

I believe starting with an un-normalized database and moving toward normalized as you progress is usually easiest to get started. To the question of how far to normalize, my philosophy is to normalize until is starts to hurt. That may sound a little flippant, but it generally is a good way to gauge how far to take it.

疯狂的代价 2024-07-12 00:40:18

拥有标准化的数据库将为您提供最大的灵活性和最简单的维护。 我总是从规范化的数据库开始,然后仅在存在需要解决的现实问题时才进行非规范化。

我认为这与代码性能类似,即编写可维护、灵活的代码,并在您知道存在性能问题时对性能做出妥协。

Having a normalized database will give you the most flexibility and the easiest maintenance. I always start with a normalized database and then un-normalize only when there is an real life problem that needs addressing.

I view this similarly to code performance i.e. write maintainable, flexible code and make compromises for performance when you know that there is a performance problem.

太阳公公是暖光 2024-07-12 00:40:18

原发帖者从未描述过数据库将在什么情况下使用。 如果它是任何类型的数据仓库项目,在某些时候您需要多维数据集(OLAP)处理某些前端的数据,那么从星型模式(事实表+维度)开始而不是研究会更明智正常化。 在这种情况下,金博尔的书会有很大帮助。

The original poster never described in what situation the database will be used. If it's going to be any type of data warehousing project where at some point you will need cubes (OLAP) processing data for some front-end, it would be wiser to start off with star schema (fact tables + dimension) rather than looking into normalization. The Kimball books will be of great help in this case.

情绪 2024-07-12 00:40:18

我同意通常最好从规范化数据库开始,然后反规范化来解决非常具体的问题,但我可能会从 Boyce-Codd 范式 而不是第三范式。

I agree that it is typically better to start out with a normalized DB and then denormalize to solve very specific problems, but I'd probably start at Boyce-Codd Normal Form instead of 3rd Normal Form.

慈悲佛祖 2024-07-12 00:40:18

事实是“这取决于情况”。 这取决于很多因素,包括:

  • 代码(手动编码或工具驱动(如 ETL 包))
  • 主要应用程序(事务处理、数据仓库、报告)
  • 数据库类型(MySQL、DB/2、Oracle、Netezza 等)
  • 数据库架构(表格、列式)
  • DBA 质量(主动、被动、不主动)
  • 预期数据质量(您想在应用程序级别还是数据库级别强制执行数据质量?)

The truth is that "it depends." It depends on a lot of factors including:

  • Code (Hand-coded or Tool driven (like ETL packages))
  • Primary Application (Transaction Processing, Data Warehousing, Reporting)
  • Type of Database (MySQL, DB/2, Oracle, Netezza, etc.)
  • Database Architecture (Tablular, Columnar)
  • DBA Quality (proactive, reactive, inactive)
  • Expected Data Quality (do you want to enforce data quality at the application level or the database level?)
情未る 2024-07-12 00:40:18

我同意您应该尽可能地标准化,并且仅在性能绝对必要时才进行非标准化。 对于物化视图或缓存方案,这通常是不必要的。

需要记住的是,通过标准化模型,您可以为数据库提供有关如何约束数据的更多信息,以便消除不完全标准化模型中可能发生的更新异常的风险。

如果您进行非规范化,那么您要么需要接受可能会出现更新异常的事实,要么需要自己在应用程序代码中实现约束验证。 这剥夺了使用 DBMS 的很多好处,DBMS 允许您以声明方式定义这些约束。

因此,假设代码质量相同,非规范化实际上可能不会给您带来更好的性能。

另一件值得一提的是,如今硬件很便宜,因此投入额外的处理能力来解决问题通常比接受清理损坏数据的潜在成本更具成本效益。

I agree that you should normalise as much as possible and only denormalise if absolutely necessary for performance. And with materialised views or caching schemes this is often not necessary.

The thing to bare in mind is that by normalising your model you are giving the database more information on how to constrain your data so that you can remove the risk of update anomalies that can occur in incompletely normalised models.

If you denormalise then you either need to live with the fact that you may get update anomolies or you need to implement the constraint validation yourself in your application code. This takes away a lot of the benefit of using a DBMS which lets you define these constraints declaratively.

So assuming the same quality of code, denormalising may not actually give you better performance.

Another thing to mention is that hardware is cheap these days so throwing extra processing power at the problem is often more cost effective than accepting the potential costs of cleaning up corrupted data.

风和你 2024-07-12 00:40:18

标准化意味着消除冗余数据。 换句话说,非规范化或非规范化数据库是相同信息将在多个不同位置重复的数据库。 这意味着您必须编写更复杂的更新语句以确保在各处更新相同的数据,否则您会得到不一致的数据,这又意味着查询的输出不真实。

这是一个相当大的问题,所以我会说非规范化会带来伤害,而不是相反。

在某些情况下,如果您认为好处超过了更新数据的额外工作和数据损坏的风险,您可能会故意决定对数据库的特定部分进行非规范化。 例如,对于数据仓库,出于性能原因而聚合数据,并且数据在初始输入后通常不更新,从而减少不一致的风险。

但总的来说,要避免为了性能而进行非规范化。 例如,非规范化联接的性能优势通常可以通过使用物化视图(也称为索引视图)来实现,这与查询非规范化表一样快,但是仍然保护了数据的一致性。

Normalization means eliminating redundant data. In other words, an un-normalized or de-normalized database is a database where the same information will be repeated in multiple different places. This means you have to write more complex update statement to ensure you update the same data everywhere, otherwise you get inconsistent data which in turn means the output of queries is unrealiable.

This is a pretty huge problem, so I would say denormalization hurts, not the other way around.

In some case you may deliberately decide to denormalize specific parts of a database, if you judge that the benefit outweighs the extra work in updating data and the risk of data corruption. For example with datawarehouses, where data is aggregated for performance reasons, and data if often not updated after the initial entry which reduce the risk of inconsistencies.

But in general be weary of denormalizing for performance. For example the performance benefit of a denormalized join can typically be achieved by using materialized view (also called indexed view), which will be as fast as querying a denormalized table, but still protects the consistency of the data.

一萌ing 2024-07-12 00:40:18

您想要开始设计一个符合第三范式的规范化数据库。 当您开发业务逻辑层时,您可能会决定必须稍微反规范化,但永远,永远不要低于第三种形式。 始终保持第一和第二表格合规。 您希望进行非规范化是为了简化代码,而不是为了性能。 为此使用索引和存储过程:)

不“随行标准化”的原因是,每次修改数据库设计时,您都必须修改已经编写的代码。

有几篇不错的文章:

http://www.agiledata.org/essays/dataNormalization。 html

You want to start designing a normalized database up to 3rd normal form. As you develop the business logic layer you may decide you have to denormalize a bit but never, never go below the 3rd form. Always, keep 1st and 2nd form compliant. You want to denormalize for simplicity of code, not for performance. Use indexes and stored procedures for that :)

The reason not "normalize as you go" is that you would have to modify the code you already have written most every time you modify the database design.

There are a couple of good articles:

http://www.agiledata.org/essays/dataNormalization.html

君勿笑 2024-07-12 00:40:18

@GrizzlyGuru 一位智者曾经告诉我“正常化直到它受伤,非正常化直到有效”。

它还没有让我失望:)

我不同意以非标准化形式开始,但是,根据我的经验,调整应用程序来处理标准化程度较低的数据库比标准化程度更高的数据库更容易。 它还可能导致其工作“足够好”的情况,因此您永远无法将其标准化(直到为时已晚!)

@GrizzlyGuru A wise man once told me "normalize till it hurts, denormalize till it works".

It hasn't failed me yet :)

I disagree about starting with it in un-normalized form however, in my experience its' been easier to adapt your application to deal with a less normalized database than a more-normalized one. It could also lead to situations where its' working "well enough" so you never get around to normalizing it (until its' too late!)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文