想通过一个例子来理解 6NF

发布于 2024-10-14 20:27:02 字数 326 浏览 3 评论 0原文

我刚刚阅读了 @PerformanceDBA 的论点:6NF 和 EAV。我很好奇。我之前对 6NF 持怀疑态度,因为它“仅仅”在表上粘贴了一些时间戳列。

我一直使用数据字典,不需要被说服去使用它,或者生成 SQL 代码。因此,我期望答案需要用于生成代码的字典(或目录)。

所以我想知道 6NF 如何处理一个极其简单的例子。物品、描述和价格的表格。价格随着时间的推移而变化。

那么无论如何,Items 表转换为 6NF 后会是什么样子呢?什么是“表爆炸”?这发生在这里吗?

如果该示例无法使用如此简单的表格,请随意添加必要的内容以阐明要点。

I have just read @PerformanceDBA's arguments re: 6NF and E-A-V. I am intrigued. I had previously been skeptical of 6NF as it was presented as "merely" sticking some timestamp columns on tables.

I have always worked with a data dictionary and do not need to be convinced to use one, or to generate SQL code. So I expect an answer that would require a dictionary (or catalog) that is used to generate code.

So I would like to know how 6NF would deal with an extremely simple example. A table of items, descriptions and prices. The prices change over time.

So anyway, what does the Items table look like when converted to 6NF? What is the "explosion of tables?" that happens here?

If the example does not work with a table this simple, feel free to add what is necessary to get the point across.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

爱已欠费 2024-10-21 20:27:03

这些人记下了:锚定建模。关于该主题的精彩学术论文与实际例子相结合。他们的著作最终促使我考虑在即将到来的项目中使用 6nf 构建 DW。我所做的 POC 工作已经验证了(至少对我来说)6nf 的巨大好处不要超过成本。

These guys have it down: Anchor Modeling. Great academic papers on the subject, combined with practical examples. Their writings have finally pushed me over the edge to consider building a DW in 6nf on an upcoming project. The POC work I have done has validated (for me, at least) that the enormous benefits of 6nf don't outweigh the costs.

醉城メ夜风 2024-10-21 20:27:02

我实际上开始将答案放在一起,但我遇到了复杂化,因为你(很容易理解)想要一个简单的例子。问题是多方面的。

首先,我不太了解您在关系数据库和 5NF 方面的实际专业知识水平;我没有一个起点来讨论 6NF 的细节,

其次,就像任何其他 NF 一样,它是多种多样的。你只能勉强踏入其中;您可以对某些表实施 6NF;你可以在每张桌子上全力以赴,等等。当然,桌子数量会激增,但你可以将其标准化,并消除爆炸;这是 6NF 的高级或成熟实现。当您要求最简单、最直接的示例时,提供 6NF 的全部或部分级别是没有用的。

我相信您明白有些表可以是“5NF”,而另一些表可以是“6NF”。

所以我给你整理了一份。但即便如此也需要解释。

现在SQL几乎不支持5NF,它根本不支持6NF(我认为dportas用不同的词说的是同样的事情)。现在,出于性能原因,我在深层实现了 6NF,简化了旋转(整个表;任何和所有列,而不是 MS 中愚蠢的 PIVOT 函数)、列访问等。为此,您需要一个完整的目录,这是一个对SQL目录进行扩展,支持SQL不支持的6NF,维护数据完整性和业务规则。所以,你真的不想为了好玩而实现 6NF,只有在有需要时才这样做,因为你必须实现一个目录。 (这是 EAV 群体不做的事情,也是大多数 EAV 系统存在数据完整性问题的原因。它们中的大多数不使用 SQL 所具有的声明性引用和数据完整性。)

但是大多数实现 6NF 的人并没有实现更深层次的完整目录。他们有更简单的需求,因此实现了更浅层次的 6NF。那么,让我们为您提供一个简单的例子。让我们从一个声明为 5NF 的普通 Product 表开始(我们不要争论 5NF 是什么)。该公司销售各种不同类型的产品,一半的列是必填的,另一半是可选的,这意味着,根据产品类型,某些列可能为空。虽然他们可能在数据库方面做得很好,但 Null 现在是一个大问题:对于某些 ProductTypes 应该不为 Null 的列却是 Null,因为声明声明为 NULL,并且他们的应用程序代码只能与下一个人的一样好。

因此他们决定使用 6NF 来解决这个问题,因为 6NF 的副标题表明它消除了空问题。第六范式是不可约范式,此后不会再有 NF,因为数据无法进一步归一化。这些行已被最大程度地归一化。 6NF的定义是:

行包含主键且最多包含一个属性时,表处于 6NF 状态。

请注意,根据该定义,全球数以百万计的表已经处于 6NF 状态,但没有这种意图。例如。典型的参考表或查找表,只有 PK 和描述。

正确的。好吧,我们的朋友看看他们的 Product 表,它有 8 个非键属性,所以如果他们将 Product 表设为 6NF,他们将有 8 个子 Product 表。然后存在的问题是某些列是其他表的外键,这会导致更多的复杂性。他们注意到 SQL 不支持他们正在做的事情,他们必须建立一个小目录。八个表是正确的,但不明智。他们的目的是摆脱空值,而不是在每个表周围编写一个小子系统。

简单 6NF 示例

不熟悉关系数据库建模标准的读者可能会发现< strong>IDEF1X 表示法 对于解释示例中的符号很有用。

因此,通常情况下,产品表会保留所有强制列,尤其是 FK,而每个可选列、每个可空列都放置在单独的子产品表中。这是我见过的最简单的形式。五张桌子而不是八张。在Model中,四个子Product表是“in 6NF”;主产品表是“in 5NF”。

现在,我们确实不需要从 Product 中进行 SELECT 的每个代码段都必须根据 ProductType 等确定它应该构造哪些列,因此我们提供了一个 View,它本质上提供了 Product 表簇的 5NF“视图” 。

我们需要的下一件事是 SQL 目录扩展的基本基础,以便我们可以确保各种 ProductType 的规则(数据完整性)维护在数据库中的一个位置,而不依赖于应用程序代码。您可以使用的最简单的目录。这是由 ProductType 驱动的,因此 ProductType 现在构成该元数据的一部分。您可以在没有目录的情况下实现这个简单的结构,但我不推荐它。

更新

值得注意的是,我在数据库中实现了所有业务规则。否则它就不是一个数据库(“在应用程序代码中”实现规则的概念非常搞笑,尤其是现在,当我们有花店作为“开发人员”工作时)。因此,所有规则等首先实现为 SQL 声明、CHECK 约束、函数等。这保留了所有声明性引用完整性和声明性数据完整性。 SQL 目录的扩展涵盖了 SQL 没有声明的区域,然后将它们实现为 SQL。作为一个优秀的数据字典,它的作用远不止于此。例如。每次更改表或添加或更改列或其特征时,我不会编写视图,它们是使用简单的代码生成器直接从目录+扩展创建的。

还有一个非常重要的注释。如果没有完成完整而忠实的标准化练习,您就无法实施 6NF(或正确的 EAV)至 5NF。我在每个站点看到的问题是,他们没有真正的 5NF 状态,他们有部分标准化的混杂或根本没有标准化,但他们对此非常执着。由此创建 6NF 或 EAV 都是一场灾难。如果没有在声明性 SQL 中实现所有业务规则,则从中创建 EAV 或 6NF 是一场核灾难,会持续多年。一分钱一分货。

结束更新。

最后,是的,至少有四个进一步的规范化级别(规范化是一个原则,而不仅仅是对范式的引用),可以应用于简单的 6NF 产品集群,提供更多的控制、更少的表等。我们越深入,目录就越广泛。以及更高水平的性能。当您准备好时,只需询问即可,我已经建立了模型并在其他答案中发布了详细信息。

I actually started putting an answer together, but I ran into complications, because you (quite understandably) want a simple example. The problem is manifold.

First I don't have a good idea of your level of actual expertise re Relational Databases and 5NF; I don't have a starting point to take up and then discuss the specifics of 6NF,

Second, just like any of the other NFs, it is variegated. You can just barely step into it; you can implement 6NF for certan tables; you can go the full hog on every table, etc. Sure there is an explosion of tables, but then you Normalise that, and kill the explosion; that's an advanced or mature implementation of 6NF. No use providing the full or partial levels of 6NF, when you are asking for the simplest, most straight-forward example.

I trust you understand that some tables can be "in 5NF" while others are "in 6NF".

So I put one together for you. But even that needs explanation.

Now SQL barely supports 5NF, it does not support 6NF at all (I think dportas says the same thing in different words). Now I implement 6NF at a deep level, for performance reasons, simplified pivoting (of entire tables; any and all columns, not the silly PIVOT function in MS), columnar access, etc. For that you need a full catalogue, which is an extension to the SQL catalogue, to support the 6NF that SQL does not support, and maintain data Integrity and business Rules. So, you really do not want to implement 6NF for fun, you only do that if you have a need, because you have to implement a catalogue. (This is what the EAV crowd do not do, and this is why most EAV systems have data integrity problems. Most of them do not use the declarative Referential & Data Integrity that SQL does have.)

But most people who implement 6NF don't implement the deeper level, with a full catalogue. They have simpler needs, and thus implement a shallower level of 6NF. So, let's take that, to provide a simple example for you. Let's start with an ordinary Product table that is declared to be in 5NF (and let's not argue about what 5NF is). The company sells various different kinds of Products, half the columns are mandatory, and the other half are optional, meaning that, depending on the Product Type, certain columns may be Null. While they may have done a good job with the database, the Nulls are now a big problem: columns that should be Not Null for certain ProductTypes are Null, because the declaration states NULL, and their app code is only as good as the next guy's.

So they decide to go with 6NF to fix that problem, because the subtitle of 6NF states that it eliminates The Null Problem. Sixth Normal Form is the irreducible Normal Form, there will be no further NFs after this, because the data cannot be Normalised further. The rows have been Normalised to the utmost degree. The definition of 6NF is:

a table is in 6NF when the row contains the Primary Key, and at most one, attribute.

Notice that by that definition, millions of tables across the planet are already in 6NF, without having had that intent. Eg. typical Reference or Look-up tables, with just a PK and Description.

Right. Well, our friends look at their Product table, which has eight non-key attributes, so if they make the Product table 6NF, they will have eight sub-Product tables. Then there is the issue that some columns are Foreign Keys to other tables, and that leads to more complications. And they note the fact that SQL does not support what they are doing, and they have to build a small catalogue. Eight tables are correct, but not sensible. Their purpose was to get rid of Nulls, not to write a little subsytem around each table.

Simple 6NF Example

Readers who are unfamiliar with the Standard for Modelling Relational Databases may find IDEF1X Notation useful in order to interpret the symbols in the example.

So typically, the Product Table retains all the Mandatory columns, especially the FKs, and each Optional column, each Nullable column, is placed in a separate sub-Product table. That is the simplest form I have seen. Five tables instead of eight. In the Model, the four sub-Product tables are "in 6NF"; the main Product table is "in 5NF".

Now we really do not need every code segment that SELECTs from Product to have to figure out what columns it should construct, based on the ProductType, etc, so we supply a View, which essentially provides the 5NF "view" of the Product table cluster.

The next thing we need is the basic rudiments of an extension to the SQL catalog, so that we can ensure that the rules (data integrity) for the various ProductTypes are maintained in one place, in the database, and not dependent on app code. The simplest catalogue you can get away with. That is driven off ProductType, so ProductType now forms part of that Metadata. You can implement that simple structure without a catalogue, but I would not recommend it.

Update

It is important to note that I implement all Business Rules in the database. Otherwise it is not a database (the notion of implementing rules "in application code" is hilarious in the extreme, especially nowadays, when we have florists working as "developers"). Therefore all rules, etc are first and foremost implemented as SQL declarations, CHECK constraints, functions, etc. That preserves all Declarative Referential Integrity, and declarative Data Integrity. The extension to the SQL catalog covers the area that SQL does not have declarations for, and they are then implemented as SQL. Being a good data dictionary, it does much more. Eg. I do not write Views every time I change the tables or add or change columns or their characteristics, they are created directly from the catalog+extension using a simple code generator.

One more very important note. You cannot implement 6NF (or EAV properly, for that matter), without completing a full and faithful Normalisation exercise, to 5NF. The problem I see at every site is, they don't have a genuine 5NF state, they have a mish-mash of partial normalisation or no normalisation at all, but they are very attached to that. Creating either 6NF or EAV from that is a disaster. Creating EAV or 6NF from that without all business rules implemented in declarative SQL is a nuclear disaster, burning for years. You get what you pay for.

End update.

Finally, yes, there are at least four further levels of Normalisation (Normalisation is a Principle, not a mere reference to a Normal Form), that can be applied to that simple 6NF Product cluster, providing more control, less tables, etc. The deeper we go, the more extensive the catalogue. And higher levels of performance. When you are ready, just ask, I have already erected the models and posted details in other answers.

浮云落日 2024-10-21 20:27:02

简而言之,6NF 意味着每个关系都由一个候选键加上不超过一个其他(键或非键)属性组成。举个例子,如果一个“item”由 ProductCode 标识,其他属性是 Description 和 Price,那么 6NF 模式将由两个关系组成(* 表示每个关系中的键):

ItemDesc {ProductCode*, Description}
ItemPrice {ProductCode*, Price}

这可能是一种非常灵活的方法,因为它最大限度地减少了依赖性。但这也是它的主要缺点,尤其是在 SQL 数据库中。 SQL 使得强制执行许多多表约束变得困难或不可能。使用上述模式,在大多数情况下,不可能强制执行每个产品必须始终具有描述和价格的业务规则。同样,您可能无法强制执行一些应该应用的复合键(因为它们的属性可以拆分到多个表中)。

因此,在考虑 6NF 时,您必须权衡哪些依赖性和完整性规则对您来说很重要。在许多情况下,您可能会发现坚持 5NF 并不再标准化更加实际和有用。

In a nutshell, 6NF means that every relation consists of a candidate key plus no more than one other (key or non-key) attribute. To take up your example, if an "item" is identified by a ProductCode and the other attributes are Description and Price then a 6NF schema would consist of two relations (* denotes the key in each):

ItemDesc {ProductCode*, Description}
ItemPrice {ProductCode*, Price}

This is potentially a very flexible approach because it minimises the dependencies. That's also its main disadvantage however, especially in a SQL database. SQL makes it hard or impossible to enforce many multi-table constraints. Using the above schema, in most cases it will not be possible to enforce a business rule that every product must always have a description AND a price. Similarly, you may not be able to enforce some compound keys that ought to apply (because their attributes could be split over multiple tables).

So in considering 6NF you have to weigh up which dependencies and integrity rules are important to you. In many cases you may find it more practical and useful to stick to 5NF and normalize no further than that.

太阳哥哥 2024-10-21 20:27:02

我之前对 6NF 持怀疑态度
因为它被表述为“仅仅”
粘贴一些时间戳列
表。

我不太确定这种明显的误解从何而来。也许是因为 Date、Darwen 和 Lorentzos 所著的《时态数据和关系模式》一书中引入了 6NF?无论如何,我希望这里的其他答案已经澄清了 6NF 不仅限于时态数据库。

我想说的一点是,虽然 6NF 是“学术上值得尊敬的”并且总是可以实现的,但它不一定会在每种情况下带来最佳设计(并且不仅仅是在考虑使用 SQL 实现时)。甚至上述 6NF 的发现者和支持者似乎也同意,例如

Chris Date:“对于出于实际目的,请坚持 5NF(和 6NF)。”

Hugh Darwen:“围绕 Date [而不是人!] 的 6NF 分解将是杀伤力太大了​​……足球俱乐部的最佳设计是……5-一点点-NF!”

Hugh Darwen:“我们处于 5NF,但不是在 6NF 中,5NF 就足够了”(几个类似的例子)。

话又说回来,我也可以找到相反的证据:

Chris Date:“Darwen一段时间以来,我都认为所有基础相关变量都应该在 6NF 中”。

实际上,我最近扩展了我们的一个产品的 SQL 架构,添加了一个小功能。我采用了 6NF 来避免可为空列,最终得到了 6 个新表,其中大多数(全部?)我的同事都会使用一张带有可为空列的表(或者可能扩展现有表)。尽管我证明了几个“帮助程序”存储过程和带有 INSTEAD OF 触发器的“非规范化”VIEW,但每个必须在 SQL 级别使用此功能的编码器都他们不遗余力地诅咒我:)

I had previously been skeptical of 6NF
as it was presented as "merely"
sticking some timestamp columns on
tables.

I'm not quite sure where this apparent misconception comes from. Perhaps the fact that 6NF was introduced for the book "Temporal Data and The Relational Mode" by Date, Darwen and Lorentzos? Anyhow, I hope the other answers here have clarified that 6NF is not limited to temporal databases.

The point I wanted to make is, although 6NF is "academically respectable" and always achievable, it may not necessarily lead to the optimal design in every case (and not just when considering implementation using SQL either). Even the aforementioned discoverers and proponents of 6NF seem to agree e.g.

Chris Date: "For practical purposes, stick to 5NF (and 6NF)."

Hugh Darwen: "the 6NF decomposition around Date [not the person!] would be overkill... an optimal design for the soccer club is... 5-and-a-bit-NF!"

Hugh Darwen: "we are in 5NF but not in 6NF, and again 5NF is sufficient" (several similar examples).

Then again, I can also find evidence to the contrary:

Chris Date: "Darwen and I have both felt for some time that all base relvars should be in 6NF".

On a practical note, I recently extended the SQL schema of one of our products to add a minor feature. I adopted a 6NF to avoid nullable columns and ended up with six new tables where most (all?) of my colleagues would have used one table (or perhaps extended an existing table) with nullable columns. Despite me proving several 'helper' stored procs and a 'denormalized' VIEW with a INSTEAD OF triggers, every coder that has had to work with this feature at the SQL level has gone out of their way to curse me :)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文