是否添加“ID”?字段到数据库表按照第三范式算错误吗?
是否可以将“ID”字段作为主键添加到我的所有数据库表中并使用它来建立表之间的关系? 此设计是否会被视为 3NF(第三范式)设计? 如果是的话,理论上是否推荐这个东西?
Is it OK to add "ID" field as primary key to all my database tables also to use it to make relations ships between tables?
Is this design would consider as 3NF (third normal form) design?
If yes, is this thing recommended theoretically or not?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
问题是这个问题有点孤立。但由于您担心该问题是否有理论(可能还有标准合规性)基础,因此答案必须不如此孤立。
如果是,这个东西在理论上是否值得推荐?
不。它没有任何学术或理论基础。它打破了基本的关系数据库设计规则,因此(a)不会生成关系数据库,并且(b)无论生成什么,都不会具有用户(无需通过应用程序)所期望的关系功能的强大功能有许多简单的关系数据库报告工具。
事实上,不幸的是,这是一种非常常见、下意识的、快速而肮脏的方法,用于将电子表格(应用程序开发人员已为其应用程序识别)放入“数据库”容器,例如微软SQL。 无需进行容器内容作为关系数据库所需的任何真正的数据库设计或建模工作。适合获得原型或概念验证,但尚未准备好进行任何形式的开发(SQL 编码)。
可以将“ID”字段作为主键添加到我的所有数据库表中并使用它来建立表之间的关系吗?
等等。根据定义,它们不能是“数据库表”。数据库表是通过正式的建模过程得出的,因此将具有强大的标识符。并且关系已经定义。在这种情况下就不会提出这个问题。因此,既然被问到了,那么您所问的事情与“数据库表”相差甚远。它们只是一款应用程序的应用程序开发者便笺簿。
向一个电子表格添加 FK 约束,将其“关联”到另一个电子表格,并添加“ID”PK,不会创建“关系”“数据库”。不,它只是使用 SQL 的功能来关联容器中其他不相关的电子表格。它们仍然是不相关的电子表格,通过添加“ID”列进行“链接”。
结果是数据大量重复;更新异常;更多指数;更大的“关系”数据集;表现不佳;大量过度使用临时表;复杂的SQL,所有这些都可以通过真正的数据库设计来避免。
此设计是否会被视为 3NF(第三范式)设计?
规范化是数据库设计过程的一部分(不是全部)。 3NF 就是通过这个过程实现的。 3NF 或任何 NF 不是可以在不经过整个过程的情况下放置在电子表格集或容器的部分设计内容上的标签,从而获得徽章。人们不会“考虑”一堆电子表格或部分设计的内容 3NF;评估是否遵循规范化规则,如果没有违反规则,则公平地标记为 3NF。由于尚未遵循规范化过程,因此没有基础表明它可能与任何范式相关。
同样,除了规范化之外,如果在此过程中遵循关系数据库规则且没有违反,则可以实现对关系数据库标准的遵守。由于尚未遵循关系数据库方法,因此没有依据表明它可能与任何关系数据库标准相关,或者可以期望它具有任何关系功能。
理解整个问题
“ID”是代理键。代理键始终(你是对的)一个附加键和索引,除了预先存在的 PK 之外,即将被篡夺。当然,每次访问都会产生相当大的性能成本。
一些提问者认为可以使用代理键来替代 PK。这当然是错误的,你意识到这一点,非常感激,这不必在这里讨论。
“所有代理键”或“无代理键”的概念是一种非黑即白、全有或全无的胡言乱语,这对于儿童来说很正常,但对于成年人来说是不可接受的,特别是任何从事 IT 工作的人。需要精确和理解。对于一个小孩子来说,认为“如果爸爸不让我做我想做的事,他就不爱我”是很正常的,因此“如果他不爱我”我,他恨我”。我们大多数人都意识到,生活比上小学时要复杂一点。 “喜欢”在每个表上看到“ID”并且“不喜欢”在某些表上缺少“ID”的开发人员,从定义上讲,他们无法将数据库作为一个整体来考虑,也无法考虑其他开发人员和用户的需求;他们只考虑简单的、一次一张表的代码。
它也与灰色阴影或模糊定义无关。不,这些定义30年来没有改变(它们已经被扩展并且变得更加精确,但它们没有改变)。灰色阴影使开发人员可以规避合规性和标准。所以也不建议这样做。
什么是真正的关系数据库?
事实是,如果一个数据库是由合格的数据建模者使用已经存在 30 年的方法诚实地建模和设计的,那么他们最终会得到一个真正的关系数据库。如果他们不遵循该流程,那么它既不是关系型也不是数据库。标识符和关系已经被定义,其含义、上下文将被传递到各个表中。数据将被归一化为 3NF 或 BCNF 或 5NF,并且不会出现更新异常。在最后一步中,作为正式流程的一部分,而不是在其之外,当将逻辑转换为物理时,建模者可以提高某些方面的性能 通过添加代理键来标识符,并避免将大(宽)键携带到相关子表中 (1)。这从另一个角度再次证明了为什么零代理或所有代理的概念是幼稚的并且完全脱离了真实的过程。
真正的关系数据库将具有完整的关系功能,诚实地实现 3NF,使用自然的关系键,其中一定有一些被深思熟虑地切换为代理。
轻松证明
当然,我所说的一切都可以轻松证明:只需发布 5 到 10 个电子表格的 DDL,我需要至少 4 的“深度”(great.grand.parent⇢grand .父⇢父⇢子)。
您可能感兴趣,我最近在 相关问题,我在此不再重复。
注意
这只是必需的,因为当前的 SQL 产品不支持完整的关系模型,并且为了消除它们所具有的已知性能障碍。如果供应商提供的关系数据库中的宽关系键和窄关系键的性能一样,则没有必要这样做。
我同意Erwin关于密钥和标识符的陈述,因此我没有在我的回答中重复它们。
The problem is the question is a bit isolated. But since you are concerned about whether the issue has a theoretical (and possibly a standards-compliance) basis, the answer has to not be so isolated.
If yes, is this thing recommended theoretically or not?
No. It has no academic or theoretical basis whatsoever. It breaks basic Relational Database design rules, and therefore (a) will not produce a Relational database and (b) whatever is produced, will not have the power of relational capability that the users (without having to go through the app) will expect via the many simple Relational database report tools out there.
In fact, it is the unfortunately very common, knee-jerk, quick-and-dirty method of making spreadsheets (which the app developer has identified for their app) fit into a "database" container such as MS SQL. Without doing any of the genuine database design or modelling work that is required for the contents of the container to qualify as a Relational database. Good for getting a prototype or proof of concept going but not ready for any form of development (SQL coding).
Is it OK to add "ID" field as primary key to all my database tables also to use it to make relationships between tables?
Hang on. They cannot be "database tables", by definition. Database tables are arrived at by a formal modelling process, and as a result will have strong Identifiers. And Relations already defined. In which case the question would not be asked. Therefore, since it is being asked, the things you are asking about are nowhere near "database tables". They are just one app developers scratchpad for one app.
Adding an FK constraint to one spreadsheet, to "relate" it to another spreadsheet, and adding an "ID" PK, does not make a "relational" "database". No, it merely uses the capability of SQL to relate otherwise unrelated spreadsheets in the container. They remain unrelated spreadsheets, "linked" by an added "ID" column.
The result is, substantial duplication of the data; update anomalies; many more indices; larger "relational" data sets; poor performance; massive over-use of temp tables; complex SQL, all of which can be avoided by genuine database design.
Is this design would consider as 3NF (third normal form) design?
Normalisation is part of (not all of) the database design process. 3NF is arrived at through that process. 3NF or whatever NF, is not a label that can be placed on the set of spreadsheets or partially designed contents of the container, without going through the process, and thus earning the badge. One does not "consider" a bunch of spreadsheets or partly designed contents 3NF; one evaluates if the rules of Normalisation have been followed, and if the rules are not violated, then it is fairly labelled 3NF. Since the Normalisation process has not been followed, there is no basis to suggest that it could possibly relate to any Normal Form.
Likewise, over and above Normalisation, if Relational Database rules have been followed during the process, and not violated, one achieves compliance with Relational Database standards. Since Relational Database methodology has not been followed, there is no basis to suggest that it could possibly relate to any Relational Database Standard, or that any Relational capability can be expected from it.
Understanding the Whole Issue
"IDs" are surrogate keys. Surrogate keys are always (you are right) an additional key and Index, additional to the pre-existing PK, which is about to be usurped. Of course, that has a considerable performance cost, on every access.
Some questioners have the idea that Surrogate keys can be used in substitution of the PK. Which of course, is false, and you realise that, so gratefully that does not have to be addressed here.
The notion of "all surrogate keys" or "no surrogate keys" is the kind of black-or-white, all-or-nothing, nonsense that is normal for children but unacceptable in full grown adults, specially anyone doing IT work, which requires precision and understanding. It is quie normal for a small child to think "if daddy doesn't let me do what I want, he doesn't love me", and therefore "if he doesn't love me, he hates me". Most of us realise that life is a tiny bit more complex than that by the age of primary school. Developers who "like" to see" IDs" on every table and "dislike" the lack of them on some tables, are by definition incapable of considering the database as a whole, and the needs of other developers and users; they are only thinking about simplistic, one-table-at-a-time code.
It is also not about shades of grey, or blurred definitions. No, the definitions have not changed in 30 years (they have been extended and made more precise, but they have not changed). Shades of grey allows developers to avoid compliance and standards. So that too, is not recommended.
What is a genuine Relational Database ?
The truth is, if a database was honestly modelled and designed, by a qualified data modeller, using the methodologies that have been available for 30 years, they would end up with a genuinely Relational database. And if they did not follow the process, it would neither Relational nor a Database. The Identifiers and Relations would already be defined, and the meaning, the context, of that would be carried through to various tables. The data would be Normalised, to 3NF or BCNF or 5NF, and there would be no update Anomalies. In the last step, as part of the formal process, and not outside it, when translating the logical to the physical, the modeller may improve the performance of some Identifiers by adding Surrogate keys, and avoiding carrying large (wide) keys into the related child tables (1). That proves again, from another approach, why the notion of zero surrogates, or all surrogates, is childish and completely removed from the genuine process.
The genuine Relational database will have full Relational capability, honest achievement of 3NF, use natural Relational keys, with a definite few being thoughtfully switched to Surrogates.
Easily Proved
Of course, everything I have stated can be proved easily: simply post DDL of 5 to 10 of your spreadsheets, I need a "depth" of at least four (great.grand.parent⇢grand.parent⇢parent⇢child).
You may be interested, I have recently posted information re your question, in a related question, which I am not repeating here.
Note
This is only required because the current SQL offerings do not support the full Relational Model, and to eliminate known performance obstacles that they have. And it will not be necessary if and when suppliers provide Relational databases in which wide Relational keys perform as well as narrow ones.
I agree with Erwin's statements re keys and Identifiers, and thus I have not repeated them in my answer.
“是否可以将“ID”字段作为主键添加到我的所有数据库表中并使用它来建立表之间的关系?”
你显然是想盲目地、不假思索地到处添加代理ID。认为那是可以的就和这样做一样愚蠢。 “好的”标识符具有唯一性(否则显然它不会是一个标识符)、稳定性(它们的值很少改变)、和熟悉性(它们的值表示一些有意义的东西)等属性。用户的世界——IT系统之外的世界)。
请注意,我非常有意地使用了“标识符”一词而不是“密钥”。根据定义,键具有唯一性。因此,所有键都是充当标识符的有效候选者。您实际选择哪个键作为标识符,应该取决于某个特定键满足稳定性和熟悉度标准的程度。
自然键可能不足以满足稳定性标准(但它们的程度通常被过分强调,通常是由那些对用户方面考虑得太少而对他们自己的考虑太多的开发人员造成的。自己的问题)。系统生成的 ID 绝对肯定违反了“熟悉度”标准。
这些考虑因素应该足以证明在用一种方式换取另一种方式时,平衡应该主要走向哪种方式。
“这个设计会被认为是3NF(第三范式)设计吗?如果是的话,理论上是否推荐这个东西?”
如果您“向现有设计添加 ID 列”,则这不会影响 NF。无论您现有的设计属于哪种 NF,添加了 ID 的设计都将具有相同的 NF。
"Is it OK to add "ID" field as primary key to all my database tables also to use it to make relations ships between tables?"
You clearly intend to add surrogate ID's everywhere, blindly and without any thinking. To think that is OK is as silly as doing it. "Good" identifiers have the properties of uniqueness (otherwise it would not be much of an identifier, of obviously), stability (their values change only infrequently), and of familiarity (their values denote something meaningful in the user's world - the world outside the IT system).
Note that I used the word "identifiers" instead of "keys" very deliberately. Keys have the property of uniqueness by definition. Therefore, all keys are a valid candidate for acting as an identifier. Which key you actually choose for acting as an identifier, should depend on how much or how little some particular key also satisfies the criteria of stability and familiarity.
Natural keys might not satisfy the stability criterion enough (but the extent to which they do is usually vastly over-stressed, typically by the kind of developers who think too little about the user's side and too much about their own side of the problem). System-generated ID's violate the "familiarity" criterion with absolute certainty.
These considerations should be sufficient to prove which way the balance should mostly go when trading off one for the other.
"Is this design would consider as 3NF (third normal form) design? If yes, is this thing recommended theoretically or not?"
If you "add an ID column to an existing design", then this will not influence the NF. Whatever NF your existing design was in, the design with the ID's added will have the same NF.
范式涉及属性之间的依赖关系。如果不知道您打算在表中表示哪些依赖关系,我们无法判断它是否满足任何特定的范式。
如果您谈论的是代理键(在业务领域中没有意义的键),那么对于大多数用途来说,重要的一点是这样的键不应该是任何表的唯一键。您通常还应该有一个自然密钥(也称为业务密钥),以确保数据不重复。
Normal forms are concerned with dependencies between attributes. Without knowing what dependencies you intended to represent in your table we can't say whether it would satisfy any particular normal form.
If you are talking about a surrogate key (a key that has no meaning in the business domain) then for most purposes the important point is that such a key should not be the ONLY key of any table. You should normally have a natural key (AKA business key) as well to ensure that data is not duplicated.
如果我理解正确的话,
是的,向表中添加序列 ID 并让该 ID 作为引用该表行的主键通常是一个很好的设计问题。无论这是否违反 3NF:它不违反 3NF,但也不保证。
实际上,添加序列 ID 并在内部使用该值有其优点。其一是您可以控制 ID,而外部生成的密钥可能会被另一方突然更改。另一方面,将 ID 导出到其他方会“绑定”该密钥,因为在您这边更改它可能会影响其他人对该密钥的使用。此外,序列号通常很容易伪造,并且可能与其他人对该号码的使用发生冲突。
同样在实际的数据库设计中,3NF 或 Boyce-Codd 往往是您渴望而不是盲目遵循的理论思想。选择性非规范化是一种已知的技巧,可以通过使数据更靠近来加速某些查询。
If I understand you correctly,
Yes, adding a serial ID to a table and letting that ID be the primary key to which you refer to rows of that table is in general a good design question. Whether or not this violates 3NF: it does not violate 3NF but it does not guarantee it either.
Practically, adding a serial ID and using that value internally can have its advantages. One is that you are in control of the ID, whereas an externally generated key might be changed suddenly by another party. On the other hand, exporting an ID to other parties "binds" that key since changing it on your side may have impact on others use of that key. Also a serial number is often easy to forge and may collide with other peoples use of the number.
Also in practical DB design, 3NF or Boyce-Codd tends to be theoretical ideas you aspire to rather than blindly follow. Selective denormalization is a known trick to speed up some queries by making data sit closer.
我完全同意 @jlouis 的观点,即
From my practice I can say that using natural key is a good choice only in reference-like lookup tables, provided the key field in the real world is unique and not null and doesn't change in time. I other cases using surrogate key is much more preferable (from my point of view): it's just more convenient way to design tables in spite of all the thing 3NF or Boyce-Codd tell us.
I fully agree with @jlouis that
From my practice I can say that using natural key is a good choice only in reference-like lookup tables, provided the key field in the real world is unique and not null and doesn't change in time. I other cases using surrogate key is much more preferable (from my point of view): it's just more convenient way to design tables in spite of all the thing 3NF or Boyce-Codd tell us.