NULL 和 FK 关系意味着什么 - 数据库
我在关系 SQL 数据库中创建 FK 关系时遇到了困难,经过工作中的简短讨论,我们意识到我们有可为空的列,这很可能是导致该问题的原因。 我一直认为 NULL 意味着未分配、未指定、空白等,并且确实从未见过这样的问题。
与我交谈的其他开发人员认为,处理这种情况的唯一方法是,如果两个实体之间确实存在关系,那么您必须创建一个表来连接两个实体的数据......
对我来说,这似乎很直观至少可以说,对于包含来自另一个表的 ID 的列,如果该列不为空,则它必须具有来自另一个表的 ID,但如果它为 NULL,则可以继续。 这本身似乎与某些人的说法和建议相矛盾。
处理两个表之间可能存在关系的情况的最佳实践或正确方法是什么,如果指定了一个值,那么它必须在另一个表中......
I was experiencing a hard time creating FK relationships in my relational SQL database and after a brief discussion at work, we realized that we have nullable columns which were most likely contributing to the problem. I have always viewed NULL as meaning unassigned, not specified, blank, etc. and have really never seen a problem with that.
The other developers I was speaking with felt that the only way to handle a situation where if a relationship did exist between 2 entities, then you would have to create a table that joins the data from both entities...
It seems intuitive to me at least to say that for a column that contains an ID from another table, if that column is not null, then it must have an ID from the other table, but if it is NULL then that is OK and move on. It seems like this in itself is contradictory to what some say and suggest.
What is the best practice or correct way to handle situations where there could be a relationship between two tables and if a value is specified then it must be in the other table...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(12)
这是完全可以接受的,这意味着,如果该列有任何值,那么它的值必须存在于另一个表中。 (我看到其他答案另有说法,但我不敢苟同。)
想象一个车辆和引擎表,引擎尚未安装在车辆中(因此车辆 ID 为空)。 或者是带有主管列和公司首席执行官的员工表。
更新:根据 Solberg 的请求,这里是具有外键关系的两个表的示例,显示外键字段值可以为空。
-- 现在一些测试:
It's perfectly acceptable, and it means that, if that column has any value, its value must exist in another table. (I see other answers asserting otherwise, but I beg to differ.)
Think a table of Vehicles and Engines, and the Engines aren't installed in a Vehicle yet (so VehicleID is null). Or an Employee table with a Supervisor column and the CEO of the company.
Update: Per Solberg's request, here is an example of two tables that have a foreign key relationship showing that the foreign key field value can be null.
-- Now some tests:
我认为这场辩论是对象关系阻抗不匹配的另一个副产品。 一些 DBA 类型会迂腐地说,基于对关系代数语义的更深入理解,FK 中永远不允许 null,但应用程序开发人员会认为这使他们的领域层更加优雅。
“尚未建立”关系的用例是有效的,但对于空 FK,一些人发现它通过引入更复杂的 SQL 功能(特别是 LEFT JOIN)增加了查询的复杂性。
我见过的一种常见的替代解决方案是在每个表中引入一个“空行”或“哨兵行”,其中 pk=0 或 pk=1(基于 RDBMS 支持的内容)。 这允许您设计一个具有“尚未建立”关系的域层,但也避免引入 LEFT JOIN,因为您保证总会有一些东西可以加入。
当然,这种方法也需要勤奋,因为您基本上是在权衡 LEFT JOIN ,以便必须检查查询中哨兵行的存在,这样您就不会更新/删除它,等等。权衡是否合理是另一件事。 我倾向于同意,仅仅为了避免更花哨的连接而重新发明 null 似乎有点愚蠢,但我也在一个应用程序开发人员无法赢得与 DBA 辩论的环境中工作。
编辑
我删除了一些“事实上”的措辞,并试图澄清“失败”连接的含义。 @wcoenen 的例子是我个人最常听到的避免 null FK 的原因。 这并不是说它们像“破碎”那样失败了,而是失败了——有些人会认为——未能遵守最小意外原则。
另外,我把这个回复变成了一个维基,因为我基本上已经把它从原来的状态中删除了,并从其他帖子中借用了。
I think this debate is another byproduct of the object-relational impedence mismatch. Some DBA-types will pedantically say never allow null in a FK based on some deeper understanding of relational algebra semantics, but application developers will argue that it makes their domain layer more elegant.
The use cases for a "not yet established" relationship are valid, but with null FKs some find that it adds complexity to their queries by introducing more sophisticated features of SQL, specifically LEFT JOINs.
One common alternative solution I've seen is to introduce a "null row" or "sentinel row" into each table with pk=0 or pk=1 (based on what's supported by your RDBMS). This allows you to design a domain layer with "not yet established" relationships, but also avoid introducing LEFT JOINs as you're guaranteeing there will always be something to join against.
Of course, this approach requires diligence too because you're basically trading off LEFT JOINs for having to check the presence of your sentinel row in queries so you don't update/delete it, etc. Whether or not the trade offs are justified is another thing. I tend to agree that reinventing null just to avoid a fancier join seems a bit silly, but I also worked in an environment where application developers don't win debates against DBAs.
Edits
I removed some of the "matter of fact" wording and tried to clarify what I meant by "failing" joins. @wcoenen's example is the reason that I've personally heard most often for avoiding null FKs. It's not that they fail as in "broken", but rather fail--some would argue--to adhere to the principle of least surprise.
Also, I turned this response into a wiki since I've essentially butchered it from its original state and borrowed from other posts.
我强烈支持在 OLTP 系统中使用外键中的 NULL 来指示无父项的论点,但在决策支持系统中它很少能很好地工作。 最合适的做法是使用特殊的“不适用”(或类似)值作为子记录(在事实表中)可以链接到的父项(在维度表中)。
原因是,向下钻取/交叉等的探索性质可能会导致用户在仅仅要求提供更多信息时不理解指标如何变化。 例如,当财务数据集市包含产品销售和其他收入来源的组合时,深入到“产品类型”应该将非产品销售相关数据分类,而不是让这些数字从报告中删除,因为事实表和产品维度表之间没有连接。
I'm strongly supportive of the arguments for NULLs in foreign keys to indicate no-parent in an OLTP system, but in a decision support system it rarely works well. There the most appropriate practice is to use a special "Not Applicable" (or similar) value as the parent (in the dimenson table) to which the child records (in the fact table) can link.
The reason for this is that the exploratory nature of drill-down/across etc can lead to users not understanding how a metric can change when they have merely asked for more information on it. For example where a finance data mart includes a mix of product sales and other sources of revenue, drilling down to "Type of Product" ought to classify non-product sale related data as such, rather than letting those numbers drop out of the report because there is no join from the fact table to the product dimension table.
当外键是复合外键时,会出现在外键列中允许空值的问题。 如果两列之一为空,这意味着什么? 另一列是否必须与引用表中的任何内容匹配? 通过简单的(单列)外键约束,您可以摆脱空值。
另一方面,如果两个表之间的关系是有条件的(两个实体都可以单独存在,但可能几乎巧合地相关),那么最好使用“连接表”(包含一个表)来对其进行建模。 FK 到引用表,另一个到引用表,并且具有自己的主键作为两个 FK 的组合。
作为连接表的示例,假设您的数据库包含俱乐部和人员表。 有些人属于某些俱乐部。 连接表将是club_members,并且将包含引用“people”表的人员的FK,并且将包含该人员所属俱乐部的另一个FK,并且人员和俱乐部的标识符的组合将是主键连接表。 (连接表的另一个名称是“关联”或“关联”表。)
The problem with allowing nulls in foreign key columns arises when the foreign key is composite. What does it mean if one of the two columns is null? Does the other column have to match anything in the referenced table? With simple (single-column) foreign key constraints, you can get away with nulls.
On the other hand, if the relationship between the two tables is conditional (both entities can exist in their own right, but may almost coincidentally be related) then it may be best to model that with a 'joining table' - table that contains a FK to the referenced table and another to the referencing table and that has its own primary key as the combination of two FKs.
As an example of a joining table, suppose your database has tables of clubs and people. Some of the people belong to some of the clubs. The joining table would be club_members and would contain an FK for the person referencing the 'people' table, and would contain another FK for the club that the person belongs to, and the combination of identifiers for person and club would be the primary key of the joining table. (Another name for joining table is 'association' or 'associative' table.)
我倾向于传达该专栏含义的设计。 就域而言,空值可能意味着任意数量的事物。 在相关表中放置一个表示“不需要”或“未选择”的值至少可以传达目的,而无需询问开发人员或查阅文档。
I would lean toward a design that communicates the meaning of that column. A null could mean any number of things as far as the domain is concerned. Putting a value in the related table that says "Not Needed", or "Not Selected" at least communicates the purpose without having to ask a developer or consult a document.
假设您需要生成所有客户的报告。 每个客户都有一个国家/地区的 FK,并且国家/地区数据需要包含在报告中。 现在假设您允许 FK 为
null
,并且执行以下查询:任何国家/地区 FK 为
null
的客户都将从报告中默默忽略(您需要使用 LEFT JOIN 来修复它)。 我发现这不直观且令人惊讶,因此我不喜欢 NULL FK,并在我的数据库模式中避免使用它们。 相反,我使用哨兵值,例如特殊的“未知国家”。Suppose you would need to generate a report of all customers. Each customer has a FK to a country and the country data needs to be included in the report. Now suppose you allow the FK to be
null
, and you do the following query:Any customer where the country FK is
null
would be silently omitted from the report (you need to use LEFT JOIN instead to fix it). I find this unintuitive and surprising, so I don't like NULL FKs and avoid them in my database schemas. Instead I use sentinel values, e.g. a special "unkown country".这并没有什么问题! 根节点将永远有一个 NULL 父节点,这不是“尚未建立”关系的情况。 这里的连接也没有问题。
让根节点指向自身作为父节点以避免 NULL FK 或任何其他创造性的解决方法,意味着现实世界不再在数据库中准确建模。
没有人提到的一个潜在问题是包含大量 NULL 值的列上的索引性能。 虽然这本身与外键问题无关,但它可能会使连接表现不佳。
我确实明白,如果您是一名 DBA,正在处理拥有数亿行的超大型数据库,您将不会需要 NULL 外键,因为它们根本无法执行。 但事实是,大多数开发人员一生中永远不会使用如此大的数据库,而今天的数据库可以处理几十万行的这种情况。 强调一个(糟糕的)比喻,我们大多数人都不会驾驶 F1 赛车,而我妻子的雅阁中的自动变速箱可以很好地完成它需要做的事情(或者至少,它曾经是这样,直到几周前它坏了) ...)。
There is nothing wrong with this! The root node will eternally have a NULL parent, and this is not a case of a "not yet established" relationship. No problem with joins here, either.
Having the root node point to itself as the parent to avoid the NULL FK, or any other creative workaround, means that the real world is no longer accurately modeled in the database.
The one potential issue that nobody mentioned is with index performance on columns that contain lots of NULL values. This per se has nothing to do with the foreign key question, though, but it can make joins perform poorly.
I do understand that if you are a DBA working with ultra-large databases that have hundreds of millions of rows, you would not want NULL foreign keys, because they would simply not perform. The truth is, though, that most developers will never work with such large databases in their lifetime, and today's databases can handle such a situation just fine with a few hundred thousand rows. To stress a (poor) metaphor, most of us so not drive F1 race cars, and the automatic transmission in my wife's Accord does what it needs to do just fine (or at least, it used to, until it broke a few weeks ago ...).
如果您将 NULL 分配给业务原因,那么您实际上是在重新定义 NULL 在您的域中的含义,并且必须为用户和未来的开发人员记录该含义。 如果存在将 NULL 作为外键的业务原因,那么我建议您按照其他人提到的那样进行操作,并添加一条具有“N/A”或“未分配”行值的连接记录。
此外,当数据库中的 NULL 现在变得具有多种含义(业务含义、某些错误或未正确输入)时,可能会出现复杂情况,这可能会导致问题更难以追踪。
If you are assigning NULL to a Business Reason then you are essentially redefining what NULL means in your domain and must document that for users and future developers. If there is a Business Reason for having NULL as a foreign key then I would suggest you do as others have mentioned and add a joining record that has a value of something along the lines of 'N/A' or 'Not Assigned'.
Also there could be complications when NULL in your database now becomes multiple meanings (Business Meaning, Something Error'd or Wasn't inputed correctly) which can cause issues to be more difficult to track down.
如果字段可以为空,我不认为空值有问题。 滥用行为是在该字段中应该有信息时允许空值。
I don't see a problem with null values if the field can be empty. An abuse is allowing null values when there should be information in that field.
你做对了。 对于 FK,NULL 意味着没有值(意味着没有关系)。 如果 FK 中有一个值,它必须与其引用的 PK 中的一个值完全匹配。
允许这种情况并不一定是糟糕的设计。 如果关系是一对多且可选的,则完全可以在一侧的表中添加 FK,引用多侧的 PK。
如果关系是多对多,则需要一个自己的表,称为联结表。 该表有两个 FK,每个 FK 引用相关表之一中的一个 PK。 在这种情况下,可以通过简单地从连接表中省略整行来表达省略关系。
有些人的设计是为了避免允许 NULL 的必要性。 这些人将使用联结表来建立多对一关系,并在省略关系时省略一行,如上所述。
我自己并不遵循这种做法,但它确实有一定的好处。
You got it right. For an FK a NULL means no value (meaning no relationship). If there is a value in an FK it has to match exactly one value in the PK that it references.
It is not necessarily bad design to permit this. If a relationship is one-to-many and optional, it's perfectly OK to add a FK to the table on the one side, referencing the PK on the many side.
If a relationship is many-to-many it requires a table of its own, called a junction table. This table has two FKs, each referencing a PK in one of the tables being related. In this case an omitted relationship can be expressed by simply omitting an entire row from the junction table.
Some people design so as to avoid the necessity of permitting NULLS. These people will use a junction table for a many-to-one relationship, and omit a row, as above, when a relationship is omitted.
I don't follow this practice myself, but it does have certain benefits.
我不得不说,尽管这显然是可能的,但根据 Jonathon Leffler 的精心阐述的观点,使用连接表有什么问题呢?
我提出这个问题是因为我有完全相同的需求,但我的设计现在通过连接表明显“干净”。 我的数据库图表现在清楚地表明我的字段是可选的,从模式 POV 来看这对我来说效果很好。
然后,为了简化查询,我只是创建了一个视图,将两个表左连接在一起,这给出了可选连接的外观,但实际上使用了更清晰的数据库结构。 在我看来,使用 ISNULL(MyField, 'None') 我可以提供“不存在”附加行设计的好处,但不会带来痛苦。
考虑到这里提到的几点,我同意 DBA 的观点——当您可以拥有更“可靠”的关系并且更容易与视图一起使用时,为什么要使用空列呢? 而且也不需要真正的额外努力。
I would have to say that even though it is clearly possible, what is the problem with using a joining table as per Jonathon Leffler's well made point?
I came upon this question because I had exactly the same need but my design is now significantly "cleaner" with a joining table. My database diagram now clearly shows me that my field is optional which works well for me from a schema POV.
Then to simplify my queries, I just made a view LEFT JOINing the two tables together which gives the appearance of an optional join but actually uses the clearer database structure. Also using ISNULL(MyField, 'None') in my view I can provide the benefits of the "not present" additional row design but without the pain.
Given the points mentioned here, I'm with DBA's on this one - why have a null column when you can have a more "solid" relationship made easier to use with a view? And for no real extra effort either.
连接表是正确的方法。
键中的空值表明数据库设计不当。
空值是不是未分配/空/空白/等,它是丢失/未知的数据。
在外键字段中使用空值并不意味着“没有关系”,它意味着“我不知道是否存在关系”——这显然是不好的。
The join table is the correct method.
Nulls in keys indicate bad database design.
A null value is not unassigned/empty/blank/etc, it is missing/unknown data.
Using nulls in a foreign key field does not mean "there's no relation", it means "I don't know if there's a relation or not" - which is clearly bad.