Is using 'Z' instead of NULL to represent missing data a standard practice?

Posted on 2024-11-19 04:26:23

Outside of the argument of whether or not NULLs should ever be used: I am responsible for an existing database that uses NULL to mean "missing or never entered" data. It is different from an empty string, which means "a user set this value, and they selected 'empty'."

Another contractor on the project is firmly on the "NULLs do not exist for me; I never use NULL and nobody else should, either" side of the argument. However, what confuses me is that since the contractor's team DOES acknowledge the difference between "missing/never entered" and "intentionally empty or indicated by the user as unknown," they use a single character 'Z' throughout their code and stored procedures to represent "missing/never entered" with the same meaning as NULL throughout the rest of the database.

Although our shared customer has asked for this to be changed, and I have supported this request, the team cites this as "standard practice" among DBAs far more advanced than I; they are reluctant to change to use NULLs based on my ignorant request alone. So, can anyone help me overcome my ignorance? Is there any standard, or small group of individuals, or even a single loud voice among SQL experts which advocates the use of 'Z' in place of NULL?

Update

I have a response from the contractor to add. Here's what he said when the customer asked for the special values to be removed to allow NULL in columns with no data:

Basically, I designed the database to avoid NULLs whenever possible. Here is the rationale:
• A NULL in a string [VARCHAR] field is never necessary because an empty (zero-length) string furnishes exactly the same information.
• A NULL in an integer field (e.g., an ID value) can be handled by using a value that would never occur in the data (e.g., -1 for an integer IDENTITY field).
• A NULL in a date field can easily cause complications in date calculations. For example, in logic that computes date differences, such as the difference in days between a [RecoveryDate] and an [OnsetDate], the logic will blow up if one or both dates are NULL -- unless an explicit allowance is made for both dates being NULL. That's extra work and extra handling. If "default" or "placeholder" dates are used for [RecoveryDate] and [OnsetDate] (e.g., "1/1/1900"), mathematical calculations might show "unusual" values -- but date logic will not blow up.
NULL handling has traditionally been an area where developers make mistakes in stored procedures.
In my 15 years as a DBA, I've found it best to avoid NULLs wherever possible.

This seems to validate the mostly negative reaction to this question. Instead of applying an accepted 6NF approach to designing out NULLs, special values are used to "avoid NULLs wherever possible." I posted this question with an open mind, and I am glad I learned more about the "NULLs are useful / NULLs are evil" debate, but I am now quite comfortable labeling the 'special values' approach to be complete nonsense.
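To make the date-arithmetic point concrete, here is a minimal sketch, assuming SQLite (via Python's sqlite3 module) rather than the project's actual RDBMS. The table and column names are hypothetical stand-ins for [OnsetDate] and [RecoveryDate]. It compares the average recovery time when the missing date is stored as NULL versus as the 1900-01-01 placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: same data, two ways of storing "not recovered yet".
conn.execute("CREATE TABLE cases_null (onset_date TEXT, recovery_date TEXT)")
conn.execute("CREATE TABLE cases_placeholder (onset_date TEXT, recovery_date TEXT)")

# Two patients: one recovered after 10 days, one has no recovery date yet.
conn.executemany(
    "INSERT INTO cases_null VALUES (?, ?)",
    [("2024-01-01", "2024-01-11"), ("2024-01-05", None)],
)
conn.executemany(
    "INSERT INTO cases_placeholder VALUES (?, ?)",
    [("2024-01-01", "2024-01-11"), ("2024-01-05", "1900-01-01")],
)

# Day difference per row, averaged. A NULL date makes the difference NULL,
# and AVG skips NULLs instead of failing.
DAYS_SQL = "SELECT AVG(julianday(recovery_date) - julianday(onset_date)) FROM {}"

avg_with_null = conn.execute(DAYS_SQL.format("cases_null")).fetchone()[0]
avg_with_placeholder = conn.execute(DAYS_SQL.format("cases_placeholder")).fetchone()[0]

print(avg_with_null)         # average over the one known recovery
print(avg_with_placeholder)  # large negative number caused by the placeholder
```

In this sketch nothing "blows up" with NULL: the unknown row is simply left out of the average. The placeholder version also runs without error, but it silently returns a nonsensical negative average, which is arguably harder to notice than an error.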

"an empty (zero-length) string furnishes exactly the same information."

No, it doesn't; in the existing database we are modifying, NULL means "never entered" and empty string means "entered as empty".
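A minimal sketch of that distinction, assuming SQLite through Python's sqlite3 module (the contacts table and its columns are hypothetical). Note that this is engine-dependent: Oracle, for instance, treats a zero-length VARCHAR2 as NULL, so the sketch only applies to engines that keep the two apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: middle_name may be missing (NULL) or deliberately empty ('').
conn.execute("CREATE TABLE contacts (name TEXT NOT NULL, middle_name TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        ("Anna", None),       # never entered
        ("Bob", ""),          # entered as empty
        ("Claire", "Marie"),  # entered with a value
    ],
)

# The two states are queried differently and return different rows.
never_entered = [r[0] for r in conn.execute(
    "SELECT name FROM contacts WHERE middle_name IS NULL")]
entered_empty = [r[0] for r in conn.execute(
    "SELECT name FROM contacts WHERE middle_name = ''")]

print(never_entered)  # rows where nothing was ever entered
print(entered_empty)  # rows where the user chose "empty"
```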

"NULL handling has traditionally been an area where developers make mistakes in stored procedures."

Yes, but those mistakes have been made thousands of times by thousands of developers, and the lessons and caveats for avoiding those mistakes are known and documented. As has been mentioned here: whether you accept or reject NULLs, representation of missing values is a solved problem. There is no need to invent a new solution just because developers continue to make easy-to-overcome (and easy-to-identify) mistakes.

As a footnote: I have been a DBE and developer for more than 20 years (which is certainly enough time for me to know the difference between a database engineer and a database administrator). Throughout my career I have always been in the "NULLs are useful" camp, though I was aware that several very smart people disagreed. I was extremely skeptical about the "special values" approach, but not well-versed enough in the academics of "How To Avoid NULL the Right Way" to make a firm stand. I always love learning new things, and I still have lots to learn after 20 years. Thanks to all who contributed to make this a useful discussion.

愿得七秒忆 2024-11-26 04:26:23

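Sack your contractor.

Okay, seriously, this isn't standard practice. You can see that simply from the fact that every RDBMS I have ever worked with implements NULL, has logic for NULL, takes NULL into account in foreign keys, behaves differently for NULL in COUNT, and so on.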
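Here is a minimal sketch of two of those built-in behaviours, COUNT and foreign keys, assuming SQLite through Python's sqlite3 module. The grades and students tables are hypothetical, and other engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE grades (code TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE students (name TEXT, grade TEXT REFERENCES grades(code))")
conn.executemany("INSERT INTO grades VALUES (?)", [("A",), ("B",)])

# A NULL foreign key is accepted: "no grade yet" needs no parent row.
conn.executemany("INSERT INTO students VALUES (?, ?)", [("Anna", "A"), ("Bob", None)])

# COUNT(*) counts rows; COUNT(grade) counts only rows where grade is not NULL.
total_rows, graded_rows = conn.execute(
    "SELECT COUNT(*), COUNT(grade) FROM students").fetchone()

# A 'Z' placeholder is an ordinary value, so the foreign key rejects it
# unless a fake 'Z' row is added to the parent table.
try:
    conn.execute("INSERT INTO students VALUES ('Claire', 'Z')")
    z_accepted = True
except sqlite3.IntegrityError:
    z_accepted = False

print(total_rows, graded_rows, z_accepted)
```

With a placeholder, each of these behaviours has to be rebuilt by hand: a dummy parent row for the foreign key and an extra WHERE clause for every count.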

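I would actually contend that using 'Z' or any other placeholder is worse. You still need code to check for 'Z'. But you also need to document that 'Z' doesn't mean 'Z', it means something else. And you have to make sure that documentation actually gets read. And then what happens if 'Z' ever becomes a valid piece of data? (Such as in a field for an initial?)

At a basic level, even without debating the validity of NULL vs 'Z', I would insist that the contractor conform to the standard practices that exist within your company, not his. Instituting his standard practice in an environment that already has a different standard practice will cause confusion, maintenance overhead, misunderstanding, and in the end increased costs and mistakes.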

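EDIT

There are cases where using an alternative to NULL is valid, in my opinion. But only where doing so reduces code, rather than creating special cases that have to be accounted for.

I've used that for date-bound data, for example. If data is valid between a start date and an end date, code can be simplified by not having NULL values. Instead, a NULL start date could be replaced with '01 Jan 1900' and a NULL end date could be replaced with '31 Dec 2079'.

This can still change behaviour from what may be expected, and so should be used with care:

WHERE end-date IS NULL no longer gives data that is still valid
You just created your own millennium bug
etc.

This is equivalent to reforming the abstraction so that all properties always have valid values. It is markedly different from implicitly encoding a specific meaning into arbitrarily chosen values.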
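A minimal sketch of the date-bound case and its two caveats, assuming SQLite through Python's sqlite3 module and ISO-formatted date strings (the prices table and the price_on helper are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (item TEXT, price REAL, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        ("widget", 1.00, "1900-01-01", "2023-12-31"),  # superseded price
        ("widget", 1.25, "2024-01-01", "2079-12-31"),  # open-ended, sentinel end date
    ],
)

def price_on(day):
    """Return the widget price valid on the given ISO date, or None."""
    row = conn.execute(
        "SELECT price FROM prices "
        "WHERE item = 'widget' AND ? BETWEEN start_date AND end_date",
        (day,),
    ).fetchone()
    return row[0] if row else None

# The simplification: a plain BETWEEN works, with no IS NULL branches.
current = price_on("2024-06-15")

# Caveat 1: "still valid" rows can no longer be found with IS NULL.
open_ended_by_null = conn.execute(
    "SELECT COUNT(*) FROM prices WHERE end_date IS NULL").fetchone()[0]

# Caveat 2: the sentinel is a real date, so lookups past it find nothing.
after_sentinel = price_on("2080-01-01")

print(current, open_ended_by_null, after_sentinel)
```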

Still, sack the contractor.

时光沙漏 2024-11-26 04:26:23

This is easily one of the weirdest opinions I've ever heard. Using a magic value to represent "no data" rather than NULL means that every piece of code that you have will have to post-process the results to account/discard the "no-data"/"Z" values.

NULL is special because of the way that the database handles it in queries. For instance, take these two simple queries:

select * from mytable where name = 'bob';
select * from mytable where name != 'bob';

If name is ever NULL, it obviously won't show up in the first query's results. More importantly, neither will it show up in the second query's results. NULL doesn't match anything other than an explicit search for NULL, as in:

select * from mytable where name is NULL;
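The three queries can be run end to end with a small sketch, assuming SQLite through Python's sqlite3 module. The table name mytable and the column name come from the queries above; the id column, the sample rows and the ids helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
# Rows get ids 1, 2, 3 in insertion order; the third has no name.
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)", [("bob",), ("alice",), (None,)])

def ids(where):
    """Return the ids of rows matching a fixed WHERE clause (illustration only)."""
    return [r[0] for r in conn.execute(
        f"SELECT id FROM mytable WHERE {where} ORDER BY id")]

equal_bob = ids("name = 'bob'")   # only bob
not_bob = ids("name != 'bob'")    # only alice: the NULL row is not returned
is_null = ids("name IS NULL")     # only the NULL row

print(equal_bob, not_bob, is_null)
```

With a 'Z' placeholder the second query would return the "missing" row as if it were a person who is not bob, and every caller would have to add AND name != 'Z' by hand.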

And what happens when the data could have Z as a valid value? Let's say you're storing someone's middle initial? Would Zachary Z Zonkas be lumped in with those people with no middle initial? Or would your contractor come up with yet another magic value to handle this?
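A sketch of that collision, assuming SQLite through Python's sqlite3 module. Both tables and the second person are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema A: 'Z' stands for "no middle initial recorded".
conn.execute(
    "CREATE TABLE people_z (first TEXT, middle_initial TEXT NOT NULL, last TEXT)")
# Hypothetical schema B: NULL stands for "no middle initial recorded".
conn.execute(
    "CREATE TABLE people_null (first TEXT, middle_initial TEXT, last TEXT)")

conn.executemany(
    "INSERT INTO people_z VALUES (?, ?, ?)",
    [("Zachary", "Z", "Zonkas"),  # a real middle initial
     ("Anna", "Z", "Smith")],     # placeholder for "missing"
)
conn.executemany(
    "INSERT INTO people_null VALUES (?, ?, ?)",
    [("Zachary", "Z", "Zonkas"),
     ("Anna", None, "Smith")],
)

# "Who has no middle initial?" under each convention.
missing_with_z = [r[0] for r in conn.execute(
    "SELECT first FROM people_z WHERE middle_initial = 'Z' ORDER BY first")]
missing_with_null = [r[0] for r in conn.execute(
    "SELECT first FROM people_null WHERE middle_initial IS NULL ORDER BY first")]

print(missing_with_z)     # Zachary is wrongly included
print(missing_with_null)  # only Anna
```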

Avoid magic values that require you to implement database features in code that the database is already fully capable of handling. This is a solved and well understood problem, and it may just be that your contractor never really grokked the notion of NULL and therefore avoids using it.

霊感 2024-11-26 04:26:23

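If the domain allows missing values, then using NULL to represent "undefined" is perfectly OK (that's what it is there for). The only downside is that code that consumes the data has to be written to check for NULLs. This is the way I've always done it.

I have never heard of (or seen in practice) the use of 'Z' to represent missing data. As for "the contractor cites this as 'standard practice' among DBAs", can he provide some evidence for that assertion? As @Dems mentioned, you also need to document that 'Z' doesn't mean 'Z': what about a MiddleInitial column?

Like Aaron Alton and many others, I believe that NULL values are an integral part of database design and should be used where appropriate.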

惜醉颜 2024-11-26 04:26:23

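Even if you somehow manage to explain "Z" instead of NULL to all of your current and future developers and DBAs, and even if they code everything perfectly, you will still confuse the optimizer, because it will not know that you've cooked this up.

Using a special value to represent NULL (which is already a special value to represent NULL) will result in skews in the data. For example, so many things happened on 1-Jan-1900 that it will throw off the optimizer's ability to understand the actual range of dates that really are relevant to your application.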
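How a particular optimizer uses column statistics varies by product, so the following is only a sketch of the skew itself, not of any engine's planner. It assumes SQLite through Python's sqlite3 module, with hypothetical tables and made-up row counts, and computes the kind of summary (minimum, maximum, number of values) that statistics are typically built from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_null (happened_on TEXT)")
conn.execute("CREATE TABLE events_sentinel (happened_on TEXT)")

real_dates = [f"2024-01-{day:02d}" for day in range(1, 11)]  # 10 real dates
missing_count = 30                                           # 30 rows with no date

conn.executemany(
    "INSERT INTO events_null VALUES (?)",
    [(d,) for d in real_dates] + [(None,)] * missing_count)
conn.executemany(
    "INSERT INTO events_sentinel VALUES (?)",
    [(d,) for d in real_dates] + [("1900-01-01",)] * missing_count)

# MIN, MAX and COUNT(column) all ignore NULLs.
STATS_SQL = "SELECT MIN(happened_on), MAX(happened_on), COUNT(happened_on) FROM {}"
stats_null = conn.execute(STATS_SQL.format("events_null")).fetchone()
stats_sentinel = conn.execute(STATS_SQL.format("events_sentinel")).fetchone()

print(stats_null)      # range covers only the real January 2024 dates
print(stats_sentinel)  # range starts in 1900 and most values fall on one day
```

With NULLs the summary describes the real data. With the sentinel, the apparent date range stretches back to 1900 and three quarters of the values pile up on a single day.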

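It's like a manager deciding: "Wearing a tie is bad for productivity, so we're all going to wear masking tape around our necks instead. Problem solved."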

如果没有 2024-11-26 04:26:23

I've never heard about the wide-spread use of 'Z' as a substitute for NULL.

(Incidentally, I wouldn't particularly like to work with a contractor who tells you to your face that they and other "advanced" DBAs are so much more knowledgeable and better than you.)

 +=================================+
 |  FavoriteLetters                |
 +=================================+
 |  Person      |  FavoriteLetter  |
 +--------------+------------------+
 |  'Anna'      |  'A'             |
 |  'Bob'       |  'B'             |
 |  'Claire'    |  'C'             |
 |  'Zaphod'    |  'Z'             |
 +---------------------------------+

How would your contractor interpret the data from the last row?

Probably he would choose a different "magic value" in this table to avoid collision with the real data 'Z'? Meaning you'd have to remember several magic values and also which one is used where... how is this better than having just one magic token NULL, and having to remember the three-valued logic rules (and pitfalls) that go with it? NULL at least is standardized, unlike your contractor's 'Z'.

I don't particularly like NULL either, but mindlessly substituting it with an actual value (or worse, with several actual values) everywhere is almost definitely worse than NULL.

Let me repeat my above comment here for better visibility: If you want to read something serious and well-grounded by people who are against NULL, I would recommend the short article "How to handle missing information without using NULLs" (links to a PDF from The Third Manifesto homepage).
