When to use NULL in MySQL tables
I appreciate the semantic meaning of a NULL value in a database table, different from both false and the empty string ''. However, I have often read about performance problems when fields are nullable and been advised to use an empty string in cases where NULL is actually semantically correct.
What circumstances are appropriate to use nullable fields and NULL values? What are the trade-offs? Is it sensible to simply avoid using NULLs altogether and simply use empty strings, false or 0 to indicate the absence of a value?
UPDATE
OK - I understand the semantic difference between '' and NULL as well as the (performance-agnostic) circumstances in which NULL is the appropriate field value. However, let me expand on the hinted performance issue. This is from the excellent "High Performance MySQL" by Schwartz, Zaitsev et al.
http://www.borders.co.uk/book/high-performance-mysql-optimization-backups-replication-and-more/857673/:
"It's harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM."
More here:
Google books preview
This is quite possibly the definitive answer - I was just looking for second opinions and experience from the front-line.
Comments (11)
I'm going to be nit-picky about word choice for a moment:
In any case, I have not heard of any evidence that NULL performs poorly. I'd be interested in any references to performance measurements that show nullable columns perform worse than non-nullable columns.
I'm not saying I'm not wrong or that it can't be true in some cases -- just that it's not meaningful to make idle suppositions. Science is not made up of conjecture; one has to show evidence with repeatable measurements.
Metrics also tell you by how much the performance differs, so you can make a judgment about whether it's something worth worrying about. That is, the impact could be measurable and nonzero, but still insignificant compared to greater performance factors, such as properly indexing tables or sizing your database cache.
In MySQL, searches for NULL can benefit from an index:
Note that's still not a measurement of performance. I've only shown that you can use an index while searching for NULL. I'm going to assert (admittedly without having measured, but hey this is just StackOverflow) that the benefit of an index overshadows any possible penalty when searching for NULL versus a blank string.
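As a rough way to see the same idea without a MySQL server at hand, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is not the original example and says nothing about MySQL's optimizer; the table and index names are hypothetical. It only asks SQLite's planner how it would run an IS NULL search on an indexed column. On MySQL the analogous check would be to run EXPLAIN on the same kind of query.

```python
import sqlite3

# SQLite is only a stand-in here so the sketch runs anywhere.
# The table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, middle_initial TEXT)"
)
conn.execute("CREATE INDEX idx_middle_initial ON customer (middle_initial)")
conn.executemany(
    "INSERT INTO customer (middle_initial) VALUES (?)",
    [("A",), (None,), ("B",), (None,)],
)

# Ask the planner how it would execute a search for NULL values.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM customer WHERE middle_initial IS NULL"
).fetchall()
# The last field of each plan row is the human-readable description.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)

# The query itself returns the rows whose middle_initial is NULL.
null_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM customer WHERE middle_initial IS NULL ORDER BY id"
    )
]
print(null_ids)
```

Like the point above, this shows only that an index can serve the search, not how fast it is.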
It's not a correct design decision to choose zero or blank or any other value to substitute for NULL. You may need to use those values as significant in the column. That's why NULL exists, as a value that is by definition outside the domain of values of any data type, so you can use the full range of values of integers or strings or whatever and still have something to signify "none of the above values."
The MySQL manual actually has a nice article about the problems with NULL.
Hope it helps.
Also found this other SO post about NULL and Performance
We don't allow NULL values in our databases unless it's for numeric values or for dates. We do this because numeric values sometimes should not default to zero, as that can be very, very bad. I'm a developer for a stockbroker, and there's a big, big difference between NULL and 0. COALESCE comes in handy if we do want to default values back to zero even though we don't store them as such.
As we do bulk inserts of data from flat files, we use format files to determine the entry of the data, which automagically converts empty values into blank strings anyway.
Dates default to whatever value may appear, dependent on the collation I believe, but ours default to something like 1900, and again, dates are extremely important. Other plain text values aren't so important, and if left blank typically qualify as okay.
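To illustrate the NULL-versus-0 point and the COALESCE trick mentioned above, here is a small sketch. It assumes a hypothetical `quote` table and uses Python's sqlite3 module as a stand-in for MySQL; AVG and COALESCE are standard SQL and should behave the same way in both for this case.

```python
import sqlite3

# Hypothetical table: price is nullable because "no price yet" is not 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quote (symbol TEXT NOT NULL, price REAL)")
conn.executemany(
    "INSERT INTO quote (symbol, price) VALUES (?, ?)",
    [("AAA", 10.0), ("BBB", None), ("CCC", 20.0)],
)

# Aggregates skip NULL, so the average covers only the known prices.
avg_known = conn.execute("SELECT AVG(price) FROM quote").fetchone()[0]

# Treating the missing price as 0 silently drags the average down.
avg_zeroed = conn.execute(
    "SELECT AVG(COALESCE(price, 0)) FROM quote"
).fetchone()[0]

# COALESCE substitutes a default at query time without changing stored data.
display = conn.execute(
    "SELECT symbol, COALESCE(price, 0) FROM quote ORDER BY symbol"
).fetchall()

print(avg_known, avg_zeroed)
print(display)
```

Storing NULL keeps the choice open: the query decides whether an unknown price should be ignored or shown as zero.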
Generally, if an attribute is required, it is defined as NOT NULL, and if it may be omitted, it is defined as nullable.
As @ForYourOwnGood said, NULL should be used for "unknown" information. For example, suppose you have a lot of fields that a customer should fill in on registration, and some of them are optional. For some reason you might want to reserve an ID for that particular customer. Since you don't yet know whether the customer really chose to leave the optional fields empty, you should set them to NULL, i.e. "unknown", when you first save the row. Once the customer submits the form, it passes all your validation and you save the information, you know for a fact that an empty optional field was left empty intentionally.
This is just a good case of using NULL.
The empty string should not be used in place of NULL. NULL represents nothing, whereas the empty string is something, with nothing inside. A comparison between NULL and another value (even NULL) never evaluates to true, and NULL values are not counted by COUNT(column). If you need to represent unknown information, there is no substitute for NULL.
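The comparison and COUNT behaviour of NULL can be checked with a short sketch. The `person` table is hypothetical and Python's sqlite3 module stands in for MySQL; the three-valued comparison logic and the COUNT rules shown here are standard SQL.

```python
import sqlite3

# Hypothetical table with a nullable nickname column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT NOT NULL, nickname TEXT)")
conn.executemany(
    "INSERT INTO person (name, nickname) VALUES (?, ?)",
    [("Ann", "Annie"), ("Bob", ""), ("Cy", None)],
)

# NULL = NULL is not true: the result is NULL (None in Python).
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# COUNT(*) counts rows; COUNT(nickname) skips rows where nickname is NULL.
count_rows, count_nick = conn.execute(
    "SELECT COUNT(*), COUNT(nickname) FROM person"
).fetchone()

# The empty string is an ordinary value and matches with '='.
empty_matches = conn.execute(
    "SELECT name FROM person WHERE nickname = ''"
).fetchall()

# '= NULL' never matches anything; IS NULL is the test that works.
eq_null_matches = conn.execute(
    "SELECT name FROM person WHERE nickname = NULL"
).fetchall()
is_null_matches = conn.execute(
    "SELECT name FROM person WHERE nickname IS NULL"
).fetchall()

print(null_eq_null, count_rows, count_nick)
print(empty_matches, eq_null_matches, is_null_matches)
```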
The meaning of a NULL column is more or less "doesn't apply in this context". I generally use NULL columns in two cases:
... closed_at and is_closed), I just create the closed_at column and set it to NULL if the inventory set can still be changed, but set the date once it's closed.
Basically, it boils down to the fact that I use NULL when the emptiness of a field has a distinct meaning of its own, beyond the field simply being empty. The absence of a middle initial is just that. The absence of a closing date means that the inventory set is still open to changes.
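The closed_at idea can be sketched as follows. The `inventory_set` table, its columns, the helper functions and the dates are all hypothetical, and Python's sqlite3 module stands in for MySQL (where the column would more likely be a nullable DATETIME).

```python
import sqlite3

# Hypothetical schema: closed_at doubles as the "is closed" flag.
# NULL means the set is still open; a date means it was closed then.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory_set ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, closed_at TEXT)"
)
conn.executemany(
    "INSERT INTO inventory_set (name, closed_at) VALUES (?, ?)",
    [("spring", "2009-03-31"), ("summer", None)],
)


def open_sets(conn):
    """Return the names of inventory sets that can still be changed."""
    rows = conn.execute(
        "SELECT name FROM inventory_set WHERE closed_at IS NULL ORDER BY name"
    )
    return [row[0] for row in rows]


def close_set(conn, name, closed_at):
    """Record the closing date; already-closed sets are left untouched."""
    conn.execute(
        "UPDATE inventory_set SET closed_at = ? "
        "WHERE name = ? AND closed_at IS NULL",
        (closed_at, name),
    )


before = open_sets(conn)
close_set(conn, "summer", "2009-06-30")
after = open_sets(conn)
print(before, after)
```

One nullable column carries both facts, so the flag and the date can never disagree.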
NULL values can have nasty side effects: they make it harder to add data to the table and, more often than not, you end up with a mishmash of NULL values and empty strings.
Also, NULL is not equal to anything, which will screw up queries all over the place if you are not very careful.
Personally, I use NULL columns only when one of the above two cases applies. I never use it to signify empty fields when the emptiness has no meaning other than the absence of a value.
I understand that there are times when MySQL's NULL semantics are entirely appropriate.
That said, they do seriously get in the way, particularly with text fields.
Here's a real world example.
We wish to copy data from a FileMaker database into a MySQL table.
If we do "SELECT * FROM table WHERE textfield <> 'test'", rows whose textfield is NULL will NOT be returned. This is most likely not what you expected or desired.
If a nullable field is used in a WHERE clause, either alone or as part of an AND, rows where it is NULL will NEVER be returned unless an IS NULL test is used. We must do something like "WHERE ((textfield <> 'test') OR (textfield IS NULL))", which is ugly at best.
So in this case, we probably do not want the field to be nullable.
The problem here is that you CANNOT insert an empty string into MySQL using FileMaker. It gets converted to NULL, which errors out if you made the column NOT NULL! If you do allow NULL, then the transfer to MySQL works, but then your "not equal" queries fail to act the way you want!
The workaround is to alter the table twice: once to convert the existing NULLs to empty strings after the import, then again to allow NULL once more. Youch!
damn filemaker.
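The `<>` behaviour described above is easy to reproduce. This is a minimal sketch with a hypothetical `import_row` table, using Python's sqlite3 module as a stand-in for MySQL; the comparison semantics shown are standard SQL.

```python
import sqlite3

# Hypothetical import table with a nullable text column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE import_row (id INTEGER PRIMARY KEY, textfield TEXT)"
)
conn.executemany(
    "INSERT INTO import_row (textfield) VALUES (?)",
    [("test",), ("other",), (None,)],
)

# NULL <> 'test' evaluates to NULL, not true, so the NULL row is dropped.
naive = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM import_row WHERE textfield <> 'test' ORDER BY id"
    )
]

# Adding an explicit IS NULL test brings the NULL row back.
with_nulls = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM import_row "
        "WHERE textfield <> 'test' OR textfield IS NULL ORDER BY id"
    )
]

print(naive, with_nulls)
```

In MySQL itself, the null-safe comparison operator `<=>` offers a shorter spelling of the second query, for example `WHERE NOT (textfield <=> 'test')`.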
The main benefit, of course, is the semantic meaning of NULL, which you mentioned.
In addition to that (and it may depend on your storage engine; as always, check the documentation), in at least some databases NULLs take up a lot less room than a regular value. For example, if you have a "varchar" column declared to be 20 characters, and it's rarely filled in, you can save a lot of disk space by making it NULL instead of an empty string.
I have never heard of any performance issues with using NULLs; quite the opposite. I've heard of people mucking up their counts because they counted NULLs wrong, but never performance. If that's a real thing, I would love to hear about it!
Any self-respecting database engine these days should offer no penalty for properly using NULLs, unless your query is not designed correctly (which is usually not a problem you'll have very often with regard to NULLs).
You should first pay attention to using the database (including NULLs) as intended; then worry about the optimization consequences when and if they occur.
The cumulative effect of improperly NULLed column values in both SQL complexity and accuracy will almost surely outweigh the benefits of fooling with Mother DBMS. Besides, it will mess up your head, as well as that of anyone later who tries to figure out what you were trying to do.
On some databases, such as Oracle (and something similar may be true of MySQL):