可空数据类型与非空 varchar 数据类型 - 哪个查询速度更快?
我们通常更喜欢让所有 varchar
/nvarchar
列不可为空并使用空字符串 ('') 作为默认值。团队中有人建议 nullable 更好,因为:
像这样的查询:
Select * From MyTable Where MyColumn IS NOT NULL
比这样更快:
Select * From MyTable Where MyColumn == ''
有人有任何经验来验证这是否属实吗?
We generally prefer to have all our varchar
/nvarchar
columns non-nullable with a empty string ('') as a default value. Someone on the team suggested that nullable is better because:
A query like this:
Select * From MyTable Where MyColumn IS NOT NULL
is faster than this:
Select * From MyTable Where MyColumn == ''
Anyone have any experience to validate whether this is true?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
在某些平台(甚至版本)上,这将取决于 NULL 的索引方式。
我对 NULL 的基本经验法则是:
在合理之前不允许使用 NULL
除非数据确实未知,否则不允许使用 NULL
一个很好的例子是对地址线进行建模。如果有AddressLine1和AddressLine2,第一个有数据而第二个为NULL意味着什么?在我看来,你要么知道地址,要么不知道,当有人将它们连接起来并得到 NULL(ANSI 行为)时,一组数据中包含部分 NULL 只会带来麻烦。您可以通过允许 NULL 并添加检查约束来解决此问题 - 所有地址信息都为 NULL 或没有。
与中间名首字母/名字类似。有些人没有。这与未知有什么不同吗?你关心吗?
另外,死亡日期 - NULL 是什么意思?没死?死亡日期未知?很多时候,单列不足以编码某个领域的知识。
因此,对我来说,是否允许 NULL 在很大程度上取决于数据的语义,首先是性能,其次是性能,因为数据被误解(可能被许多不同的人误解)通常会带来更大的损失。比性能更昂贵的问题。
这看起来可能是一件小事(在 SQL Server 中,实现是与行一起存储的位掩码),但在我看来,在合理化之后只允许 NULL 效果最好。它在开发早期就捕获了问题,迫使您解决假设并理解您的问题领域。
On some platforms (and even versions), this is going to depend on how NULLs are indexed.
My basic rule of thumb for NULLs is:
Don't allow NULLs until justified
Don't allow NULLs unless the data can really be unknown
A good example of this is modeling address lines. If you have an AddressLine1 and AddressLine2, what does it mean for the first to have data and the second to be NULL? It seems to me, you either know the address or not, and having partial NULLs in a set of data just asks for trouble when somebody concatenates them and gets NULL (ANSI behavior). You might solve this with allowing NULLs and adding a check constraint - either all the Address information is NULL or none is.
Similar thing with middle initial/name. Some people don't have one. Is this different from it being unknown and do you care?
ALso, date of death - what does NULL mean? Not dead? Unknown date of death? Many times a single column is not sufficient to encode knowledge in a domain.
So to me, whether to allow NULLs would depend very much on the semantics of the data first - performance is going to be second, because having data misinterpreted (potentially by many different people) is usually a far more expensive problem than performance.
It might seem like a little thing (in SQL Server the implementation is a bitmask stored with the row), but only allowing NULLs after justification seems to me to work best. It catches things early in development, forces you to address assumptions and understand your problem domain.
如果你想知道没有值,请使用NULL。
至于速度,IS NULL 应该更快,因为它不使用字符串比较。
If you want to know that there is no value, use NULL.
As for speed, IS NULL should be faster, because it doesn't use string comparison.
如果需要 NULL,请使用 NULL。同上空字符串。
至于性能,“这取决于”
如果您有 varchar,则您将在行中存储长度的实际值。如果您有字符,则存储实际长度。 NULL 不会存储在行中,具体取决于引擎(例如 SQL Server 的 NULL 位图)。
这意味着 IS NULL 查询查询速度更快,但它可能会增加 COALESCE/NULLIF/ISNULL 复杂性。
所以,你的同事是部分正确的,但可能没有完全理解。
盲目使用空字符串是使用哨兵值,而不是解决 NULL 语义问题
FWIW 和个人:
我倾向于使用 NULL 但并不总是。我喜欢避免像 9999 年 12 月 31 日这样的日期,这就是避免 NULL 的原因。
从 Cade Roux 的回答中...我还发现有关“死亡日期是否可以为空”的讨论毫无意义。对于一个字段,实际上,要么有值,要么没有值。
哨兵值比 NULL 更糟糕。神奇的数字。有人吗?
If you need NULL, use NULL. Ditto empty string.
As for performance, "it depends"
If you have varchar, you are storing an actual value in the row for the length. If you have char, then you store the actual length. NULL won't be stored in-row depending on the engine (NULL bitmap for SQL Server for example).
This means IS NULL is quicker, query for query, but it could add COALESCE/NULLIF/ISNULL complexity.
So, your colleague is partially correct but may not appreciate it fully.
Blindly using empty string is use of a sentinel value rather then working through the NULL semantic issue
FWIW and personally:
I would tend to use NULL but don't always. I like to avoid dates like 31 Dec 9999 which is where NULL avoidance leads you.
From Cade Roux's answer... I also find that discussions about "Is date of death nullable" pointless. For an field, in practical terms, either there is a value or there isn't.
Sentinel values are worse then NULLs. Magic numbers. anyone?
告诉你团队里的那个家伙,别再过早地优化头脑了! (但以一种很好的方式)。
像这样的开发人员可能会毒害团队,充满低级优化神话,所有这些都可能是真实的,或者对于某些特定供应商或查询模式在某个时间点是真实的,或者可能仅在理论上真实,但从未真实在实践中。根据这些神话采取行动是一种代价高昂的时间浪费,并且可能会破坏原本良好的设计。
他可能是出于好意,想为团队贡献自己的知识。不幸的是,他错了。就基准是否能证明他的陈述正确或错误而言,这并没有错。他错了,因为这不是设计数据库的方式。是否使字段可以为 NULL 的问题是一个关于数据域的问题,目的是为了定义字段的类型。应该根据该字段没有价值意味着什么来回答这个问题。
Tell that guy on your team to get his prematurely optimizin' head out of his ass! (But in a nice way).
Developers like that can be poison to the team, full of low-level optimization myths, all of which may be true or have been true at one point in time for some specific vendor or query pattern, or possibly only true in theory but never true in practice. Acting upon these myths is a costly waste of time, and can destroy an otherwise good design.
He probably means well and wants to contribute his knowledge to the team. Unfortunately, he is wrong. Not wrong in the sense of whether a benchmark will prove his statement correct or incorrect. He's wrong in the sense that this is not how you design a database. The question of whether to make a field NULL-able is a question about domain of the data for the purposes of defining the type of the field. It should be answered in terms of what it means for the field to have no value.
简而言之,NULL = UNKNOWN!...这意味着(使用死亡日期示例)实体可能 1)活着,2)死亡但死亡日期未知,或 3)未知实体是否死亡或活着。对于数字列,我总是将它们默认为 0 (ZERO),因为在此过程中的某个地方,您可能必须执行聚合计算并且 NULL + 123 = NULL。对于字母数字,我使用 NULL,因为它的性能成本最低,并且更容易说“...where a IS NULL”而不是说“...where a = "”'。使用 '...where a = " "[space]' 不是一个好主意,因为 [space] 不是 NULL!对于日期,如果必须将日期列保留为 NULL,则可能需要添加状态指示符列,在上面的示例中,A=Alive、D=Dead、Q=Dead、死亡日期未知、N=Alive或死亡未知。
In a nutshell, NULL = UNKNOWN!.. Which means (using date of death example) that the entity could be 1)alive, 2)dead but date of death is not known, or 3)unknown if entity is dead or alive. For numeric columns I always default them to 0 (ZERO) because somewhere along the line you may have to perform aggregate calculations and NULL + 123 = NULL. For alphanumerics I use NULL since its least expensive performance-wise and easier to say '...where a IS NULL' than saying '...where a = "" '. Using '...where a = " "[space]' is not a good idea because [space] is not a NULL! For dates, if you have to leave a date column NULL, you may want to add a status indicator column which, in the above example, A=Alive, D=Dead, Q=Dead, date of death not known, N=Alive or Dead is unknown.