SQL Server:为什么比较 null=value 对于 NOT IN 返回 true?

发布于 2024-09-27 06:47:10 字数 2320 浏览 1 评论 0原文

为什么 valuenull 的比较会返回 false,除非使用 NOT IN 时返回 true?


给定一个查询来查找所有拥有帖子的 stackoverflow 用户:

SELECT * FROM Users
WHERE UserID IN (SELECT UserID FROM Posts)

这按预期工作;我得到了所有拥有帖子的用户的列表。

现在查询逆;查找所有没有发布帖子的 stackoverflow 用户:

SELECT * FROM Users
WHERE UserID NOT IN (SELECT UserID FROM Posts)

这不会返回任何记录,这是不正确的。

给定假设数据1

Users              Posts
================   ===============================
UserID  Username   PostID   UserID  Subject
------  --------   -------  ------  ----------------
1       atkins     1        1       Welcome to stack ov...
2       joels      2        2       Welcome all!
...     ...        ...      ...
399573  gt6989b    ...      ...
...     ...        ...      ...
                   10592    null    (deleted by nsl&fbi...
                   ...      ... 

并假设 NULL 规则:

  • NULL = NULL 计算结果为未知
  • NULL <> NULL 计算结果为未知
  • value = NULL 计算结果为未知

如果我们查看第二个查询,我们感兴趣的是查找 Users.UserID 的所有行在 Posts.UserID 列中找到。我将按如下逻辑进行:

检查 UserID 1

  • 1 = 1 返回 true。因此我们得出结论,该用户有一些帖子,并且不将它们包含在输出列表中

现在检查 UserID 2:

  • 2 = 1 返回 false,因此我们继续查找
  • 2 = 2 返回 true,因此我们得出结论该用户有一些帖子,并且不将其包含在输出列表中

现在检查 UserID 399573

  • 399573 = 1 返回 false,因此我们继续查找
  • 399573 = 2 返回 false,所以我们继续寻找
  • ...
  • 399573 = null 返回未知,所以我们继续寻找
  • ...

我们没有找到 UserID 399573 的帖子,因此我们将把他包含在输出中列表。

但 SQL Server 不这样做。如果您的 in 列表中有 NULL,那么它会突然找到匹配项。 它突然找到了一个匹配项。突然 399573 = null 的计算结果为 true。

为什么valuenull 的比较返回unknown,除非它返回true?

编辑:我知道我可以通过专门排除空值来解决这种无意义行为:

SELECT * FROM Users
WHERE UserID NOT IN (
   SELECT UserID FROM Posts
   WHERE UserID IS NOT NULL)

但我不应该这样做,据我所知,布尔逻辑应该是没有它就很好 - 因此我的问题。

脚注

  • 1假设数据;如果你不喜欢它:弥补你的不足。
  • celko 现在有了自己的标签

Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?


Given a query to find all stackoverflow users who have a post:

SELECT * FROM Users
WHERE UserID IN (SELECT UserID FROM Posts)

This works as expected; i get a list of all users who have a post.

Now query for the inverse; find all stackoverflow users who don't have a post:

SELECT * FROM Users
WHERE UserID NOT IN (SELECT UserID FROM Posts)

This returns no records, which is incorrect.

Given hypothetical data1

Users              Posts
================   ===============================
UserID  Username   PostID   UserID  Subject
------  --------   -------  ------  ----------------
1       atkins     1        1       Welcome to stack ov...
2       joels      2        2       Welcome all!
...     ...        ...      ...
399573  gt6989b    ...      ...
...     ...        ...      ...
                   10592    null    (deleted by nsl&fbi...
                   ...      ... 

And assume the rules of NULLs:

  • NULL = NULL evaluates to unknown
  • NULL <> NULL evaluates to unknown
  • value = NULL evaluates unknown

If we look at the 2nd query, we're interested in finding all rows where the Users.UserID is not found in the Posts.UserID column. i would proceed logically as follows:

Check UserID 1

  • 1 = 1 returns true. So we conclude that this user has some posts, and do not include them in the output list

Now check UserID 2:

  • 2 = 1 returns false, so we keep looking
  • 2 = 2 returns true, so we conclude that this user has some posts, and do not include them in the output list

Now check UserID 399573

  • 399573 = 1 returns false, so we keep looking
  • 399573 = 2 returns false, so we keep looking
  • ...
  • 399573 = null returns unknown, so we keep looking
  • ...

We found no posts by UserID 399573, so we would include him in the output list.

Except SQL Server doesn't do this. If you have a NULL in your in list, then suddenly it finds a match. It suddenly finds a match. Suddenly 399573 = null evaluates to true.

Why does the comparison of value to null return unknown, except when it returns true?

Edit: i know that i can workaround this nonsensical behavior by specifically excluding the nulls:

SELECT * FROM Users
WHERE UserID NOT IN (
   SELECT UserID FROM Posts
   WHERE UserID IS NOT NULL)

But i shouldn't have to, as far as i can tell the boolean logic should be fine without it - hence my question.

Footnotes

  • 1 hypothetical data; if you don't like it: make up your down.
  • celko now has his own tag

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

十秒萌定你 2024-10-04 06:47:10

常见问题,预设答案:

NOT IN 子句的行为可能会令人困惑,因此需要一些解释。考虑以下查询:

SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)

尽管 AdventureWorks.Person.Contact 中有一千多个不同的姓氏,但该查询不返回任何内容。对于初学者数据库程序员来说,这可能看起来违反直觉,但实际上是完全有道理的。该解释由几个简单的步骤组成。首先,考虑以下两个查询,它们显然是等效的:

SELECT LastName, FirstName FROM Person.Contact

WHERE LastName IN('Hedlund', 'Holloway', NULL)



SELECT LastName, FirstName FROM Person.Contact

WHERE LastName='Hedlund' OR LastName='Holloway' OR LastName=NULL

请注意,两个查询都返回预期结果。现在,让我们回顾一下德摩根定理,该定理指出:

not (P and Q) = (not P) or (not Q)

not (P or Q) = (not P) and (not Q)

我正在从维基百科(http://en.wikipedia.org/wiki/De_Morgan_duality)剪切和粘贴。将 DeMorgan 定理应用于此查询,可以得出这两个查询也是等价的:

SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)



SELECT LastName, FirstName FROM Person.Contact

WHERE LastName<>'Hedlund' AND LastName<>'Holloway' AND LastName<>NULL

最后一个 LastName<>NULL 永远不可能为 true

Common problem, canned answer:

The behavior of NOT IN clause may be confusing and as such it needs some explanations. Consider the following query:

SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)

Although there are more than a thousand distinct last names in AdventureWorks.Person.Contact, the query returns nothing. This may look counterintuitive to a beginner database programmer, but it actually makes perfect sense. The explanation consist of several simple steps. First of all, consider the following two queries, which are clearly equivalent:

SELECT LastName, FirstName FROM Person.Contact

WHERE LastName IN('Hedlund', 'Holloway', NULL)



SELECT LastName, FirstName FROM Person.Contact

WHERE LastName='Hedlund' OR LastName='Holloway' OR LastName=NULL

Note that both queries return expected results. Now, let us recall DeMorgan's theorem, which states that:

not (P and Q) = (not P) or (not Q)

not (P or Q) = (not P) and (not Q)

I am cutting and pasting from Wikipedia (http://en.wikipedia.org/wiki/De_Morgan_duality). Applying DeMorgan's theorem to this queries, it follows that these two queries are also equivalent:

SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)



SELECT LastName, FirstName FROM Person.Contact

WHERE LastName<>'Hedlund' AND LastName<>'Holloway' AND LastName<>NULL

This last LastName<>NULL can never be true

飞烟轻若梦 2024-10-04 06:47:10

你第一句话中的假设是不正确的:

为什么比较值
null 返回 false,除非使用
NOT IN,哪里返回true?

但是将值与 null 进行比较不会返回 false;它返回未知unknown 有其自己的逻辑:

unknown  AND  true   = unknown
unknown  OR   true   = true
unknown  OR   false  = unknown

这是如何实现的一个示例:

where 1 not in (2, null)
--> where 1 <> 2 and 1 <> null
--> where true and unknown
--> where unknown

where 子句仅匹配 true,因此这会过滤掉任何行。

您可以在 Wikipedia 中找到 3 值逻辑的全部优点。

The assumption in your first sentence isn't right:

Why does the comparison of value to
null return false, except when using a
NOT IN, where it returns true?

But comparison of a value to null does not return false; it returns unknown. And unknown has its own logic:

unknown  AND  true   = unknown
unknown  OR   true   = true
unknown  OR   false  = unknown

One example of how this works out:

where 1 not in (2, null)
--> where 1 <> 2 and 1 <> null
--> where true and unknown
--> where unknown

The where clause only matches on true, so this filters out any row.

You can find the full glory of 3 value logic at Wikipedia.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文