Using IS NULL or IS NOT NULL on join conditions - Theory question
Theory question here:
Why does specifying table.field IS NULL or table.field IS NOT NULL not work on a join condition (left or right join for instance) but only in the where condition?
Non-working example:
-- This should return all shipments with any returns (non-null values) filtered out.
-- However, it returns all shipments, regardless of whether anything satisfies the [r.id IS NULL] condition.
SELECT
    *
FROM
    shipments s
    LEFT OUTER JOIN returns r
        ON s.id = r.id
        AND r.id IS NULL
WHERE
    s.day >= CURDATE() - INTERVAL 10 DAY
Working example:
-- This returns the correct number of rows: total shipments, less any related to a return (non-null values).
SELECT
    *
FROM
    shipments s
    LEFT OUTER JOIN returns r
        ON s.id = r.id
WHERE
    s.day >= CURDATE() - INTERVAL 10 DAY
    AND r.id IS NULL
Why is this the case? All other filter conditions between two tables being joined work just fine, but for some reason IS NULL and IS NOT NULL filters do not work unless they are in the WHERE clause.
What is the reason for this?
Example with two tables, A (parent) and B (child), where each child's pid refers to its parent's id.
If you want to find parents and their kids, you do an INNER JOIN. The result is that every match of a parent's id from the left table and a child's pid from the second table shows up as a row in the result.
Now, the above does not show parents without kids (because their ids have no match among the child pids), so what do you do? You do an outer join instead. There are three types of outer join: the left, the right and the full outer join. We need the left one, as we want the "extra" rows from the left table (parent).
The result is that, besides the previous matches, all parents that do not have a match (read: do not have a kid) are shown too.
Where did all those NULLs come from? Well, MySQL (or any other RDBMS you may use) does not know what to put there: these parents have no match (kid), so there is no pid or child.name to pair with them. So it puts in this special non-value called NULL.
My point is that these NULLs are created (in the result set) during the LEFT OUTER JOIN.
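To see those NULLs appear, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout follows the description above; the names other than Pan, and which parents have kids, are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    -- Hypothetical sample data: parents 2, 4 and 5 have no kids.
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# INNER JOIN: only parent/child pairs that actually match.
inner_rows = conn.execute("""
    SELECT parent.id, parent.name, child.pid, child.name
    FROM parent
    INNER JOIN child ON parent.id = child.pid
    ORDER BY parent.id, child.name
""").fetchall()

# LEFT JOIN: the same pairs, plus every unmatched parent padded with NULLs.
left_rows = conn.execute("""
    SELECT parent.id, parent.name, child.pid, child.name
    FROM parent
    LEFT JOIN child ON parent.id = child.pid
    ORDER BY parent.id, child.name
""").fetchall()

for row in left_rows:
    print(row)  # unmatched parents print as (2, 'Bill', None, None), etc.
```

Python shows SQL NULL as None. None of the NULLs in the parent rows are stored in either table; the join itself produces them.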
So, if we want to show only the parents that do NOT have a kid, we can add a WHERE child.pid IS NULL to the LEFT JOIN above. The WHERE clause is evaluated (checked) after the JOIN is done. So, from the result above, only the last three rows, where the pid is NULL, will be shown.
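A small sketch of that filter, again with sqlite3 standing in for MySQL and with hypothetical sample rows (only Pan and the parent ids 1 to 5 come from the text above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    -- Hypothetical sample data: parents 2, 4 and 5 have no kids.
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# The join runs first and pads unmatched parents with NULLs;
# WHERE then keeps only those padded rows.
childless = conn.execute("""
    SELECT parent.id, parent.name
    FROM parent
    LEFT JOIN child ON parent.id = child.pid
    WHERE child.pid IS NULL
    ORDER BY parent.id
""").fetchall()

print(childless)  # [(2, 'Bill'), (4, 'Dale'), (5, 'Evan')]
```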
Now, what happens if we move that IS NULL check from the WHERE to the joining ON clause?
In this case the database tries to find rows from the two tables that match these conditions, that is, rows where parent.id = child.pid AND child.pid IS NULL. But it can find no such match, because no child.pid can be equal to something (1, 2, 3, 4 or 5) and be NULL at the same time!
So, the condition parent.id = child.pid AND child.pid IS NULL can never be true for any pair of rows; it behaves like a condition that is always False.
So, why does it return ALL rows from the left table? Because it's a LEFT JOIN! Left joins return the rows that match (none in this case) and also the rows from the left table that do not match the check (all in this case), with NULLs in the right table's columns.
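The same effect can be reproduced in a short sketch (sqlite3 in place of MySQL, hypothetical sample rows apart from Pan and the parent ids):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    -- Hypothetical sample data.
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# IS NULL moved into ON: no child row can satisfy both halves of the condition,
# so every parent counts as "unmatched" and is returned with NULL child columns.
rows = conn.execute("""
    SELECT parent.id, child.pid, child.name
    FROM parent
    LEFT JOIN child
        ON parent.id = child.pid
        AND child.pid IS NULL
    ORDER BY parent.id
""").fetchall()

print(rows)
# [(1, None, None), (2, None, None), (3, None, None), (4, None, None), (5, None, None)]
```

Even the parents that do have kids come back with empty child columns, which is why the ON version looks as if the filter were ignored.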
I hope the above explanation is clear.
Sidenote (not directly related to your question): why on earth doesn't Pan show up in any of our JOINs? Because his pid is NULL, and NULL in the (uncommon) logic of SQL is not equal to anything, so it can't match any of the parent ids (which are 1, 2, 3, 4 and 5). Even if there were a NULL there, it still wouldn't match, because NULL does not equal anything, not even NULL itself (it's a very strange logic, indeed!). That's why we use the special check IS NULL and not an = NULL check.
So, will Pan show up if we do a RIGHT JOIN? Yes, it will! A RIGHT JOIN shows all the results that match (the first INNER JOIN we did) plus all rows from the RIGHT table that don't match (which in our case is one, the (NULL, 'Pan') row).
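Both points can be checked with a short sketch. It uses sqlite3 rather than MySQL, and because SQLite versions before 3.39 lack RIGHT JOIN, it writes parent RIGHT JOIN child as the equivalent child LEFT JOIN parent. Sample names other than Pan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    -- Hypothetical sample data.
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# NULL = NULL evaluates to NULL (shown as None), not true; IS NULL gives a real 1.
eq, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(eq, is_null)  # None 1

# Equivalent of "parent RIGHT JOIN child": every child row is preserved,
# so Pan appears with NULLs in the parent columns.
rows = conn.execute("""
    SELECT parent.id, parent.name, child.pid, child.name
    FROM child
    LEFT JOIN parent ON parent.id = child.pid
    ORDER BY child.name
""").fetchall()

print(rows[-1])  # (None, None, None, 'Pan')
```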
Unfortunately, MySQL does not have FULL JOIN. You can try it in other RDBMSs, and it will show the matched rows together with the unmatched rows from both tables.
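If no FULL JOIN is available, one common workaround (not part of the explanation above, so treat it as a sketch) is to combine a left join with the unmatched rows from the other side using UNION ALL. The example below runs on sqlite3 with hypothetical sample rows; in MySQL the second half would typically be written as a RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    -- Hypothetical sample data.
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# First half: all parents, with their kids where they exist.
# Second half: only the children that matched no parent (here, Pan).
full_rows = conn.execute("""
    SELECT parent.id, parent.name, child.pid, child.name
    FROM parent
    LEFT JOIN child ON parent.id = child.pid
    UNION ALL
    SELECT parent.id, parent.name, child.pid, child.name
    FROM child
    LEFT JOIN parent ON parent.id = child.pid
    WHERE parent.id IS NULL
""").fetchall()

for row in full_rows:
    print(row)
```

With this data the result has seven rows: the three matches, the three childless parents, and Pan.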
The NULL part is calculated AFTER the actual join, so that is why it needs to be in the WHERE clause.
Actually, the NULL filter is not being ignored. This is simply how joining two tables works.
I will try to walk through the steps the database server performs, to make it easier to understand.
For example, take the query that you said ignores the NULL condition:
SELECT
    *
FROM
    shipments s
    LEFT OUTER JOIN returns r
        ON s.id = r.id
        AND r.id IS NULL
WHERE
    s.day >= CURDATE() - INTERVAL 10 DAY
First, all the rows from the SHIPMENTS table get selected.
In the next step, the database server starts picking records one by one from the second (RETURNS) table.
In the third step, each record from the RETURNS table is checked against the join conditions you provided in the query, which in this case are (s.id = r.id AND r.id IS NULL).
Note that the check in the third step only decides whether the server should accept or reject the current RETURNS record for appending to the selected SHIPMENTS row. It can in no way affect which records are selected from the SHIPMENTS table.
Once the server is done joining the two tables, producing all the rows of SHIPMENTS plus the accepted rows of RETURNS, it applies the WHERE clause to that intermediate result.
So when you put the (r.id IS NULL) condition in the WHERE clause, only the intermediate rows whose r.id is NULL are kept, and every row that matched a return is filtered out.
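The steps above can be sketched as a tiny nested-loop join in plain Python. This is a conceptual model only: real servers are free to execute the query differently as long as the result is the same. The function name left_join and the sample ids are hypothetical, and the date filter is left out.

```python
def left_join(left_rows, right_rows, on):
    """Conceptual LEFT JOIN: the ON predicate only decides which right rows attach."""
    result = []
    for left in left_rows:
        matches = [right for right in right_rows if on(left, right)]
        if matches:
            result.extend((left, right) for right in matches)
        else:
            # No right row accepted: the left row is kept, padded with NULL (None).
            result.append((left, None))
    return result


# Hypothetical data: three shipments, one of which (id 2) was returned.
shipments = [{"id": 1}, {"id": 2}, {"id": 3}]
returns = [{"id": 2}]

# IS NULL inside ON: no returns row can pass, so every shipment comes back padded.
in_on = left_join(
    shipments, returns,
    lambda s, r: s["id"] == r["id"] and r["id"] is None,
)

# IS NULL in WHERE: join on the id first, then filter the intermediate result.
joined = left_join(shipments, returns, lambda s, r: s["id"] == r["id"])
in_where = [(s, r) for s, r in joined if r is None or r["id"] is None]

print([s["id"] for s, _ in in_on])     # [1, 2, 3]
print([s["id"] for s, _ in in_where])  # [1, 3]
```

The ON predicate never gets a chance to drop a shipment; only the final filter over the joined rows can do that.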
You're doing a LEFT OUTER JOIN, which indicates that you want every tuple from the table on the LEFT of the statement, regardless of whether it has a matching record in the RIGHT table. This being the case, the condition prunes rows only from the RIGHT table, so you still get every row from the LEFT table, just as if you hadn't included the AND within the ON clause at all.
Performing the AND in the WHERE clause causes the prune to happen after the LEFT JOIN takes place.
The WHERE clause is evaluated after the JOIN conditions have been processed.
Your execution plan should make this clear; the JOIN takes precedence, after which the results are filtered.