Using IS NULL or IS NOT NULL on join conditions - Theory question

Posted on 2024-11-19 10:27:54 · 716 words · 1 view · 0 comments

Theory question here:

Why does specifying table.field IS NULL or table.field IS NOT NULL not work on a join condition (left or right join for instance) but only in the where condition?

Non-working example:

This should return all shipments with any returns (non-null values) filtered out. However, it returns all shipments, regardless of whether anything meets the [r.id is null] condition.

SELECT
  *
FROM 
  shipments s
LEFT OUTER JOIN returns r  
  ON s.id = r.id
  AND r.id is null
WHERE
  s.day >= CURDATE() - INTERVAL 10 DAY 

Working example:

This returns the correct number of rows: total shipments, less any that have a related return (non-null values).

SELECT
  *
FROM 
  shipments s
LEFT OUTER JOIN returns r  
  ON s.id = r.id
WHERE
  s.day >= CURDATE() - INTERVAL 10 DAY
  AND r.id is null

Why is this the case? All other filter conditions between two joined tables work just fine, but for some reason IS NULL and IS NOT NULL filters do not work unless they are in the WHERE clause.

What is the reason for this?

Comments (6)

Example with tables A and B:

 A (parent)       B (child)    
============    =============
 id | name        pid | name 
------------    -------------
  1 | Alex         1  | Kate
  2 | Bill         1  | Lia
  3 | Cath         3  | Mary
  4 | Dale       NULL | Pan
  5 | Evan  

If you want to find parents and their kids, you do an INNER JOIN:

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  INNER JOIN  child
  ON   parent.id     =    child.pid

Result is that every match of a parent's id from the left table and a child's pid from the second table will show as a row in the result:

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  1 | Alex   |   1  | Kate  |
|  1 | Alex   |   1  | Lia   |
|  3 | Cath   |   3  | Mary  |
+----+--------+------+-------+

Now, the above does not show parents without kids (because their ids do not have a match among the child pids). So what do you do? You do an outer join instead. There are three types of outer join: left, right, and full. We need the left one because we want the "extra" rows from the left table (parent):

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  LEFT JOIN  child
  ON   parent.id    =    child.pid

Result is that besides previous matches, all parents that do not have a match (read: do not have a kid) are shown too:

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  1 | Alex   |   1  | Kate  |
|  1 | Alex   |   1  | Lia   |
|  3 | Cath   |   3  | Mary  |
|  2 | Bill   | NULL | NULL  |
|  4 | Dale   | NULL | NULL  |
|  5 | Evan   | NULL | NULL  |
+----+--------+------+-------+

Where did all those NULLs come from? MySQL (or any other RDBMS you may use) has nothing to put there: these parents have no match (kid), so there is no pid or child.name to pair with them. It therefore fills those columns with the special non-value called NULL.

My point is that these NULLs are created (in the result set) during the LEFT OUTER JOIN.
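To see that these NULLs exist only in the join result and not in the stored data, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL, and it assumes the two example tables are literally named parent and child, as the queries above imply.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# The stored child table has no row at all for parent 2 (Bill).
stored = conn.execute("SELECT COUNT(*) FROM child WHERE pid = 2").fetchone()[0]

# The LEFT JOIN result still has a row for Bill; its child columns are
# padded with NULL (None in Python) by the join itself.
bill_row = conn.execute("""
    SELECT id, parent.name, pid, child.name
    FROM parent LEFT JOIN child ON parent.id = child.pid
    WHERE parent.id = 2
""").fetchall()

print(stored)    # 0
print(bill_row)  # [(2, 'Bill', None, None)]
```

The padded row for Bill is manufactured by the outer join. Nothing like it is stored in child.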


So, if we want to show only the parents that do NOT have a kid, we can add a WHERE child.pid IS NULL to the LEFT JOIN above. The WHERE clause is evaluated (checked) after the JOIN is done. So, it's clear from the above result that only the last three rows where the pid is NULL will be shown:

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  LEFT JOIN  child
  ON   parent.id    =    child.pid

WHERE child.pid IS NULL

Result:

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  2 | Bill   | NULL | NULL  |
|  4 | Dale   | NULL | NULL  |
|  5 | Evan   | NULL | NULL  |
+----+--------+------+-------+

Now, what happens if we move that IS NULL check from the WHERE to the joining ON clause?

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  LEFT JOIN  child
  ON   parent.id    =    child.pid
  AND  child.pid IS NULL

In this case the database tries to find rows from the two tables that satisfy both conditions, that is, rows where parent.id = child.pid AND child.pid IS NULL. But it can find no such match, because no child.pid can be equal to something (1, 2, 3, 4 or 5) and be NULL at the same time!

So, the condition:

ON   parent.id    =    child.pid
AND  child.pid IS NULL

is equivalent to:

ON   1 = 0

which is always False.

So, why does it return ALL rows from the left table? Because it's a LEFT JOIN! Left joins return the rows that match (none in this case) and also the rows from the left table that have no match (all of them in this case), with the right-hand columns filled with NULL:

+----+--------+------+-------+
| id | parent | pid  | child | 
+----+--------+------+-------+
|  1 | Alex   | NULL | NULL  |
|  2 | Bill   | NULL | NULL  |
|  3 | Cath   | NULL | NULL  |
|  4 | Dale   | NULL | NULL  |
|  5 | Evan   | NULL | NULL  |
+----+--------+------+-------+

I hope the above explanation is clear.
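For anyone who wants to run the two variants side by side, here is a small sketch. It assumes Python's sqlite3 module as a stand-in for MySQL and tables named parent and child holding the sample data. The BASE template and the ORDER BY are my additions, there only to keep the comparison short and the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# Same query text; only the position of the IS NULL check changes.
BASE = """
    SELECT id, parent.name, pid, child.name
    FROM parent LEFT JOIN child
      ON parent.id = child.pid {on_extra}
    {where}
    ORDER BY id
"""

# IS NULL in WHERE: applied after the join, so matched parents are dropped.
in_where = conn.execute(
    BASE.format(on_extra="", where="WHERE child.pid IS NULL")).fetchall()

# IS NULL in ON: the ON condition can never be true, so every parent
# is kept and padded with NULLs.
in_on = conn.execute(
    BASE.format(on_extra="AND child.pid IS NULL", where="")).fetchall()

print([row[0] for row in in_where])  # [2, 4, 5]
print([row[0] for row in in_on])     # [1, 2, 3, 4, 5]
```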



Sidenote (not directly related to your question): Why on earth doesn't Pan show up in any of our JOINs? Because his pid is NULL, and in SQL's three-valued logic a comparison with NULL is never true, so it can't match any of the parent ids (which are 1, 2, 3, 4 and 5). Even if there were a NULL id among the parents, it still wouldn't match, because NULL does not equal anything, not even NULL itself (it's a very strange logic, indeed!). That's why we use the special check IS NULL and not a = NULL check.
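The NULL comparison rule is easy to check for yourself. This is a minimal sketch assuming Python's sqlite3 module (SQLite follows the same rule as MySQL here). The two-row child table is a cut-down copy of the example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "NULL = NULL" is not true; it evaluates to NULL (None in Python).
# "NULL IS NULL" is the dedicated test and evaluates to true (1).
eq_null, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(eq_null)  # None
print(is_null)  # 1

# A WHERE or ON clause keeps a row only when its condition is true,
# so "= NULL" never selects anything, while IS NULL finds Pan.
conn.execute("CREATE TABLE child (pid INTEGER, name TEXT)")
conn.execute("INSERT INTO child VALUES (1, 'Kate'), (NULL, 'Pan')")
with_eq = conn.execute("SELECT name FROM child WHERE pid = NULL").fetchall()
with_is = conn.execute("SELECT name FROM child WHERE pid IS NULL").fetchall()
print(with_eq)  # []
print(with_is)  # [('Pan',)]
```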

So, will Pan show up if we do a RIGHT JOIN? Yes, he will! A RIGHT JOIN shows all the rows that match (the first INNER JOIN we did) plus all rows from the RIGHT table that don't match (which in our case is one, the (NULL, 'Pan') row):

SELECT id,  parent.name AS parent
     , pid, child.name  AS child

FROM
        parent  RIGHT JOIN  child
  ON   parent.id     =    child.pid

Result:

+------+--------+------+-------+
| id   | parent | pid  | child | 
+------+--------+------+-------+
|   1  | Alex   |   1  | Kate  |
|   1  | Alex   |   1  | Lia   |
|   3  | Cath   |   3  | Mary  |
| NULL | NULL   | NULL | Pan   |
+------+--------+------+-------+

Unfortunately, MySQL does not have FULL JOIN. You can try it in other RDBMSs, and it will show:

+------+--------+------+-------+
|  id  | parent | pid  | child | 
+------+--------+------+-------+
|   1  | Alex   |   1  | Kate  |
|   1  | Alex   |   1  | Lia   |
|   3  | Cath   |   3  | Mary  |
|   2  | Bill   | NULL | NULL  |
|   4  | Dale   | NULL | NULL  |
|   5  | Evan   | NULL | NULL  |
| NULL | NULL   | NULL | Pan   |
+------+--------+------+-------+
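If you are stuck without FULL JOIN, one common workaround (not part of the explanation above, just a widely used emulation) is to take the LEFT JOIN result and append the right-table rows that found no match. The sketch below assumes Python's sqlite3 module and tables named parent and child. It writes the second half as child LEFT JOIN parent, which is the same thing as parent RIGHT JOIN child, to avoid depending on RIGHT JOIN support in older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child  (pid INTEGER, name TEXT);
    INSERT INTO parent VALUES (1,'Alex'),(2,'Bill'),(3,'Cath'),(4,'Dale'),(5,'Evan');
    INSERT INTO child  VALUES (1,'Kate'),(1,'Lia'),(3,'Mary'),(NULL,'Pan');
""")

# Part 1: the LEFT JOIN (matches plus parents without kids).
# Part 2: only the children without a parent. The IS NULL check sits in
# WHERE, so it runs after the join. This relies on parent.id never being
# NULL in the stored table.
rows = conn.execute("""
    SELECT id, parent.name, pid, child.name
    FROM parent LEFT JOIN child ON parent.id = child.pid
    UNION ALL
    SELECT id, parent.name, pid, child.name
    FROM child LEFT JOIN parent ON parent.id = child.pid
    WHERE parent.id IS NULL
""").fetchall()

print(len(rows))                          # 7
print((None, None, None, 'Pan') in rows)  # True
```

Note that the second half is exactly the "IS NULL in the WHERE clause" pattern discussed above, applied from the other side.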

太阳哥哥 2024-11-26 10:27:54

The NULL part is calculated AFTER the actual join, so that is why it needs to be in the where clause.

半枫 2024-11-26 10:27:54

Actually, the NULL filter is not being ignored. This is simply how joining two tables works.

I will try to walk through the steps the database server performs, to make it clear.
Take the query that you said ignores the NULL condition:
SELECT
  *
FROM
  shipments s
LEFT OUTER JOIN returns r
  ON s.id = r.id
  AND r.id is null
WHERE
  s.day >= CURDATE() - INTERVAL 10 DAY

First, all the rows from the SHIPMENTS table get selected.

Next, the database server starts selecting records from the second (RETURNS) table, one by one.

In the third step, each record from the RETURNS table is checked against the join conditions you provided in the query, which in this case are (s.id = r.id and r.id is NULL).

Note that this check only decides whether the server should accept or reject the current RETURNS record for appending to the selected SHIPMENTS row. It can in no way affect which records are selected from the SHIPMENTS table.

Once the server has finished joining the two tables, producing an intermediate result that contains all the rows of the SHIPMENTS table plus the accepted rows of the RETURNS table, it applies the WHERE clause to that intermediate result.
So when you put the (r.id is NULL) condition in the WHERE clause, only the intermediate rows whose r.id is NULL are kept, and every shipment that matched a return is filtered out.
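The steps above can be sketched as a naive nested-loop join in plain Python. This is only a logical model under my own assumptions: a real server is free to execute the query differently, the three shipments and one return are made-up sample data, the date filter is left out, and left_join is a hypothetical helper, not anything from MySQL.

```python
def left_join(left_rows, right_rows, on, where=lambda l, r: True):
    """Naive nested-loop LEFT JOIN followed by a WHERE filter."""
    pad = {"id": None}  # stands for a right-hand row made entirely of NULLs
    joined = []
    for l in left_rows:  # every left row is visited, whatever ON says
        # ON only decides which right rows get attached to this left row.
        matches = [r for r in right_rows if on(l, r)]
        if matches:
            joined.extend((l, r) for r in matches)
        else:
            joined.append((l, pad))  # no match: keep the left row, pad with NULL
    # WHERE runs on the joined rows, so it can remove left rows as well.
    return [(l, r) for (l, r) in joined if where(l, r)]


shipments = [{"id": 1}, {"id": 2}, {"id": 3}]  # hypothetical sample data
returns = [{"id": 2}]                          # shipment 2 was returned

# IS NULL inside ON: no return row can satisfy both tests, so all shipments stay.
in_on = left_join(shipments, returns,
                  on=lambda s, r: s["id"] == r["id"] and r["id"] is None)

# IS NULL inside WHERE: evaluated after the join, so shipment 2 is removed.
in_where = left_join(shipments, returns,
                     on=lambda s, r: s["id"] == r["id"],
                     where=lambda s, r: r["id"] is None)

print([s["id"] for s, r in in_on])     # [1, 2, 3]
print([s["id"] for s, r in in_where])  # [1, 3]
```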

最美不过初阳 2024-11-26 10:27:54

You're doing a LEFT OUTER JOIN, which indicates that you want every tuple from the table on the LEFT of the statement, regardless of whether it has a matching record in the RIGHT table. This being the case, the extra condition prunes rows from the RIGHT table only, so you still end up with every row of the LEFT table, just as if you hadn't included the AND in the ON clause at all.

Performing the AND in the WHERE clause causes the pruning to happen after the LEFT JOIN takes place.

°如果伤别离去 2024-11-26 10:27:54

The WHERE clause is evaluated after the JOIN conditions have been processed.

送舟行 2024-11-26 10:27:54

Your execution plan should make this clear; the JOIN takes precedence, after which the results are filtered.
