向外连接添加谓词时出现问题
我一直在为此抓狂:
-- misses nulls
SELECT *
FROM BASE_TABLE TAB1
FULL JOIN BASE_TABLE TAB2
USING (ANOTHER_ID)
WHERE (TAB1.ID = 6 OR TAB1.ID IS NULL)
AND (TAB2.ID = 8 OR TAB2.ID IS NULL);
-- catches nulls
SELECT *
FROM (SELECT * FROM BASE_TABLE WHERE ID = 6) TAB1
FULL JOIN (SELECT * FROM BASE_TABLE WHERE ID = 8) TAB2
USING (ANOTHER_ID);
第一个查询丢失了一个或另一个表中不存在的行。为什么第一个查询无法执行外连接?
我一直认为我已经明白了 - 首先评估 WHERE 子句,因此稍后不会应用 'OR IS NULL' - 但这对我来说没有意义,因为我已经成功应用了 'IS NULL' 谓词过去选择连接后的行。
出于性能原因,我想让第一个查询工作 - 有人知道问题是什么吗?
I have been tearing my hair out over this:
-- misses nulls
SELECT *
FROM BASE_TABLE TAB1
FULL JOIN BASE_TABLE TAB2
USING (ANOTHER_ID)
WHERE (TAB1.ID = 6 OR TAB1.ID IS NULL)
AND (TAB2.ID = 8 OR TAB2.ID IS NULL);
-- catches nulls
SELECT *
FROM (SELECT * FROM BASE_TABLE WHERE ID = 6) TAB1
FULL JOIN (SELECT * FROM BASE_TABLE WHERE ID = 8) TAB2
USING (ANOTHER_ID);
The first query loses rows where the row in one or the other table does not exist. Why does the first query fail to do an outer join?
I keep thinking I've got it - the WHERE clause is evaluated first, so the 'OR IS NULL' is not applied later - but that doesn't make sense to me because I've successfully applied 'IS NULL' predicates in the past to select rows after the join.
I would like to make the first query work, for performance reasons - does anyone know what the problem is?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
第一个查询执行连接,然后执行过滤器,第二个查询执行过滤器,然后执行连接。
对于外连接,这种区别很重要。
通过一些示例数据您就会理解它。
结果集(不带 WHERE)显示 another_id 6 的连接条件已成功。不需要外连接。
当您添加 WHERE 过滤器时,它会过滤掉 6 的匹配项,因为 id 既不是 6,8 也不是 null。也就是说,您将其用作过滤谓词而不是连接谓词。
我相信您的意图是,对于要加入 TAB2 的 TAB1,ANOTHER_ID 上应该有一个匹配,并且 TAB1 的 ID 应为 6,TAB2 的 ID 应为 8。这就是第二个 SQL 中的内容。
连接谓词也可以表示为
The first query does a join then a filter, the second does a filter then a join.
With an outer join the distinction is important.
You'll understand it with some sample data.
The result set (without the WHERE) shows that the join condition for another_id 6 has been successful. There is no need for the outer join.
When you add the WHERE filter, it filters out the match for 6 since the ids are neither 6,8 nor null. That is, you have used it as a filter predicate not a join predicate.
I believe what you intended was that for TAB1 to join to TAB2 there should be a match on ANOTHER_ID plus the ID should be 6 for TAB1 and 8 for TAB2. That's what is in the second SQL.
The join predicate can also be expressed as
嗯。好吧,这是一个脑筋急转弯,但我想我可能有它。
在第一个查询中,您在键 ANOTHER_ID 上将表完全连接到自身。
当两个连接表相同时,无论您执行完全连接、内部连接、左连接还是右连接,结果都是相同的。因为您的密钥 ANOTHER_ID 在两个表中总是存在或不存在。在任何情况下,一个表的 ANOTHER_ID 值在另一个表中都找不到,因此也不存在 TAB1.ANOTHER_ID 或 TAB2.ANOTHER_ID 最终为 NULL 的情况。所以你实际上只是在 ANOTHER_ID 上进行自内连接。
现在我不知道你的ID列的内容,但我想它总是填充一些值。因此,在自内连接之后,生成的 ID 列将始终填充一些内容。也许不是 6 或 8,但也不是 NULL。如果没有 NULL 值,您的 WHERE 查询将转换为
WHERE TAB1.ID = 6 AND TAB2.ID = 8
,仅留下正确的组合,而不会留下任何其他内容。相反,在第二个查询中,您定义 ID=6 和 ID=8 的子集,并将这些子集彼此完全连接。子集 1 包含子集 2 中不存在的某些 ANOTHER_ID,反之亦然。因此,现在可以进行 FULL JOIN,因为某些行不会连接到其他行,从而在 TAB1.ID 或 TAB2.ID 中留下可以检测到的 NULL 值。
我认为您的第一个查询可以通过将 WHERE 子句更改为:
WHERE TAB1.ID IN (6,8) AND TAB2.ID IN (6,8)
来调整。然而,这将给出与第二个查询不同的结果,而且我认为还会产生一定数量的重复行。而且我也不认为它会更快。Hm. It's a brainteaser allright, but I think I might have it.
In your first query, you are full-joining the table to itself on the key ANOTHER_ID.
When two joined tables are the same, it doesn't matter whether you do a full join, inner join, left join or right join: the result is the same. Because your key, ANOTHER_ID, always either exists or doesn't exist in both tables. There are no cases in which one table has a value for ANOTHER_ID that is not found in the other table, and as such there are no cases where either TAB1.ANOTHER_ID or TAB2.ANOTHER_ID would end up NULL. So you are actually just doing a self-inner-join on ANOTHER_ID.
Now I don't know the content of your ID column, but I imagine that it is always filled with some value. So after your self-inner-join, the resulting ID columns will always be filled with something. Perhaps not 6 or 8, but also not NULL. In the absence of NULL values, your WHERE query translates to
WHERE TAB1.ID = 6 AND TAB2.ID = 8
which leaves only the correct combinations and nothing else.In contrast, in your second query, you are defining subsets of ID=6 and ID=8 and full-joining those subsets to one another. Subset 1 contains certain occurrences of ANOTHER_ID not present in subset 2, and vice versa. So now there is ground for a FULL JOIN because certain rows will not be joined to other rows, leaving NULL values in TAB1.ID or TAB2.ID which you can detect.
I think your first query can be adjusted by changing your WHERE clause to:
WHERE TAB1.ID IN (6,8) AND TAB2.ID IN (6,8)
. That will however give a different result as your second query, and I think also a certain amount of duplicate rows. And I also don't think it will be faster.