向外连接添加谓词时出现问题

发布于 2024-10-07 02:09:48 字数 580 浏览 1 评论 0原文

我一直在为此抓狂:

-- misses nulls
SELECT *
FROM BASE_TABLE TAB1
     FULL JOIN BASE_TABLE TAB2
       USING (ANOTHER_ID)
WHERE (TAB1.ID = 6 OR TAB1.ID IS NULL)
  AND (TAB2.ID = 8 OR TAB2.ID IS NULL);

-- catches nulls
SELECT *
FROM (SELECT * FROM BASE_TABLE WHERE ID = 6) TAB1
     FULL JOIN (SELECT * FROM BASE_TABLE WHERE ID = 8) TAB2
       USING (ANOTHER_ID);

第一个查询丢失了一个或另一个表中不存在的行。为什么第一个查询无法执行外连接?

我一直认为我已经明白了 - 首先评估 WHERE 子句,因此稍后不会应用 'OR IS NULL' - 但这对我来说没有意义,因为我已经成功应用了 'IS NULL' 谓词过去选择连接后的行。

出于性能原因,我想让第一个查询工作 - 有人知道问题是什么吗?

I have been tearing my hair out over this:

-- misses nulls
SELECT *
FROM BASE_TABLE TAB1
     FULL JOIN BASE_TABLE TAB2
       USING (ANOTHER_ID)
WHERE (TAB1.ID = 6 OR TAB1.ID IS NULL)
  AND (TAB2.ID = 8 OR TAB2.ID IS NULL);

-- catches nulls
SELECT *
FROM (SELECT * FROM BASE_TABLE WHERE ID = 6) TAB1
     FULL JOIN (SELECT * FROM BASE_TABLE WHERE ID = 8) TAB2
       USING (ANOTHER_ID);

The first query loses rows where the row in one or the other table does not exist. Why does the first query fail to do an outer join?

I keep thinking I've got it - the WHERE clause is evaluated first, so the 'OR IS NULL' is not applied later - but that doesn't make sense to me because I've successfully applied 'IS NULL' predicates in the past to select rows after the join.

I would like to make the first query work, for performance reasons - does anyone know what the problem is?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

柏林苍穹下 2024-10-14 02:09:48

第一个查询执行连接,然后执行过滤器,第二个查询执行过滤器,然后执行连接。
对于外连接,这种区别很重要。

通过一些示例数据您就会理解它。

create table tab1 (id number, another_id number);
create table tab2 (id number, another_id number);

insert into tab1 values (6,5);
insert into tab2 values (8,5);
insert into tab1 values (1,6);
insert into tab2 values (2,6);

SELECT *
FROM TAB1
     FULL JOIN TAB2 USING (ANOTHER_ID);

     ANOTHER_ID              ID              ID
--------------- --------------- ---------------
           5.00            6.00            8.00
           6.00            1.00            2.00

结果集(不带 WHERE)显示 another_id 6 的连接条件已成功。不需要外连接。

当您添加 WHERE 过滤器时,它会过滤掉 6 的匹配项,因为 id 既不是 6,8 也不是 null。也就是说,您将其用作过滤谓词而不是连接谓词。

我相信您的意图是,对于要加入 TAB2 的 TAB1,ANOTHER_ID 上应该有一个匹配,并且 TAB1 的 ID 应为 6,TAB2 的 ID 应为 8。这就是第二个 SQL 中的内容。

连接谓词也可以表示为

SELECT *
FROM TAB1
    FULL JOIN TAB2 ON 
           (TAB1.ANOTHER_ID = TAB2.ANOTHER_ID AND TAB1.ID=6 AND TAB2.ID=8)

The first query does a join then a filter, the second does a filter then a join.
With an outer join the distinction is important.

You'll understand it with some sample data.

create table tab1 (id number, another_id number);
create table tab2 (id number, another_id number);

insert into tab1 values (6,5);
insert into tab2 values (8,5);
insert into tab1 values (1,6);
insert into tab2 values (2,6);

SELECT *
FROM TAB1
     FULL JOIN TAB2 USING (ANOTHER_ID);

     ANOTHER_ID              ID              ID
--------------- --------------- ---------------
           5.00            6.00            8.00
           6.00            1.00            2.00

The result set (without the WHERE) shows that the join condition for another_id 6 has been successful. There is no need for the outer join.

When you add the WHERE filter, it filters out the match for 6 since the ids are neither 6,8 nor null. That is, you have used it as a filter predicate not a join predicate.

I believe what you intended was that for TAB1 to join to TAB2 there should be a match on ANOTHER_ID plus the ID should be 6 for TAB1 and 8 for TAB2. That's what is in the second SQL.

The join predicate can also be expressed as

SELECT *
FROM TAB1
    FULL JOIN TAB2 ON 
           (TAB1.ANOTHER_ID = TAB2.ANOTHER_ID AND TAB1.ID=6 AND TAB2.ID=8)
背叛残局 2024-10-14 02:09:48

嗯。好吧,这是一个脑筋急转弯,但我想我可能有它。

在第一个查询中,您在键 ANOTHER_ID 上将表完全连接到自身。

当两个连接表相同时,无论您执行完全连接、内部连接、左连接还是右连接,结果都是相同的。因为您的密钥 ANOTHER_ID 在两个表中总是存在或不存在。在任何情况下,一个表的 ANOTHER_ID 值在另一个表中都找不到,因此也不存在 TAB1.ANOTHER_ID 或 TAB2.ANOTHER_ID 最终为 NULL 的情况。所以你实际上只是在 ANOTHER_ID 上进行自内连接。

现在我不知道你的ID列的内容,但我想它总是填充一些值。因此,在自内连接之后,生成的 ID 列将始终填充一些内容。也许不是 6 或 8,但也不是 NULL。如果没有 NULL 值,您的 WHERE 查询将转换为 WHERE TAB1.ID = 6 AND TAB2.ID = 8,仅留下正确的组合,而不会留下任何其他内容。

相反,在第二个查询中,您定义 ID=6 和 ID=8 的子集,并将这些子集彼此完全连接。子集 1 包含子集 2 中不存在的某些 ANOTHER_ID,反之亦然。因此,现在可以进行 FULL JOIN,因为某些行不会连接到其他行,从而在 TAB1.ID 或 TAB2.ID 中留下可以检测到的 NULL 值。

我认为您的第一个查询可以通过将 WHERE 子句更改为: WHERE TAB1.ID IN (6,8) AND TAB2.ID IN (6,8) 来调整。然而,这将给出与第二个查询不同的结果,而且我认为还会产生一定数量的重复行。而且我也不认为它会更快。

Hm. It's a brainteaser allright, but I think I might have it.

In your first query, you are full-joining the table to itself on the key ANOTHER_ID.

When two joined tables are the same, it doesn't matter whether you do a full join, inner join, left join or right join: the result is the same. Because your key, ANOTHER_ID, always either exists or doesn't exist in both tables. There are no cases in which one table has a value for ANOTHER_ID that is not found in the other table, and as such there are no cases where either TAB1.ANOTHER_ID or TAB2.ANOTHER_ID would end up NULL. So you are actually just doing a self-inner-join on ANOTHER_ID.

Now I don't know the content of your ID column, but I imagine that it is always filled with some value. So after your self-inner-join, the resulting ID columns will always be filled with something. Perhaps not 6 or 8, but also not NULL. In the absence of NULL values, your WHERE query translates to WHERE TAB1.ID = 6 AND TAB2.ID = 8 which leaves only the correct combinations and nothing else.

In contrast, in your second query, you are defining subsets of ID=6 and ID=8 and full-joining those subsets to one another. Subset 1 contains certain occurrences of ANOTHER_ID not present in subset 2, and vice versa. So now there is ground for a FULL JOIN because certain rows will not be joined to other rows, leaving NULL values in TAB1.ID or TAB2.ID which you can detect.

I think your first query can be adjusted by changing your WHERE clause to: WHERE TAB1.ID IN (6,8) AND TAB2.ID IN (6,8). That will however give a different result as your second query, and I think also a certain amount of duplicate rows. And I also don't think it will be faster.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文