需要帮助理解 JOIN 查询与带有子选择的查询的 SQL 解释

发布于 2024-10-12 11:24:48 字数 3131 浏览 3 评论 0原文

我在这里发布了一个上一个问题,询问 JOIN 查询或使用子选择的查询哪个更好。链接:查询中的查询:有更好的方法吗?

这是该问题的延伸。有人可以向我解释为什么我会看到我在这里看到的东西吗?

查询(子选择):

SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq),  (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1

解释分析(子选择)

QUERY PLAN

Index Scan using article_seq_pkey on kbarticles  (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)

  Index Cond: (article_seq = 1511)

  SubPlan

    ->  Seq Scan on kbreview  (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)

          Filter: ($2 = article_seq)

    ->  Seq Scan on owners  (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)

          Filter: ($1 = owner_seq)

    ->  Index Scan using types_type_seq_key on types  (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)

          Index Cond: ($0 = type_seq)

Total runtime: 2.051 ms

查询(JOINs)

SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1

解释分析(< code>JOINs)

QUERY PLAN

Nested Loop  (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)

  Join Filter: (k.owner = o.owner_seq)

  ->  Nested Loop  (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)

        ->  Nested Loop  (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)

              ->  Index Scan using article_seq_pkey on kbarticles k  (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)

                    Index Cond: (article_seq = 1511)

              ->  Seq Scan on kbreview r  (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)

                    Filter: (r.article_seq = 1511)

        ->  Index Scan using types_type_seq_key on types t  (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)

              Index Cond: (t.type_seq = k.type)

  ->  Seq Scan on owners o  (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)

Total runtime: 2.256 ms

根据我在上一个问题中给出(并接受)的答案,JOINs 应该被证明具有更好的结果。然而,在我的所有测试中,我发现 JOIN 的结果差了几毫秒。 JOIN 似乎也充满了嵌套循环。我JOIN的所有表都已建立索引。

我是否在做一些我应该做的不同的事情?我有什么遗漏的吗?

I posted a previous question here asking about what was better, JOIN queries or queries using subselects. Link: Queries within queries: Is there a better way?

This is an extension to that question. Can somebody explain to me why I'm seeing what I'm seeing here?

Query (Subselects):

SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq),  (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1

Explain Analyze (Subselects)

QUERY PLAN

Index Scan using article_seq_pkey on kbarticles  (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)

  Index Cond: (article_seq = 1511)

  SubPlan

    ->  Seq Scan on kbreview  (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)

          Filter: ($2 = article_seq)

    ->  Seq Scan on owners  (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)

          Filter: ($1 = owner_seq)

    ->  Index Scan using types_type_seq_key on types  (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)

          Index Cond: ($0 = type_seq)

Total runtime: 2.051 ms

Query (JOINs)

SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1

Explain Analyze (JOINs)

QUERY PLAN

Nested Loop  (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)

  Join Filter: (k.owner = o.owner_seq)

  ->  Nested Loop  (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)

        ->  Nested Loop  (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)

              ->  Index Scan using article_seq_pkey on kbarticles k  (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)

                    Index Cond: (article_seq = 1511)

              ->  Seq Scan on kbreview r  (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)

                    Filter: (r.article_seq = 1511)

        ->  Index Scan using types_type_seq_key on types t  (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)

              Index Cond: (t.type_seq = k.type)

  ->  Seq Scan on owners o  (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)

Total runtime: 2.256 ms

Based on the answers given (and accepted) in my previous question, JOINs should prove to have better results. However, in all my tests, I'm seeing JOINs to have worse results by a few milliseconds. It also seems like the JOINs are riddled with nested loops. All the tables I'm JOINing are indexed.

Am I doing something that I should be doing differently? Is there something I'm missing?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

剪不断理还乱 2024-10-19 11:24:48

这些查询在逻辑上是不同的。

第一个:

SELECT  article_seq, title, synopsis, body, lastmodified_date,
        (
        SELECT  type_id
        FROM    types
        WHERE   kbarticles.type = type_seq
        ),
        status, scope, images, archived, author, owner,
        (
        SELECT  owner_description
        FROM    owners
        WHERE   kbarticles.owner = owner_seq
        ),
        (
        SELECT  review_date
        FROM    kbreview
        WHERE   kbarticles.article_seq = article_seq
        )
FROM    kbarticles
WHERE   article_seq = $1

第二个:

SELECT  k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status,
        k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date
FROM    kbarticles k
JOIN    types t
ON      k.type = t.type_seq
JOIN    owners o
ON      k.owner = o.owner_seq
JOIN    kbreview r
ON      k.article_seq = r.article_seq
WHERE   k.article_seq = $1

如果typesownerskbreview中有多个记录,则第一个查询将失败,而第二个查询将失败。第二个将从 kbarticles 返回重复项。

如果 kbarticle 没有 typesownerskbreviews,则第一个查询将返回 NULL 在适当的字段中,而第二个将忽略该记录。

如果*_seq字段看起来是PRIMARY KEY字段,则永远不会有重复,查询也永远不会失败;如果 kbarticles 受到对 typesownerskbreview< 的 FOREIGN KEY 引用的约束,则以同样的方式/code>,不能有缺失的行。

但是,JOIN 运算符为优化器提供了更多空间:它可以使任何表领先并使用更高级的 JOIN 技术,例如 HASH JOIN如果您使用子查询,则 MERGE JOIN 不可用。

These queries are logically different.

The first one:

SELECT  article_seq, title, synopsis, body, lastmodified_date,
        (
        SELECT  type_id
        FROM    types
        WHERE   kbarticles.type = type_seq
        ),
        status, scope, images, archived, author, owner,
        (
        SELECT  owner_description
        FROM    owners
        WHERE   kbarticles.owner = owner_seq
        ),
        (
        SELECT  review_date
        FROM    kbreview
        WHERE   kbarticles.article_seq = article_seq
        )
FROM    kbarticles
WHERE   article_seq = $1

The second one:

SELECT  k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status,
        k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date
FROM    kbarticles k
JOIN    types t
ON      k.type = t.type_seq
JOIN    owners o
ON      k.owner = o.owner_seq
JOIN    kbreview r
ON      k.article_seq = r.article_seq
WHERE   k.article_seq = $1

If there is more than one record in types, owners or kbreview, the first query will fail while the second one will return duplicates from kbarticles.

If there is no types, owners or kbreviews for a kbarticle, the first query will return a NULL in appropriate field, while the second one will just omit that record.

If the *_seq fields seem to be the PRIMARY KEY fields, there will never be duplicates and the query will never fail; in the same way if kbarticles is constrained with FOREIGN KEY references to types, owners or kbreview, there can be no missing rows.

However, JOIN operators give the optimizer more place: it can make any table leading and use more advanced JOIN techniques like HASH JOIN or MERGE JOIN which are not available if you are using subqueries.

泪是无色的血 2024-10-19 11:24:48

该表列是否已建立索引? r.article_seq

-> kbreview r 上的顺序扫描(成本=0.00..14.54 行=1 宽度=12)
(实际时间=0.267..1.175行=1
循环=1)

这是花费最多时间的地方。

Is this table column indexed? r.article_seq

-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12)
(actual time=0.267..1.175 rows=1
loops=1)

This is where most time is spend.

挽心 2024-10-19 11:24:48

鉴于这两个计划都执行相同的表扫描,只是以不同的方式排列,我想说两者之间没有显着差异。下臂生成单行的“嵌套循环”与单行子选择几乎相同。

连接更为通用,因为使用标量子选择不会扩展到从任何辅助表中获取两列。

Given that both plans are doing the same table scans, just arranged in a different way, I'd say there's no significant difference between the two. A "nested loop" where the lower arm produces a single row is pretty much the same as a single-row subselect.

Joins are more general, since using scalar subselects won't extend to getting two columns from any of those auxiliary tables, for example.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文