需要帮助理解 JOIN 查询与带有子选择的查询的 SQL 解释
我在这里发布了一个上一个问题,询问 JOIN
查询或使用子选择的查询哪个更好。链接:查询中的查询:有更好的方法吗?
这是该问题的延伸。有人可以向我解释为什么我会看到我在这里看到的东西吗?
查询(子选择):
SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq), (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1
解释分析(子选择)
QUERY PLAN
Index Scan using article_seq_pkey on kbarticles (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)
Index Cond: (article_seq = 1511)
SubPlan
-> Seq Scan on kbreview (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)
Filter: ($2 = article_seq)
-> Seq Scan on owners (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)
Filter: ($1 = owner_seq)
-> Index Scan using types_type_seq_key on types (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)
Index Cond: ($0 = type_seq)
Total runtime: 2.051 ms
查询(JOIN
s)
SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1
解释分析(< code>JOINs)
QUERY PLAN
Nested Loop (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)
Join Filter: (k.owner = o.owner_seq)
-> Nested Loop (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)
-> Nested Loop (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)
-> Index Scan using article_seq_pkey on kbarticles k (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)
Index Cond: (article_seq = 1511)
-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)
Filter: (r.article_seq = 1511)
-> Index Scan using types_type_seq_key on types t (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)
Index Cond: (t.type_seq = k.type)
-> Seq Scan on owners o (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)
Total runtime: 2.256 ms
根据我在上一个问题中给出(并接受)的答案,JOIN
s 应该被证明具有更好的结果。然而,在我的所有测试中,我发现 JOIN
的结果差了几毫秒。 JOIN 似乎也充满了嵌套循环。我JOIN
的所有表都已建立索引。
我是否在做一些我应该做的不同的事情?我有什么遗漏的吗?
I posted a previous question here asking about what was better, JOIN
queries or queries using subselects. Link: Queries within queries: Is there a better way?
This is an extension to that question. Can somebody explain to me why I'm seeing what I'm seeing here?
Query (Subselects):
SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq), (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1
Explain Analyze (Subselects)
QUERY PLAN
Index Scan using article_seq_pkey on kbarticles (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)
Index Cond: (article_seq = 1511)
SubPlan
-> Seq Scan on kbreview (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)
Filter: ($2 = article_seq)
-> Seq Scan on owners (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)
Filter: ($1 = owner_seq)
-> Index Scan using types_type_seq_key on types (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)
Index Cond: ($0 = type_seq)
Total runtime: 2.051 ms
Query (JOIN
s)
SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1
Explain Analyze (JOIN
s)
QUERY PLAN
Nested Loop (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)
Join Filter: (k.owner = o.owner_seq)
-> Nested Loop (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)
-> Nested Loop (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)
-> Index Scan using article_seq_pkey on kbarticles k (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)
Index Cond: (article_seq = 1511)
-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)
Filter: (r.article_seq = 1511)
-> Index Scan using types_type_seq_key on types t (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)
Index Cond: (t.type_seq = k.type)
-> Seq Scan on owners o (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)
Total runtime: 2.256 ms
Based on the answers given (and accepted) in my previous question, JOIN
s should prove to have better results. However, in all my tests, I'm seeing JOIN
s to have worse results by a few milliseconds. It also seems like the JOIN
s are riddled with nested loops. All the tables I'm JOIN
ing are indexed.
Am I doing something that I should be doing differently? Is there something I'm missing?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这些查询在逻辑上是不同的。
第一个:
第二个:
如果
types
、owners
或kbreview
中有多个记录,则第一个查询将失败,而第二个查询将失败。第二个将从kbarticles
返回重复项。如果
kbarticle
没有types
、owners
或kbreviews
,则第一个查询将返回NULL
在适当的字段中,而第二个将忽略该记录。如果
*_seq
字段看起来是PRIMARY KEY
字段,则永远不会有重复,查询也永远不会失败;如果kbarticles
受到对types
、owners
或kbreview< 的
FOREIGN KEY
引用的约束,则以同样的方式/code>,不能有缺失的行。但是,
JOIN
运算符为优化器提供了更多空间:它可以使任何表领先并使用更高级的JOIN
技术,例如HASH JOIN
或如果您使用子查询,则 MERGE JOIN
不可用。These queries are logically different.
The first one:
The second one:
If there is more than one record in
types
,owners
orkbreview
, the first query will fail while the second one will return duplicates fromkbarticles
.If there is no
types
,owners
orkbreviews
for akbarticle
, the first query will return aNULL
in appropriate field, while the second one will just omit that record.If the
*_seq
fields seem to be thePRIMARY KEY
fields, there will never be duplicates and the query will never fail; in the same way ifkbarticles
is constrained withFOREIGN KEY
references totypes
,owners
orkbreview
, there can be no missing rows.However,
JOIN
operators give the optimizer more place: it can make any table leading and use more advancedJOIN
techniques likeHASH JOIN
orMERGE JOIN
which are not available if you are using subqueries.该表列是否已建立索引? r.article_seq
这是花费最多时间的地方。
Is this table column indexed? r.article_seq
This is where most time is spend.
鉴于这两个计划都执行相同的表扫描,只是以不同的方式排列,我想说两者之间没有显着差异。下臂生成单行的“嵌套循环”与单行子选择几乎相同。
连接更为通用,因为使用标量子选择不会扩展到从任何辅助表中获取两列。
Given that both plans are doing the same table scans, just arranged in a different way, I'd say there's no significant difference between the two. A "nested loop" where the lower arm produces a single row is pretty much the same as a single-row subselect.
Joins are more general, since using scalar subselects won't extend to getting two columns from any of those auxiliary tables, for example.