哪个 select 语句性能更好?
假设master_table包含许多记录,并且master_table的“id”字段、tableA、tableB、tableC和tableD在业务意义上是相同的。
对于下面所示的 2 个 select 语句,
它们都会返回相同的结果集吗?
哪一款会有更好的性能呢?
我认为如果tableA_tmp,tableB_tmp,tableC_tmp和tableD_tmp都返回较小的结果集,SQL1将比SQL2更快,因为oracle不需要为每个master_table记录查询一次tableA_tmp,tableB_tmp,tableC_tmp和tableD_tmp。
但是如果tableA_tmp、tableB_tmp、tableC_tmp和tableD_tmp都返回大结果集,SQL 2会快得多,因为连接许多大结果集的成本比为每个master_table记录查询一次tableA_tmp、tableB_tmp、tableC_tmp和tableD_tmp要高得多。
如果我有任何误解,请纠正我。或者有其他建议的方法吗?
SQL1:
select
master_table.* ,
tableA_tmp.cnt as tableA_cnt ,
tableB_tmp.cnt as tableB_cnt ,
tableC_tmp.cnt as tableC_cnt ,
tableD_tmp.cnt as tableD_cnt
from
master_table,
(select tableA.id as id, count(1) as cnt from tableA group by tableA.id) tableA_tmp,
(select tableB.id as id, count(1) as cnt from tableB group by tableB.id) tableB_tmp,
(select tableC.id as id, count(1) as cnt from tableC group by tableC.id) tableC_tmp,
(select tableD.id as id, count(1) as cnt from tableD group by tableD.id) tableD_tmp
where
master_table.id = tableA_tmp.id(+) and
master_table.id = tableB_tmp.id(+) and
master_table.id = tableC_tmp.id(+) and
master_table.id = tableD_tmp.id(+) ;
SQL 2:
select
master_table.* ,
(select count(*) from tableA where tableA.id = master_table.id) as tableA_cnt,
(select count(*) from tableB where tableB.id = master_table.id) as tableB_cnt,
(select count(*) from tableC where tableC.id = master_table.id) as tableC_cnt,
(select count(*) from tableD where tableD.id = master_table.id) as tableD_cnt
from
master_table;
Suppose master_table contains many records and both the "id" field of the master_table, tableA,tableB,tableC and tableD are the same in the business sense.
For the 2 select statements shown belows ,
Will they both return the same result set?
Which one will have better performance ?
I think if both tableA_tmp ,tableB_tmp,tableC_tmp and tableD_tmp return a smaller result set , SQL1 will be faster than SQL2 because oracle does not need to query tableA_tmp,,tableB_tmp,tableC_tmp and tableD_tmp once for every master_table record.
But if both the tableA_tmp ,tableB_tmp,tableC_tmp and tableD_tmp return the large result set , SQL 2 will be much faster because the cost of joining many large result set is much higher than query tableA_tmp,,tableB_tmp,tableC_tmp and tableD_tmp once for every master_table record.
Please correct me if I have any misunderstanding. Or any others method suggested?
SQL1:
select
master_table.* ,
tableA_tmp.cnt as tableA_cnt ,
tableB_tmp.cnt as tableB_cnt ,
tableC_tmp.cnt as tableC_cnt ,
tableD_tmp.cnt as tableD_cnt
from
master_table,
(select tableA.id as id, count(1) as cnt from tableA group by tableA.id) tableA_tmp,
(select tableB.id as id, count(1) as cnt from tableB group by tableB.id) tableB_tmp,
(select tableC.id as id, count(1) as cnt from tableC group by tableC.id) tableC_tmp,
(select tableD.id as id, count(1) as cnt from tableD group by tableD.id) tableD_tmp
where
master_table.id = tableA_tmp.id(+) and
master_table.id = tableB_tmp.id(+) and
master_table.id = tableC_tmp.id(+) and
master_table.id = tableD_tmp.id(+) ;
SQL 2:
select
master_table.* ,
(select count(*) from tableA where tableA.id = master_table.id) as tableA_cnt,
(select count(*) from tableB where tableB.id = master_table.id) as tableB_cnt,
(select count(*) from tableC where tableC.id = master_table.id) as tableC_cnt,
(select count(*) from tableD where tableD.id = master_table.id) as tableD_cnt
from
master_table;
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
连接通常比内联查询更好 - 对主查询返回的每一行执行内联查询。
这意味着(1)比(2)更好。至少在 99% 的情况下是这样。
在少数情况下,数据的分布和索引的定义方式可以在使查询执行时间向 2 倾斜(效率更高)方面发挥作用,但这种情况在普通数据库中很少发生。
Joins are generally better than inline queries - inline queries get executed for every row that is returned from the main query.
That means (1) is better than (2). In 99% of the cases at least.
In few cases, the distribution of data and way indexes are defined can play a role in tilting the query execution times towards 2 being more efficient, but this happens very rarely in a average database.
查看执行计划
Check out the execution plan
在较短的时间内完成工作的人。
The one who carries out work in smaller period of time.
不同的结果集。
考虑 TableA 是否为空。
在语句 1 中,tableA_tmp 也将为空。使用外连接,将返回一行,但 tableA_tmp.cnt 的值为 null。
在语句 2 中,将执行计数并返回零值。
两者的性能可能更好或更差,具体取决于卷、索引、标量子查询缓存、表聚类因子、内存....
Different result sets.
Consider if TableA is empty.
In statement 1, tableA_tmp would also be empty. With the outer join, a row would be returned but the value of tableA_tmp.cnt would be null.
In statement 2, the count would be executed and return a zero value.
Performance of either could be better or worse depending on volumes, indexes, scalar subquery caching, table clustering factor, memory....
如果Master表和其他表足够大,SQL 1会更快。原因是表扫描。表扫描的成本很高,因为扫描涉及 I/O。第一个查询仅需要对表 tableA、tableB、tablec 和 tableD 中的每个表进行一次扫描。
在 SQL2 中,对于主表中的每个键,将分别扫描表 tableA、tableB、tablec 和 tableD。如果主表有10行,则表tableA将被扫描10次,tableB将被扫描10次,tableC将被扫描10次,依此类推。成本会很高。
例子将说明这一点。假设每个表有 1000 条记录,每条记录都有不同的 id。在第一个查询中,表 tableA、tableB、tablec 和 tableD 将分别扫描一次,并将结果连接起来。生成的行数将会很大,但会在合理的时间内进行过滤。在SQL2中,每个表tableA、tableB、tablec和tableD都会被扫描1000次。那是非常昂贵的。
If Master table and other Tables are sufficiently large, SQL 1 will be faster. The reason is table scans. Table scans are expensive as a scan involves I/O. The first query requires only single scan of each of tables tableA, tableB, tablec and tableD.
In SQL2, for every key in master table the tables tableA, tableB, tablec and tableD will be scanned sperately. If master table has 10 rows, table tableA will be scanned 10 times, tableB will be scanned 10 times, tableC will be scanned 10 times and so on. The cost will be high.
Example will illustrate the point. Suppose every table has 1000 records each with distinct id. In first query, the tables tableA, tableB, tablec and tableD will be scanned once each and result will be joined. The resulting number of rows will be large but will be filtered in reasonable time. In SQL2, each table tableA, tableB, tablec and tableD will be scanned 1000 times each. That is very expensive.