性能:rank() 与子查询。子查询成本更低?
受到 这个问题的启发,我决定测试 rank( )
函数,尝试查看子查询的效率是否低于排名。所以我创建了一个表:
create table teste_rank ( codigo number(7), data_mov date, valor number(14,2) );
alter table teste_rank add constraint tst_rnk_pk primary key ( codigo, data_mov );
并插入了一些记录...
declare
vdata date;
begin
dbms_random.initialize(120401);
vdata := to_date('04011997','DDMMYYYY');
for reg in 1 .. 465 loop
vdata := to_date('04011997','DDMMYYYY');
while vdata <= trunc(sysdate) loop
insert into teste_rank
(codigo, data_mov, valor)
values
(reg, vdata, dbms_random.value(1,150000));
vdata := vdata + 2;
end loop;
commit;
end loop;
end;
/
然后测试了两个查询:
select *
from teste_rank r
where r.data_mov = ( select max(data_mov)
from teste_rank
where data_mov <= trunc(sysdate)
and codigo = 1 )
and r.codigo = 1;
select *
from ( select rank() over ( partition by codigo order by data_mov desc ) rn, t.*
from teste_rank t
where codigo = 1
and data_mov <= trunc(sysdate) ) r
where r.rn = 1;
正如你所看到的,子查询的成本低于rank()。这是对的吗?我在那里错过了什么吗?
PS:还使用表中的完整查询和低成本的子查询进行了测试。
编辑
我生成了两个查询的 tkprof(跟踪一个,关闭数据库,启动并跟踪第二个)。
For subquery
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 3 5 0 0
Execute 1 0.00 0.00 0 3 0 0
Fetch 2 0.00 0.00 1 4 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.02 4 12 0 1
For rank()
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 3 3 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 9 19 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.03 12 22 0 1
我可以得出结论,子查询并不总是比Rank 效率低吗?什么时候指示排名而不是子查询?
Inspired by this question I decided to test the rank()
function, trying to see if sub query's are less efficient than rank. So I created a table:
create table teste_rank ( codigo number(7), data_mov date, valor number(14,2) );
alter table teste_rank add constraint tst_rnk_pk primary key ( codigo, data_mov );
and inserted some records...
declare
vdata date;
begin
dbms_random.initialize(120401);
vdata := to_date('04011997','DDMMYYYY');
for reg in 1 .. 465 loop
vdata := to_date('04011997','DDMMYYYY');
while vdata <= trunc(sysdate) loop
insert into teste_rank
(codigo, data_mov, valor)
values
(reg, vdata, dbms_random.value(1,150000));
vdata := vdata + 2;
end loop;
commit;
end loop;
end;
/
And then tested two querys:
select *
from teste_rank r
where r.data_mov = ( select max(data_mov)
from teste_rank
where data_mov <= trunc(sysdate)
and codigo = 1 )
and r.codigo = 1;
select *
from ( select rank() over ( partition by codigo order by data_mov desc ) rn, t.*
from teste_rank t
where codigo = 1
and data_mov <= trunc(sysdate) ) r
where r.rn = 1;
As you can see, the cost of sub query is lower than rank(). Is this right? Am I missing something there?
PS: Tested also with a full query in the table and still sub query with the low cost.
EDIT
I generated a tkprof of the two query's (traced one, shutdown the database, startup and traced the second).
For subquery
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 3 5 0 0
Execute 1 0.00 0.00 0 3 0 0
Fetch 2 0.00 0.00 1 4 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.02 4 12 0 1
For rank()
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 3 3 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 9 19 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.03 12 22 0 1
Can I conclude that sub query not will always less efficient than rank? When is indicated rank instead of sub query?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我不太确定你的问题是什么。是的,根据这两个执行计划,在这种情况下,子查询方法的预期成本更低。这似乎并不太令人惊讶,因为它可以使用索引非常快速地定位您感兴趣的确切行。特别是在这种情况下,子查询只需非常快速地扫描 PK 索引。如果子查询涉及不属于索引的列,情况可能会有所不同。
使用
rank()
的查询必须获取所有匹配的行并对它们进行排名。我不相信优化器有任何短路逻辑来识别这是一个 top-n 查询,因此避免完全排序,即使您真正关心的是排名最高的行。您也可以尝试这种形式,优化器应将其识别为 top-n 查询。我预计在您的情况下,只需要对索引进行一次范围扫描,然后进行表访问。
I'm not really sure what your question is. Yes, according to these two execution plans, in this case, the subquery method has a lower expected cost. Doesn't seem too surprising, since it can use the index to very quickly locate the exact row you're interested in. Specifically in this case, the subquery only has to do a very quick scan of the PK index. The situation might be different if the subquery involved columns that weren't part of the index.
The query using
rank()
has to get all the matching rows and rank them. I don't believe that the optimizer has any short-circuit logic to recognize that this is a top-n query and therefore avoid a full sort, even though all you really care about is the top-ranked row.You might also try this form, which the optimizer should recognize as a top-n query. I would expect in your case that it would require only a single range scan on the index followed by a table access.
成本是基于成本的优化器对执行查询所需费用的估计。
国会预算办公室有可能犯错,尤其是在统计数据已经过时的情况下。
那么,该怎么办呢?尝试使用“set autotrace on”执行每个查询。每个查询执行多少次缓冲区获取和物理读取?换句话说,每个查询实际完成了多少工作?
希望有帮助。
Cost is the cost based optimizer's estimate of what it will take to execute a query.
It's possible for the CBO to get it wrong, especially if statistics are out of date.
So, what to do? Try executing each query with 'set autotrace on'. How many buffer gets and physical reads does each query do? In other words, how much actual work does each query do?
Hope that helps.