性能：rank() 与子查询。子查询成本更低？

发布于 2025-01-04 14:49:43 字数 3000 浏览 4 评论 0原文

受到这个问题的启发，我决定测试 rank( ) 函数，尝试查看子查询的效率是否低于排名。所以我创建了一个表：

create table teste_rank ( codigo number(7), data_mov date, valor number(14,2) );
alter table teste_rank add constraint tst_rnk_pk primary key ( codigo, data_mov );

并插入了一些记录...

declare
  vdata date;
begin
  dbms_random.initialize(120401);
  vdata := to_date('04011997','DDMMYYYY');
  for reg in 1 .. 465 loop
    vdata := to_date('04011997','DDMMYYYY');
    while vdata <= trunc(sysdate) loop
      insert into teste_rank 
          (codigo, data_mov, valor) 
        values 
          (reg, vdata, dbms_random.value(1,150000));
      vdata := vdata + 2;
    end loop;
    commit;
  end loop;
end;
/

然后测试了两个查询：

select * 
  from teste_rank r
 where r.data_mov = ( select max(data_mov) 
                        from teste_rank 
                       where data_mov <= trunc(sysdate) 
                         and codigo = 1 )
   and r.codigo = 1;

Explain Plan for sub query

select *
  from ( select rank() over ( partition by codigo order by data_mov desc ) rn, t.*
           from teste_rank t
          where codigo = 1
            and data_mov <= trunc(sysdate) ) r
 where r.rn = 1;

Explain planrank()

正如你所看到的，子查询的成本低于rank()。这是对的吗？我在那里错过了什么吗？

PS：还使用表中的完整查询和低成本的子查询进行了测试。

编辑

我生成了两个查询的 tkprof（跟踪一个，关闭数据库，启动并跟踪第二个）。

For subquery

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          3          5          0           0
Execute      1      0.00       0.00          0          3          0           0
Fetch        2      0.00       0.00          1          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.02          4         12          0           1

For rank()

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          3          3          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          9         19          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.03         12         22          0           1

我可以得出结论，子查询并不总是比Rank 效率低吗？什么时候指示排名而不是子查询？

原文

Inspired by this question I decided to test the rank() function, trying to see if sub query's are less efficient than rank. So I created a table:

create table teste_rank ( codigo number(7), data_mov date, valor number(14,2) );
alter table teste_rank add constraint tst_rnk_pk primary key ( codigo, data_mov );

and inserted some records...

declare
  vdata date;
begin
  dbms_random.initialize(120401);
  vdata := to_date('04011997','DDMMYYYY');
  for reg in 1 .. 465 loop
    vdata := to_date('04011997','DDMMYYYY');
    while vdata <= trunc(sysdate) loop
      insert into teste_rank 
          (codigo, data_mov, valor) 
        values 
          (reg, vdata, dbms_random.value(1,150000));
      vdata := vdata + 2;
    end loop;
    commit;
  end loop;
end;
/

And then tested two querys:

select * 
  from teste_rank r
 where r.data_mov = ( select max(data_mov) 
                        from teste_rank 
                       where data_mov <= trunc(sysdate) 
                         and codigo = 1 )
   and r.codigo = 1;

Explain Plan for sub query

select *
  from ( select rank() over ( partition by codigo order by data_mov desc ) rn, t.*
           from teste_rank t
          where codigo = 1
            and data_mov <= trunc(sysdate) ) r
 where r.rn = 1;

Explain plan rank()

As you can see, the cost of sub query is lower than rank(). Is this right? Am I missing something there?

PS: Tested also with a full query in the table and still sub query with the low cost.

EDIT

I generated a tkprof of the two query's (traced one, shutdown the database, startup and traced the second).

For subquery

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          3          5          0           0
Execute      1      0.00       0.00          0          3          0           0
Fetch        2      0.00       0.00          1          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.02          4         12          0           1

For rank()

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          3          3          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          9         19          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.03         12         22          0           1

Can I conclude that sub query not will always less efficient than rank? When is indicated rank instead of sub query?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一片旧的回忆 2025-01-11 14:49:43

我不太确定你的问题是什么。是的，根据这两个执行计划，在这种情况下，子查询方法的预期成本更低。这似乎并不太令人惊讶，因为它可以使用索引非常快速地定位您感兴趣的确切行。特别是在这种情况下，子查询只需非常快速地扫描 PK 索引。如果子查询涉及不属于索引的列，情况可能会有所不同。

使用 rank() 的查询必须获取所有匹配的行并对它们进行排名。我不相信优化器有任何短路逻辑来识别这是一个 top-n 查询，因此避免完全排序，即使您真正关心的是排名最高的行。

您也可以尝试这种形式，优化器应将其识别为 top-n 查询。我预计在您的情况下，只需要对索引进行一次范围扫描，然后进行表访问。

select * 
  from (select *
          from teste_rank r
          where data_mov <= trunc(sysdate) 
            and codigo = 1
        order by data_mov desc)
  where rownum=1;

I'm not really sure what your question is. Yes, according to these two execution plans, in this case, the subquery method has a lower expected cost. Doesn't seem too surprising, since it can use the index to very quickly locate the exact row you're interested in. Specifically in this case, the subquery only has to do a very quick scan of the PK index. The situation might be different if the subquery involved columns that weren't part of the index.

The query using rank() has to get all the matching rows and rank them. I don't believe that the optimizer has any short-circuit logic to recognize that this is a top-n query and therefore avoid a full sort, even though all you really care about is the top-ranked row.

You might also try this form, which the optimizer should recognize as a top-n query. I would expect in your case that it would require only a single range scan on the index followed by a table access.

select * 
  from (select *
          from teste_rank r
          where data_mov <= trunc(sysdate) 
            and codigo = 1
        order by data_mov desc)
  where rownum=1;

回复收藏 0 原文