为什么这个查询要进行全表扫描?

发布于 2024-07-14 09:59:39 字数 1083 浏览 5 评论 0原文

查询:

SELECT tbl1.*
   FROM tbl1 
JOIN tbl2
     ON (tbl1.t1_pk  = tbl2.t2_fk_t1_pk
AND tbl2.t2_strt_dt <= sysdate
AND tbl2.t2_end_dt  >= sysdate)
JOIN tbl3 on (tbl3.t3_pk = tbl2.t2_fk_t3_pk
AND tbl3.t3_lkup_1 = 2577304
AND tbl3.t3_lkup_2 = 1220833)
where tbl2.t2_lkup_1   = 1020000002981587;

事实:

  • Oracle XE
  • tbl1.t1_pk 是主键。
  • tbl2.t2_fk_t1_pk 是该 t1_pk 列上的外键。
  • tbl2.t2_lkup_1 已建立索引。
  • tbl3.t3_pk 是主键。
  • tbl2.t2_fk_t3_pk 是该 t3_pk 列上的外键。

解释对 tbl1 中有 11,000 行、tbl1 中有 3500 行的数据库的计划 tbl2 显示它正在对 tbl1 进行全表扫描。 在我看来 如果它可以在 tbl1 上执行索引查询,它应该会更快。

解释对 tbl1 中有 11,000 行、tbl1 中有 3500 行的数据库的计划 tbl2 显示它正在对 tbl1 进行全表扫描。 在我看来 如果它可以在 tbl1 上执行索引查询,它应该会更快。

更新:我尝试了你们中的一些人建议的提示,并且解释成本变得更糟! 现在我真的很困惑。

进一步更新:我终于访问了生产数据库的副本, “解释计划”显示它使用索引并且成本低得多 询问。 我猜有更多数据(tbl1 中超过 100,000 行,50,000 行 在表2中)是它决定索引值得的原因。 感谢所有提供帮助的人。 我仍然认为 Oracle 性能调优是一门黑术,但我很高兴你们中的一些人能够理解它。

进一步更新:我已应前雇主的要求更新了问题。 他们不喜欢他们的表名出现在谷歌查询中。 我应该更清楚。

The query:

SELECT tbl1.*
   FROM tbl1 
JOIN tbl2
     ON (tbl1.t1_pk  = tbl2.t2_fk_t1_pk
AND tbl2.t2_strt_dt <= sysdate
AND tbl2.t2_end_dt  >= sysdate)
JOIN tbl3 on (tbl3.t3_pk = tbl2.t2_fk_t3_pk
AND tbl3.t3_lkup_1 = 2577304
AND tbl3.t3_lkup_2 = 1220833)
where tbl2.t2_lkup_1   = 1020000002981587;

Facts:

  • Oracle XE
  • tbl1.t1_pk is a primary key.
  • tbl2.t2_fk_t1_pk is a foreign key on that t1_pk column.
  • tbl2.t2_lkup_1 is indexed.
  • tbl3.t3_pk is a primary key.
  • tbl2.t2_fk_t3_pk is a foreign key on that t3_pk column.

Explain plan on a database with 11,000 rows in tbl1 and 3500 rows in
tbl2 shows that it's doing a full table scan on tbl1. Seems to me that
it should be faster if it could do a index query on tbl1.

Explain plan on a database with 11,000 rows in tbl1 and 3500 rows in
tbl2 shows that it's doing a full table scan on tbl1. Seems to me that
it should be faster if it could do a index query on tbl1.

Update: I tried the hint a few of you suggested, and the explain cost got much worse! Now I'm really confused.

Further Update: I finally got access to a copy of the production database,
and "explain plan" showed it using indexes and with a much lower cost
query. I guess having more data (over 100,000 rows in tbl1 and 50,000 rows
in tbl2) were what it took to make it decide that indexes were worth it. Thanks to everybody who helped. I still think Oracle performance tuning is a black art, but I'm glad some of you understand it.

Further update: I've updated the question at the request of my former employer. They don't like their table names showing up in google queries. I should have known better.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

养猫人 2024-07-21 09:59:39

简单的答案:因为优化器期望找到的行数比实际找到的行数多。

检查统计数据,它们是最新的吗?
检查解释计划中的预期基数是否与实际结果相符? 如果没有修复与该步骤相关的统计信息。

连接列的直方图可能会有所帮助。 Oracle 将使用它们来估计连接产生的基数。

当然,您始终可以通过提示强制使用索引

The easy answer: Because the optimizer expects more rows to find then it actually does find.

Check the statistics, are they up to date?
Check the expected cardinality in the explain plan do they match the actual results? If not fix the statistics relevant for that step.

Histogramms for the joined columns might help. Oracle will use those to estimate the cardinality resulting from a join.

Of course you can always force index usage with a hint

倾城泪 2024-07-21 09:59:39

查看优化器的行计数估计值会很有用,这些估计值不在您发布的 SQL Developer 输出中。

我注意到它正在执行的两个索引查找是范围扫描而不是唯一扫描。 因此,它对返回的行数的估计很容易偏离很远(无论统计信息是否是最新的)。

我的猜测是,它对 TBL2 的表访问的最终行计数的估计相当高,因此它认为它将在 TBL1 中找到大量匹​​配项,因此决定进行完整扫描/哈希连接而不是嵌套循环/索引扫描。

为了获得真正的乐趣,您可以在启用事件 10053 的情况下运行查询,并获取显示优化器执行的计算的跟踪。

It would be useful to see the optimizer's row count estimates, which are not in the SQL Developer output you posted.

I note that the two index lookups it is doing are RANGE SCAN not UNIQUE SCAN. So its estimates of how many rows are being returned could easily be far off (whether statistics are up to date or not).

My guess is that its estimate of the final row count from the TABLE ACCESS of TBL2 is fairly high, so it thinks that it will find a large number of matches in TBL1 and therefore decides on doing a full scan/hash join rather than a nested loop/index scan.

For some real fun, you could run the query with event 10053 enabled and get a trace showing the calculations performed by the optimizer.

李不 2024-07-21 09:59:39

Oracle 尝试返回所需 I/O 量最少的结果集(通常,这是有意义的,因为 I/O 很慢)。 索引至少需要 2 个 I/O 调用。 一份到索引,一份到表。 通常更多,具体取决于索引的大小、表的大小以及返回的记录数、它们在数据文件中的位置……

这就是统计信息的用武之地。假设您的查询估计返回 10 条记录。 优化器可能会计算出使用索引将需要 10 次 I/O 调用。 假设根据统计数据,您的表位于数据文件中的 6 个块中。 Oracle 执行全扫描(6 个 I/O)然后读取索引、读取表、读取下一个匹配键的索引、读取表等等会更快。

所以在你的情况下,桌子可能真的很小。 统计数据可能有误。

我使用以下内容来收集统计数据并根据我的具体需求进行定制:

begin

 DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '&owner' ,tabname => '&table_name', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE,granularity
=> 'ALL', cascade  => TRUE); 

 -- DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '&owner' ,tabname => '&table_name',partname => '&partion_name',granularity => 'PARTITION', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade 
=> TRUE);

 -- DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '&owner' ,tabname => '&table_name',partname => '&partion_name',granularity => 'PARTITION', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade 
=> TRUE,method_opt  => 'for all indexed columns size 254');

end;

Oracle tries to return the result set with the least amount of I/O required (typically, which makes sense because I/o is slow). Indexes take at least 2 I/O calls. one to the index and one to the table. Usually more, depending on the size of the index and tables sizes and the number of records returns, where they are in the datafile, ...

This is where statistics come in. Lets say your query is estimated to return 10 records. The optimizer may calculate that using an index will take 10 I/O calls. Let's say your table, according to the statistics on it, resides in 6 blocks in the data file. It will be faster for Oracle to do a full scan ( 6 I/O) then read the index, read the table, read then index for the next matching key, read the table and so on.

So in your case, the table may be real small. The statistics may be off.

I use the following to gather statistics and customize it for my exact needs:

begin

 DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '&owner' ,tabname => '&table_name', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE,granularity
=> 'ALL', cascade  => TRUE); 

 -- DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '&owner' ,tabname => '&table_name',partname => '&partion_name',granularity => 'PARTITION', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade 
=> TRUE);

 -- DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '&owner' ,tabname => '&table_name',partname => '&partion_name',granularity => 'PARTITION', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade 
=> TRUE,method_opt  => 'for all indexed columns size 254');

end;
梦里南柯 2024-07-21 09:59:39

您只能通过查看 SQL 优化器/执行器创建的查询计划来判断。 它将至少部分基于索引统计信息,而索引统计信息无法仅根据定义进行预测(因此可能会随着时间的推移而变化)。

适用于 SQL Server 2005/2008 的 SQL Management studio,适用于早期版本的查询分析器。

(记不起 Oracle 的正确工具名称。)

You can only tell by looking at the query plan the SQL optimizer/executor creates. It will be at least partial based on index statistics which cannot be predicted from just the definition (and can, therefore, change over time).

SQL Management studio for SQL Server 2005/2008, Query Analyzer for earlier versions.

(Can't recall the right tool names for Oracle.)

国产ˉ祖宗 2024-07-21 09:59:39

尝试添加索引提示。

SELECT /*+ index(tbl1 tbl1_index_name) */ .....

有时 Oracle 就是不知道要使用哪个索引。

Try adding an index hint.

SELECT /*+ index(tbl1 tbl1_index_name) */ .....

Sometimes Oracle just doesn't know which index to use.

奈何桥上唱咆哮 2024-07-21 09:59:39

显然这个查询给出了相同的计划:

SELECT tbl1.*   
FROM tbl1 
JOIN tbl2 ON (tbl1.t1_pk  = tbl2.t2_fk_t1_pk)
JOIN tbl3 on (tbl3.t3_pk = tbl2.t2_fk_t3_pk)
where tbl2.t2_lkup_1   = 1020000002981587
AND tbl2.t2_strt_dt <= sysdate
AND tbl2.t2_end_dt  >= sysdate
AND tbl3.t3_lkup_1 = 2577304
AND tbl3.t3_lkup_2 = 1220833;

如果您将此查询重写为会发生什么:

SELECT tbl1.*    
FROM  tbl1 
,     tbl2
,     tbl3  
where tbl2.t2_lkup_1   = 1020000002981587 
AND   tbl1.t1_pk  = tbl2.t2_fk_t1_pk 
AND   tbl3.t3_pk = tbl2.t2_fk_t3_pk 
AND   tbl2.t2_strt_dt <= sysdate 
AND   tbl2.t2_end_dt  >= sysdate 
AND   tbl3.t3_lkup_1 = 2577304 
AND   tbl3.t3_lkup_2 = 1220833;

Apparently this query gives the same plan:

SELECT tbl1.*   
FROM tbl1 
JOIN tbl2 ON (tbl1.t1_pk  = tbl2.t2_fk_t1_pk)
JOIN tbl3 on (tbl3.t3_pk = tbl2.t2_fk_t3_pk)
where tbl2.t2_lkup_1   = 1020000002981587
AND tbl2.t2_strt_dt <= sysdate
AND tbl2.t2_end_dt  >= sysdate
AND tbl3.t3_lkup_1 = 2577304
AND tbl3.t3_lkup_2 = 1220833;

What happens if you rewrite this query to:

SELECT tbl1.*    
FROM  tbl1 
,     tbl2
,     tbl3  
where tbl2.t2_lkup_1   = 1020000002981587 
AND   tbl1.t1_pk  = tbl2.t2_fk_t1_pk 
AND   tbl3.t3_pk = tbl2.t2_fk_t3_pk 
AND   tbl2.t2_strt_dt <= sysdate 
AND   tbl2.t2_end_dt  >= sysdate 
AND   tbl3.t3_lkup_1 = 2577304 
AND   tbl3.t3_lkup_2 = 1220833;
婴鹅 2024-07-21 09:59:39

根据您预期的结果大小,您可以使用一些会话参数进行操作:

SHOW PARAMETER optimizer_index_cost_adj;
[...]
ALTER SESSION SET optimizer_index_cost_adj = 10;

SHOW PARAMETER OPTIMIZER_MODE;
[...]
ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS_100;

并且不要忘记检查真实的执行时间,有时计划不是现实世界;)

Depends on your expected result size you can play arround with some session parameters:

SHOW PARAMETER optimizer_index_cost_adj;
[...]
ALTER SESSION SET optimizer_index_cost_adj = 10;

SHOW PARAMETER OPTIMIZER_MODE;
[...]
ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS_100;

and dont forget to check the real executiontime, sometimes the plan is not the real world ;)

久夏青 2024-07-21 09:59:39

看起来 tbl1 表的索引没有被选取。 确保
您有 t2_lkup_1 列的索引,它不应该是多列,否则索引不适用。

(除了马特的评论之外)
根据您的查询,我相信您加入是因为您想过滤掉
不进行 JOIN 的记录可能会增加结果集的基数
tbl1 表,如果 中存在重复匹配项。 请参阅Jeff Atwood 评论

尝试一下,它使用了exist函数并加入(在 Oracle 上速度非常快)

select *
  from tbl1 
 where tbl2.t2_lkup_1 = 1020000002981587 and
       exists (
         select *
           from tbl2, tbl3 
          where tbl2.t2_fk_t1_pk = tbl1.t1_pk and
                tbl2.t2_fk_t3_pk = tbl3.t3_pk  and
                sysdate between tbl2.t2_strt_dt and tbl2.t2_end_dt and
                tbl3.t3_lkup_1 = 2577304 and
                tbl3.t3_lkup_2 = 1220833);

It looks like an index for tbl1 table is not being picked up. Make sure
you have an index for t2_lkup_1 column and it should not be multi-column otherwise the index is not applicable.

(in addition to what Matt's comment)
From your query I believe you're joining because you want to filter out
records not to do JOIN which may increase cardinality for result set from
tbl1 table if there are duplicate matches from . See Jeff Atwood comment

Try this, which uses exist function and join (which is really fast on oracle)

select *
  from tbl1 
 where tbl2.t2_lkup_1 = 1020000002981587 and
       exists (
         select *
           from tbl2, tbl3 
          where tbl2.t2_fk_t1_pk = tbl1.t1_pk and
                tbl2.t2_fk_t3_pk = tbl3.t3_pk  and
                sysdate between tbl2.t2_strt_dt and tbl2.t2_end_dt and
                tbl3.t3_lkup_1 = 2577304 and
                tbl3.t3_lkup_2 = 1220833);

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文