Oracle 8i 日期函数速度慢

发布于 2024-10-17 14:42:11 字数 2615 浏览 4 评论 0原文

我正在尝试在 Oracle 8i 服务器（旧的，我知道）上运行以下 PL/SQL：

select
    -- stuff --
from
    s_doc_quote d,
    s_quote_item i,
    s_contact c,
    s_addr_per a,
    cx_meter_info m
where
    d.row_id = i.sd_id
    and d.con_per_id = c.row_id
    and i.ship_per_addr_id = a.row_id(+)
    and i.x_meter_info_id = m.row_id(+)
    and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
    and i.prod_id in ('1-QH6','1-QH8')
    and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
;

但是执行速度非常慢。由于服务器每天晚上午夜左右关闭，因此经常无法及时完成。

执行计划如下：

SELECT STATEMENT   1179377
 NESTED LOOPS   1179377
  NESTED LOOPS OUTER  959695
   NESTED LOOPS OUTER  740014
    NESTED LOOPS   520332
     INLIST ITERATOR
      TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 157132
       INDEX RANGE SCAN S_QUOTE_ITEM_IDX8 8917
     TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
      INDEX UNIQUE SCAN S_DOC_QUOTE_P1 1
    TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
     INDEX UNIQUE SCAN S_ADDR_PER_P1 1
   TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
    INDEX UNIQUE SCAN CX_METER_INFO_P1 1
  TABLE ACCESS BY INDEX ROWID S_CONTACT 1
   INDEX UNIQUE SCAN S_CONTACT_P1 1

如果我将以下 where 子句更改

and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate

为静态值，例如：

and d.created between to_date('20110101','yyyymmdd') and sysdate

执行计划将变为：

SELECT STATEMENT   5
 NESTED LOOPS   5
  NESTED LOOPS OUTER  4
   NESTED LOOPS OUTER  3
    NESTED LOOPS   2
     TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
      INDEX RANGE SCAN S_DOC_QUOTE_IDX1 3
     INLIST ITERATOR
      TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 1
       INDEX RANGE SCAN S_QUOTE_ITEM_IDX4 4
    TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
     INDEX UNIQUE SCAN S_ADDR_PER_P1 1
   TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
    INDEX UNIQUE SCAN CX_METER_INFO_P1 1
  TABLE ACCESS BY INDEX ROWID S_CONTACT 1
   INDEX UNIQUE SCAN S_CONTACT_P1 1

几乎立即开始返回行。

到目前为止，我已经尝试用绑定变量替换动态日期条件，以及使用从双表中选择动态日期的子查询。到目前为止，这些方法都没有帮助提高性能。

因为我对 PL/SQL 比较陌生，所以我无法理解执行计划中存在如此重大差异的原因。

我还尝试将查询作为 SAS 的直通运行，但为了测试执行速度，我一直在使用 SQL*Plus。

编辑：

为了澄清起见，我已经尝试使用绑定变量，如下所示：

var start_date varchar2(8);
exec :start_date := to_char(add_months(trunc(sysdate,'MM'), -1),'yyyymmdd')

使用以下 where 子句：

and d.created between to_date(:start_date,'yyyymmdd') and sysdate

返回执行成本 1179377。

如果可能，我还想避免使用绑定变量，因为我不相信我可以引用它们来自 SAS 传递查询（尽管我可能是错的）。

原文

I'm trying to run the following PL/SQL on an Oracle 8i server (old, I know):

select
    -- stuff --
from
    s_doc_quote d,
    s_quote_item i,
    s_contact c,
    s_addr_per a,
    cx_meter_info m
where
    d.row_id = i.sd_id
    and d.con_per_id = c.row_id
    and i.ship_per_addr_id = a.row_id(+)
    and i.x_meter_info_id = m.row_id(+)
    and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
    and i.prod_id in ('1-QH6','1-QH8')
    and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
;

Execution is incredibly slow however. Because the server is taken down around midnight each night, it often fails to complete in time.

The execution plan is as follows:

SELECT STATEMENT   1179377
 NESTED LOOPS   1179377
  NESTED LOOPS OUTER  959695
   NESTED LOOPS OUTER  740014
    NESTED LOOPS   520332
     INLIST ITERATOR
      TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 157132
       INDEX RANGE SCAN S_QUOTE_ITEM_IDX8 8917
     TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
      INDEX UNIQUE SCAN S_DOC_QUOTE_P1 1
    TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
     INDEX UNIQUE SCAN S_ADDR_PER_P1 1
   TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
    INDEX UNIQUE SCAN CX_METER_INFO_P1 1
  TABLE ACCESS BY INDEX ROWID S_CONTACT 1
   INDEX UNIQUE SCAN S_CONTACT_P1 1

If I change the following where clause however:

and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate

To a static value, such as:

and d.created between to_date('20110101','yyyymmdd') and sysdate

the execution plan becomes:

SELECT STATEMENT   5
 NESTED LOOPS   5
  NESTED LOOPS OUTER  4
   NESTED LOOPS OUTER  3
    NESTED LOOPS   2
     TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
      INDEX RANGE SCAN S_DOC_QUOTE_IDX1 3
     INLIST ITERATOR
      TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 1
       INDEX RANGE SCAN S_QUOTE_ITEM_IDX4 4
    TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
     INDEX UNIQUE SCAN S_ADDR_PER_P1 1
   TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
    INDEX UNIQUE SCAN CX_METER_INFO_P1 1
  TABLE ACCESS BY INDEX ROWID S_CONTACT 1
   INDEX UNIQUE SCAN S_CONTACT_P1 1

which begins to return rows almost instantly.

So far, I've tried replacing the dynamic date condition with bind variables, as well as using a subquery which selects a dynamic date from the dual table. Neither of these methods have helped improve performance so far.

Because I'm relatively new to PL/SQL, I'm unable to understand the reasons for such substantial differences in the execution plans.

I'm also trying to run the query as a pass-through from SAS, but for the purposes of testing the execution speed I've been using SQL*Plus.

EDIT:

For clarification, I've already tried using bind variables as follows:

var start_date varchar2(8);
exec :start_date := to_char(add_months(trunc(sysdate,'MM'), -1),'yyyymmdd')

With the following where clause:

and d.created between to_date(:start_date,'yyyymmdd') and sysdate

which returns an execution cost of 1179377.

I would also like to avoid bind variables if possible as I don't believe I can reference them from a SAS pass-through query (although I may be wrong).

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

假扮的天使 2024-10-24 14:42:11

我怀疑这里的问题与ADD_MONTHS函数的执行时间有很大关系。您已经表明，当您使用硬编码的最小日期时，执行计划存在显着差异。执行计划的巨大变化通常对运行时的影响比函数调用开销可能大得多，尽管潜在的不同执行计划可能意味着函数被调用更多次。无论哪种方式，要查看的根本问题是为什么您没有得到您想要的执行计划。

好的执行计划从 S_DOC_QUOTE_IDX1 上的范围扫描开始。鉴于查询更改的性质，我假设这是 CREATED 列上的索引。当过滤条件基于 SYSDATE 时，优化器通常不会选择在日期列上使用索引。由于直到执行时才对其进行评估，因此在确定执行计划后，解析器无法很好地估计日期过滤条件的选择性。当您使用硬编码的开始日期时，解析器可以使用该信息来确定选择性，并对索引的使用做出更好的选择。

我也建议绑定变量，但我认为因为你使用的是 8i，优化器无法查看绑定值，所以这让它像以前一样处于黑暗之中。在更高版本的 Oracle 中，我希望绑定解决方案会有效。

但是，这是一个很好的情况，其中使用文字替换可能比使用绑定变量更合适，因为 (a) 开始日期值不是用户指定的，并且 (b) 它将在整个月内保持不变，因此您不会解析大量略有不同的查询。

所以我的建议是编写一些代码来确定开始日期的静态值，并在解析 & 之前将其直接连接到查询字符串中。执行。

回复收藏 0 原文

成熟稳重的好男人 2024-10-24 14:42:11

首先，您获得不同执行时间的原因并不是因为 Oracle 频繁执行日期函数。即使对每一行都执行此 SQL 函数（顺便说一下，它可能不是），与实际从磁盘/内存检索行所需的时间相比，该 SQL 函数的执行只花费了可以忽略不计的时间。

您将获得完全不同的执行时间，因为正如您所注意到的，Oracle 选择了不同的访问路径。选择一种访问路径而不是另一种可能会导致执行时间出现几个数量级的差异。因此，真正的问题不是“为什么 add_months 需要时间？”但是：

为什么 Oracle 选择这种特定的低效路径，而有一条更有效的路径？

要回答这个问题，必须了解优化器的工作原理。优化器选择特定的访问路径通过估计多个访问路径的成本（如果只有几个表，则所有访问路径）并选择预计最有效的执行计划。确定执行计划成本的算法有规则，并且它根据从数据收集的统计数据进行估计。

与所有估计算法一样，它会对数据做出假设，例如基于列的最小/最大值的一般分布、基数以及段中值的物理分布（聚类因子）。

这如何应用于您的特定查询

在您的情况下，优化器必须对不同过滤器子句的选择性进行估计。在第一个查询中，过滤器位于两个变量（add_months(trunc(sysdate,'MM'), -1) 和 sysdate）之间，而在另一种情况下，过滤器位于常量和变量之间。

它们对您来说看起来是一样的，因为您已经用变量的值替换了变量，但对优化器来说，情况非常不同：优化器（至少在 8i 中）只为特定查询计算一次执行计划。一旦访问路径确定，所有进一步的执行都将得到相同的执行计划。因此，它不能用变量的值替换变量，因为该值将来可能会发生变化，并且访问计划必须适用于所有可能的值。

由于第二个查询使用变量，优化器无法精确确定第一个查询的选择性，因此优化器会进行猜测，这会导致您的情况出现错误的计划。

当优化器没有选择正确的计划时该怎么办

如上所述，优化器有时会做出错误的猜测，从而导致访问路径不理想。即使这种情况很少发生，这也可能是灾难性的（几个小时而不是几秒钟）。您可以尝试以下一些操作：

确保您的统计数据是最新的。 ALL_TABLES 和 ALL_INDEXES 上的 last_analyzed 列将告诉您上次收集这些对象的统计信息的时间。良好可靠的统计数据可以带来更准确的猜测，从而（希望）带来更好的执行计划。
了解收集统计信息的不同选项（dbms_stats 包）
重写查询以在有意义时使用常量，以便优化器做出更可靠的猜测。
有时两个逻辑上相同的查询将导致不同的执行计划，因为优化器不会计算相同的访问路径（所有可能的路径）。
您可以使用一些技巧来强制优化器在其他连接之前执行某些连接，例如：
- 使用 rownum 具体化子查询（它可能需要更多临时空间，但允许您强制优化器执行特定步骤）。
- 使用提示，不过大多数时候，当其他方法都失败时，我只会求助于提示。特别是，我有时会使用 LEADING 提示强制优化器从特定表（或几个表）开始。
最后，您可能会发现最近的版本具有通常更可靠的优化器。 8i 已经有 12 多年的历史了，可能是时候升级了:)

这确实是一个有趣的话题。 Oracle 优化器不断变化（在版本之间），它会随着时间的推移而改进，即使有时会在缺陷得到纠正时引入新的怪癖。如果您想了解更多信息，我建议 Jonathan Lewis 的 Cost Based Oracle ：基础知识

First of all, the reason you are getting different execution time is not because Oracle executes the date function a lot. The execution of this SQL function, even if it is done for each and every row (it probably is not by the way), only takes a negligible amount of time compared to the time it takes to actually retrieve the rows from disk/memory.

You are getting completely different execution times because, as you have noticed, Oracle chooses a different access path. Choosing one access path over another can lead to orders of magnitudes of difference in execution time. The real question therefore, is not "why does add_months takes time?" but:

Why does Oracle choose this particular unefficient path while there is a more efficient one?

To answer this question, one must understand how the optimizer works. The optimizer chooses a particular access path by estimating the cost of several access paths (all of them if there are only a few tables) and choosing the execution plan that is expected to be the most efficient. The algorithm to determine the cost of an execution plan has rules and it makes its estimation based on statistics gathered from your data.

As all estimation algorithms, it makes assumptions about your data, such as the general distribution based on min/max value of columns, cardinality, and the physical distribution of the values in the segment (clustering factor).

How this applies to your particular query

In your case, the optimizer has to make an estimation about the selectivity of the different filter clauses. In the first query the filter is between two variables (add_months(trunc(sysdate,'MM'), -1) and sysdate) while in the other case the filter is between a constant and a variable.

They look the same to you because you have substituted the variable by its value, but to the optimizer the cases are very different: the optimizer (at least in 8i) only computes an execution plan once for a particular query. Once the access path has been determined, all further execution will get the same execution plan. It can not, therefore, replace a variable by its value because the value may change in the future, and the access plan must work for all possible values.

Since the second query uses variables, the optimizer cannot determine precisely the selectivity of the first query, so the optimizer makes a guess, and that results in your case in a bad plan.

What can you do when the optimizer doesn't choose the correct plan

As mentionned above, the optimizer sometimes makes bad guesses, which result in suboptimal access path. Even if it happens rarely, this can be disastrous (hours instead of seconds). Here are some actions you could try:

Make sure your stats are up-to-date. The last_analyzed column on ALL_TABLES and ALL_INDEXES will tell you when was the last time the stats were collected on these objects. Good reliable stats lead to more accurate guesses, leading (hopefully) to better execution plan.
Learn about the different options to collect statistics (dbms_stats package)
Rewrite your query to make use of constants, when it makes sense, so that the optimizer will make more reliable guesses.
Sometimes two logically identical queries will result in different execution plans, because the optimizer will not compute the same access paths (of all possible paths).
There are some tricks you can use to force the optimizer to perform some join before others, for example:
- Use rownum to materialize a subquery (it may take more temporary space, but will allow you to force the optimizer through a specific step).
- Use hints, although most of the time I would only turn to hints when all else fails. In particular, I sometimes use the LEADING hint to force the optimizer to start with a specific table (or couple of table).
Last of all, you will probably find that the more recent releases have a generally more reliable optimizer. 8i is 12+ years old and it may be time for an upgrade :)

This is really an interesting topic. The oracle optimizer is ever-changing (between releases) it improves over time, even if new quirks are sometimes introduced as defects get corrected. If you want to learn more, I would suggest Jonathan Lewis' Cost Based Oracle: Fundamentals

回复收藏 0 原文

貪欢 2024-10-24 14:42:11

这是因为该函数会在每次比较时运行。

有时将其放入对偶选择中会更快：

and d.created 
    between (select add_months(trunc(sysdate,'MM'), -1) from dual) 
    and sysdate

否则，您也可以像这样加入日期：

select
    -- stuff --
from
    s_doc_quote d,
    s_quote_item i,
    s_contact c,
    s_addr_per a,
    cx_meter_info m,
    (select add_months(trunc(sysdate,'MM'), -1) as startdate from dual) sd
where
    d.row_id = i.sd_id
    and d.con_per_id = c.row_id
    and i.ship_per_addr_id = a.row_id(+)
    and i.x_meter_info_id = m.row_id(+)
    and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
    and i.prod_id in ('1-QH6','1-QH8')
    and d.created between sd.startdate and sysdate

最后一个选项，实际上是提高性能的最佳机会：向查询添加日期参数，如下所示：

and d.created between :startdate and sysdate

[编辑]
抱歉，我发现您已经尝试过类似的选项。还是很奇怪。如果常量值有效，只要将 add_months 函数保留在查询之外，绑定参数也应该有效。

That's because the function is run for every comparison.

sometimes it's faster to put it in a select from dual:

and d.created 
    between (select add_months(trunc(sysdate,'MM'), -1) from dual) 
    and sysdate

otherwise, you could also join the date like this:

select
    -- stuff --
from
    s_doc_quote d,
    s_quote_item i,
    s_contact c,
    s_addr_per a,
    cx_meter_info m,
    (select add_months(trunc(sysdate,'MM'), -1) as startdate from dual) sd
where
    d.row_id = i.sd_id
    and d.con_per_id = c.row_id
    and i.ship_per_addr_id = a.row_id(+)
    and i.x_meter_info_id = m.row_id(+)
    and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
    and i.prod_id in ('1-QH6','1-QH8')
    and d.created between sd.startdate and sysdate

Last option and actually the best chance of improved performance: Add a date parameter to the query like this:

and d.created between :startdate and sysdate

[edit]
I'm sorry, I see you already tried options like these. Still odd. If the constant value works, the bind parameter should work as well, as long as you keep the add_months function outside the query.

回复收藏 0 原文