Oracle 8i 日期函数速度慢
我正在尝试在 Oracle 8i 服务器(旧的,我知道)上运行以下 PL/SQL:
select
-- stuff --
from
s_doc_quote d,
s_quote_item i,
s_contact c,
s_addr_per a,
cx_meter_info m
where
d.row_id = i.sd_id
and d.con_per_id = c.row_id
and i.ship_per_addr_id = a.row_id(+)
and i.x_meter_info_id = m.row_id(+)
and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
and i.prod_id in ('1-QH6','1-QH8')
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
;
但是执行速度非常慢。由于服务器每天晚上午夜左右关闭,因此经常无法及时完成。
执行计划如下:
SELECT STATEMENT 1179377
NESTED LOOPS 1179377
NESTED LOOPS OUTER 959695
NESTED LOOPS OUTER 740014
NESTED LOOPS 520332
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 157132
INDEX RANGE SCAN S_QUOTE_ITEM_IDX8 8917
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX UNIQUE SCAN S_DOC_QUOTE_P1 1
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
如果我将以下 where 子句更改
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
为静态值,例如:
and d.created between to_date('20110101','yyyymmdd') and sysdate
执行计划将变为:
SELECT STATEMENT 5
NESTED LOOPS 5
NESTED LOOPS OUTER 4
NESTED LOOPS OUTER 3
NESTED LOOPS 2
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX RANGE SCAN S_DOC_QUOTE_IDX1 3
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 1
INDEX RANGE SCAN S_QUOTE_ITEM_IDX4 4
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
几乎立即开始返回行。
到目前为止,我已经尝试用绑定变量替换动态日期条件,以及使用从双表中选择动态日期的子查询。到目前为止,这些方法都没有帮助提高性能。
因为我对 PL/SQL 比较陌生,所以我无法理解执行计划中存在如此重大差异的原因。
我还尝试将查询作为 SAS 的直通运行,但为了测试执行速度,我一直在使用 SQL*Plus。
编辑:
为了澄清起见,我已经尝试使用绑定变量,如下所示:
var start_date varchar2(8);
exec :start_date := to_char(add_months(trunc(sysdate,'MM'), -1),'yyyymmdd')
使用以下 where 子句:
and d.created between to_date(:start_date,'yyyymmdd') and sysdate
返回执行成本 1179377。
如果可能,我还想避免使用绑定变量,因为我不相信我可以引用它们来自 SAS 传递查询(尽管我可能是错的)。
I'm trying to run the following PL/SQL on an Oracle 8i server (old, I know):
select
-- stuff --
from
s_doc_quote d,
s_quote_item i,
s_contact c,
s_addr_per a,
cx_meter_info m
where
d.row_id = i.sd_id
and d.con_per_id = c.row_id
and i.ship_per_addr_id = a.row_id(+)
and i.x_meter_info_id = m.row_id(+)
and d.x_move_type in ('Move In','Move Out','Move Out / Move In')
and i.prod_id in ('1-QH6','1-QH8')
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
;
Execution is incredibly slow however. Because the server is taken down around midnight each night, it often fails to complete in time.
The execution plan is as follows:
SELECT STATEMENT 1179377
NESTED LOOPS 1179377
NESTED LOOPS OUTER 959695
NESTED LOOPS OUTER 740014
NESTED LOOPS 520332
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 157132
INDEX RANGE SCAN S_QUOTE_ITEM_IDX8 8917
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX UNIQUE SCAN S_DOC_QUOTE_P1 1
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
If I change the following where clause however:
and d.created between add_months(trunc(sysdate,'MM'), -1) and sysdate
To a static value, such as:
and d.created between to_date('20110101','yyyymmdd') and sysdate
the execution plan becomes:
SELECT STATEMENT 5
NESTED LOOPS 5
NESTED LOOPS OUTER 4
NESTED LOOPS OUTER 3
NESTED LOOPS 2
TABLE ACCESS BY INDEX ROWID S_DOC_QUOTE 1
INDEX RANGE SCAN S_DOC_QUOTE_IDX1 3
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID S_QUOTE_ITEM 1
INDEX RANGE SCAN S_QUOTE_ITEM_IDX4 4
TABLE ACCESS BY INDEX ROWID S_ADDR_PER 1
INDEX UNIQUE SCAN S_ADDR_PER_P1 1
TABLE ACCESS BY INDEX ROWID CX_METER_INFO 1
INDEX UNIQUE SCAN CX_METER_INFO_P1 1
TABLE ACCESS BY INDEX ROWID S_CONTACT 1
INDEX UNIQUE SCAN S_CONTACT_P1 1
which begins to return rows almost instantly.
So far, I've tried replacing the dynamic date condition with bind variables, as well as using a subquery which selects a dynamic date from the dual table. Neither of these methods have helped improve performance so far.
Because I'm relatively new to PL/SQL, I'm unable to understand the reasons for such substantial differences in the execution plans.
I'm also trying to run the query as a pass-through from SAS, but for the purposes of testing the execution speed I've been using SQL*Plus.
EDIT:
For clarification, I've already tried using bind variables as follows:
var start_date varchar2(8);
exec :start_date := to_char(add_months(trunc(sysdate,'MM'), -1),'yyyymmdd')
With the following where clause:
and d.created between to_date(:start_date,'yyyymmdd') and sysdate
which returns an execution cost of 1179377.
I would also like to avoid bind variables if possible as I don't believe I can reference them from a SAS pass-through query (although I may be wrong).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
我怀疑这里的问题与ADD_MONTHS函数的执行时间有很大关系。您已经表明,当您使用硬编码的最小日期时,执行计划存在显着差异。执行计划的巨大变化通常对运行时的影响比函数调用开销可能大得多,尽管潜在的不同执行计划可能意味着函数被调用更多次。无论哪种方式,要查看的根本问题是为什么您没有得到您想要的执行计划。
好的执行计划从
S_DOC_QUOTE_IDX1
上的范围扫描开始。鉴于查询更改的性质,我假设这是 CREATED 列上的索引。当过滤条件基于 SYSDATE 时,优化器通常不会选择在日期列上使用索引。由于直到执行时才对其进行评估,因此在确定执行计划后,解析器无法很好地估计日期过滤条件的选择性。当您使用硬编码的开始日期时,解析器可以使用该信息来确定选择性,并对索引的使用做出更好的选择。我也建议绑定变量,但我认为因为你使用的是 8i,优化器无法查看绑定值,所以这让它像以前一样处于黑暗之中。在更高版本的 Oracle 中,我希望绑定解决方案会有效。
但是,这是一个很好的情况,其中使用文字替换可能比使用绑定变量更合适,因为 (a) 开始日期值不是用户指定的,并且 (b) 它将在整个月内保持不变,因此您不会解析大量略有不同的查询。
所以我的建议是编写一些代码来确定开始日期的静态值,并在解析 & 之前将其直接连接到查询字符串中。执行。
I doubt that the problem here has much to do with the execution time of the ADD_MONTHS function. You've already shown that there is a significant difference in the execution plan when you use a hardcoded minimum date. Big changes in execution plans generally have much more impact on run time than function call overhead is likely to, although potentially different execution plans can mean that the function is called many more times. Either way the root problem to look at is why you aren't getting the execution plan you want.
The good execution plan starts off with a range scan on
S_DOC_QUOTE_IDX1
. Given the nature of the change to the query, I assume this is an index on theCREATED
column. Often the optimizer will not choose to use an index on a date column when the filter condition is based onSYSDATE
. Because it is not evaluated until execution time, after the execution plan has been determined, the parser cannot make a good estimate of the selectivity of the date filter condition. When you use a hardcoded start date instead, the parser can use that information to determine selectivity, and makes a better choice about the use of the index.I would have suggested bind variables as well, but I think because you are on 8i the optimizer can't peek at bind values, so this leaves it just as much in the dark as before. On a later Oracle version I would expect that the bind solution would be effective.
However, this is a good case where using literal substitution is probably more appropriate than using a bind variable, since (a) the start date value is not user-specified, and (b) it will remain constant for the whole month, so you won't be parsing lots of slightly different queries.
So my suggestion is to write some code to determine a static value for the start date and concatenate it directly into the query string before parsing & execution.
首先,您获得不同执行时间的原因并不是因为 Oracle 频繁执行日期函数。即使对每一行都执行此 SQL 函数(顺便说一下,它可能不是),与实际从磁盘/内存检索行所需的时间相比,该 SQL 函数的执行只花费了可以忽略不计的时间。
您将获得完全不同的执行时间,因为正如您所注意到的,Oracle 选择了不同的访问路径。选择一种访问路径而不是另一种可能会导致执行时间出现几个数量级的差异。因此,真正的问题不是“为什么
add_months
需要时间?”但是:为什么 Oracle 选择这种特定的低效路径,而有一条更有效的路径?
要回答这个问题,必须了解优化器的工作原理。 优化器选择特定的访问路径通过估计多个访问路径的成本(如果只有几个表,则所有访问路径)并选择预计最有效的执行计划。确定执行计划成本的算法有规则,并且它根据从数据收集的统计数据进行估计。
与所有估计算法一样,它会对数据做出假设,例如基于列的最小/最大值的一般分布、基数以及段中值的物理分布(聚类因子)。
这如何应用于您的特定查询
在您的情况下,优化器必须对不同过滤器子句的选择性进行估计。在第一个查询中,过滤器位于两个变量(
add_months(trunc(sysdate,'MM'), -1) 和 sysdate
)之间,而在另一种情况下,过滤器位于常量和变量之间。它们对您来说看起来是一样的,因为您已经用变量的值替换了变量,但对优化器来说,情况非常不同:优化器(至少在 8i 中)只为特定查询计算一次执行计划。一旦访问路径确定,所有进一步的执行都将得到相同的执行计划。因此,它不能用变量的值替换变量,因为该值将来可能会发生变化,并且访问计划必须适用于所有可能的值。
由于第二个查询使用变量,优化器无法精确确定第一个查询的选择性,因此优化器会进行猜测,这会导致您的情况出现错误的计划。
当优化器没有选择正确的计划时该怎么办
如上所述,优化器有时会做出错误的猜测,从而导致访问路径不理想。即使这种情况很少发生,这也可能是灾难性的(几个小时而不是几秒钟)。您可以尝试以下一些操作:
ALL_TABLES
和ALL_INDEXES
上的last_analyzed
列将告诉您上次收集这些对象的统计信息的时间。良好可靠的统计数据可以带来更准确的猜测,从而(希望)带来更好的执行计划。dbms_stats
包)这确实是一个有趣的话题。 Oracle 优化器不断变化(在版本之间),它会随着时间的推移而改进,即使有时会在缺陷得到纠正时引入新的怪癖。如果您想了解更多信息,我建议 Jonathan Lewis 的 Cost Based Oracle :基础知识
First of all, the reason you are getting different execution time is not because Oracle executes the date function a lot. The execution of this SQL function, even if it is done for each and every row (it probably is not by the way), only takes a negligible amount of time compared to the time it takes to actually retrieve the rows from disk/memory.
You are getting completely different execution times because, as you have noticed, Oracle chooses a different access path. Choosing one access path over another can lead to orders of magnitudes of difference in execution time. The real question therefore, is not "why does
add_months
takes time?" but:Why does Oracle choose this particular unefficient path while there is a more efficient one?
To answer this question, one must understand how the optimizer works. The optimizer chooses a particular access path by estimating the cost of several access paths (all of them if there are only a few tables) and choosing the execution plan that is expected to be the most efficient. The algorithm to determine the cost of an execution plan has rules and it makes its estimation based on statistics gathered from your data.
As all estimation algorithms, it makes assumptions about your data, such as the general distribution based on min/max value of columns, cardinality, and the physical distribution of the values in the segment (clustering factor).
How this applies to your particular query
In your case, the optimizer has to make an estimation about the selectivity of the different filter clauses. In the first query the filter is between two variables (
add_months(trunc(sysdate,'MM'), -1) and sysdate
) while in the other case the filter is between a constant and a variable.They look the same to you because you have substituted the variable by its value, but to the optimizer the cases are very different: the optimizer (at least in 8i) only computes an execution plan once for a particular query. Once the access path has been determined, all further execution will get the same execution plan. It can not, therefore, replace a variable by its value because the value may change in the future, and the access plan must work for all possible values.
Since the second query uses variables, the optimizer cannot determine precisely the selectivity of the first query, so the optimizer makes a guess, and that results in your case in a bad plan.
What can you do when the optimizer doesn't choose the correct plan
As mentionned above, the optimizer sometimes makes bad guesses, which result in suboptimal access path. Even if it happens rarely, this can be disastrous (hours instead of seconds). Here are some actions you could try:
last_analyzed
column onALL_TABLES
andALL_INDEXES
will tell you when was the last time the stats were collected on these objects. Good reliable stats lead to more accurate guesses, leading (hopefully) to better execution plan.dbms_stats
package)This is really an interesting topic. The oracle optimizer is ever-changing (between releases) it improves over time, even if new quirks are sometimes introduced as defects get corrected. If you want to learn more, I would suggest Jonathan Lewis' Cost Based Oracle: Fundamentals
这是因为该函数会在每次比较时运行。
有时将其放入对偶选择中会更快:
否则,您也可以像这样加入日期:
最后一个选项,实际上是提高性能的最佳机会:向查询添加日期参数,如下所示:
[编辑]
抱歉,我发现您已经尝试过类似的选项。还是很奇怪。如果常量值有效,只要将 add_months 函数保留在查询之外,绑定参数也应该有效。
That's because the function is run for every comparison.
sometimes it's faster to put it in a select from dual:
otherwise, you could also join the date like this:
Last option and actually the best chance of improved performance: Add a date parameter to the query like this:
[edit]
I'm sorry, I see you already tried options like these. Still odd. If the constant value works, the bind parameter should work as well, as long as you keep the add_months function outside the query.
这就是 SQL。您可能想要使用 PL/SQL 并将计算 add_months(trunc(sysdate,'MM'), -1) 首先保存到变量中,然后绑定它。
另外,我发现 SAS 计算需要很长时间,因为通过网络提取数据并对其处理的每一行执行额外的工作。根据您的环境,您可以考虑首先创建一个临时表来存储这些联接的结果,然后再访问临时表(尝试 CTAS)。
This is SQL. You may want to use PL/SQL and save the calculation add_months(trunc(sysdate,'MM'), -1) into a variable first ,then bind that.
Also, I've seen SAS calcs take a long while due to pulling data across the network and doing additional work on each row it processes. Depending on your environment, you may consider creating a temp table to store the results of these joins first, then hitting the temp table (try a CTAS).