如何在 PL/SQL 中内联变量?

发布于 2024-10-23 00:09:51 字数 1552 浏览 16 评论 0原文

情况

对于 Oracle 11.2.0.2.0 中的大量数据的中型查询,我的查询执行计划遇到了一些问题。为了加快速度,我引入了一个范围过滤器,它的作用大致如下:

PROCEDURE DO_STUFF(
    org_from VARCHAR2 := NULL,
    org_to   VARCHAR2 := NULL)

  -- [...]
  JOIN organisations org
    ON (cust.org_id = org.id
   AND ((org_from IS NULL) OR (org_from <= org.no))
   AND ((org_to   IS NULL) OR (org_to   >= org.no)))
  -- [...]

如您所见,我想使用可选范围来限制 organizationsJOIN组织编号。客户端代码可以在有(应该很快)或没有(非常慢)限制的情况下调用DO_STUFF

麻烦的

是,PL/SQL 将为上述 org_from 和 org_to 参数创建绑定变量,这正是我在大多数情况下所期望

  -- [...]
  JOIN organisations org
    ON (cust.org_id = org.id
   AND ((:B1 IS NULL) OR (:B1 <= org.no))
   AND ((:B2 IS NULL) OR (:B2 >= org.no)))
  -- [...]

的 在这种情况下,当我内联这些值时,我测量到查询执行计划要好得多,即当 Oracle 执行的查询实际上类似于

  -- [...]
  JOIN organisations org
    ON (cust.org_id = org.id
   AND ((10 IS NULL) OR (10 <= org.no))
   AND ((20 IS NULL) OR (20 >= org.no)))
  -- [...]

“很多”时,我的意思是快 5-10 倍。请注意,该查询很少执行,即每月一次。所以我不需要缓存执行计划。

我的问题

  • 如何在 PL/SQL 中内联值?我知道立即执行,但我更愿意PL/SQL 编译我的查询,而不进行字符串连接。

  • 我只是测量了巧合发生的事情还是我可以假设内联变量确实更好(在这种情况下)?我之所以问这个问题,是因为我认为绑定变量迫使 Oracle 设计一个通用执行计划,而内联值则允许分析非常具体的列和索引统计信息。所以我可以想象这不仅仅是一个巧合。

  • 我错过了什么吗?除了变量内联之外,也许还有一种完全不同的方法来实现查询执行计划的改进(请注意,我也尝试了很多提示,但我不是该领域的专家)?

The Situation

I have some trouble with my query execution plan for a medium-sized query over a large amount of data in Oracle 11.2.0.2.0. In order to speed things up, I introduced a range filter that does roughly something like this:

PROCEDURE DO_STUFF(
    org_from VARCHAR2 := NULL,
    org_to   VARCHAR2 := NULL)

  -- [...]
  JOIN organisations org
    ON (cust.org_id = org.id
   AND ((org_from IS NULL) OR (org_from <= org.no))
   AND ((org_to   IS NULL) OR (org_to   >= org.no)))
  -- [...]

As you can see, I want to restrict the JOIN of organisations using an optional range of organisation numbers. Client code can call DO_STUFF with (supposed to be fast) or without (very slow) the restriction.

The Trouble

The trouble is, PL/SQL will create bind variables for the above org_from and org_to parameters, which is what I would expect in most cases:

  -- [...]
  JOIN organisations org
    ON (cust.org_id = org.id
   AND ((:B1 IS NULL) OR (:B1 <= org.no))
   AND ((:B2 IS NULL) OR (:B2 >= org.no)))
  -- [...]

The Workaround

Only in this case, I measured the query execution plan to be a lot better when I just inline the values, i.e. when the query executed by Oracle is actually something like

  -- [...]
  JOIN organisations org
    ON (cust.org_id = org.id
   AND ((10 IS NULL) OR (10 <= org.no))
   AND ((20 IS NULL) OR (20 >= org.no)))
  -- [...]

By "a lot", I mean 5-10x faster. Note that the query is executed very rarely, i.e. once a month. So I don't need to cache the execution plan.

My questions

  • How can I inline values in PL/SQL? I know about EXECUTE IMMEDIATE, but I would prefer to have PL/SQL compile my query, and not do string concatenation.

  • Did I just measure something that happened by coincidence or can I assume that inlining variables is indeed better (in this case)? The reason why I ask is because I think that bind variables force Oracle to devise a general execution plan, whereas inlined values would allow for analysing very specific column and index statistics. So I can imagine that this is not just a coincidence.

  • Am I missing something? Maybe there is an entirely other way to achieve query execution plan improvement, other than variable inlining (note I have tried quite a few hints as well but I'm not an expert on that field)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

二手情话 2024-10-30 00:09:51

在您的一篇评论中您说:

“我还检查了各种绑定值。
通过绑定变量我得到一些完整的
表扫描,而使用硬编码
值,该计划看起来好多了。”

有两种路径。如果您传入 NULL 作为参数,那么您将选择所有记录。在这种情况下,全表扫描是检索数据的最有效方法。如果您传入值然后索引读取可能更有效,因为您只选择信息的一小部分,

当您使用绑定变量制定查询时,优化器必须做出决定:是否应该假设大多数情况下您会通过。那么,换个角度看:当您只需要选择记录的子集时进行全表扫描,或者当您需要选择记录的子集时进行索引读取,效率是否更高。需要选择所有记录吗?

似乎优化器已经将全表扫描视为覆盖所有可能性的最低效操作,

而当您对值进行硬编码时,优化器会立即知道10 IS NULL 评估结果为 FALSE,因此它可以权衡使用索引读取来查找所需子集记录的优点。


那么,该怎么办呢?正如您所说,此查询每月仅运行一次,我认为只需要对业务流程进行少量更改即可进行单独的查询:一个针对所有组织,一个针对组织的子集。


“顺便说一句,删除 :R1 IS NULL 子句
不改变执行计划
太多了,这让我剩下了另一个
OR 条件的一侧,:R1 <=
org.no 其中 NULL 没有意义
无论如何,因为 org.no 不为空”

好吧,问题是你有一对指定范围的绑定变量。根据值的分布,不同的范围可能适合不同的执行计划。也就是说,这个范围(可能)适合索引范围扫描......

WHERE org.id BETWEEN 10 AND 11

而这可能更适合全表扫描......

WHERE org.id BETWEEN 10 AND 1199999

这就是绑定变量查看发挥作用的地方

(取决于分布 。当然是价值观)。

In one of your comments you said:

"Also I checked various bind values.
With bind variables I get some FULL
TABLE SCANS, whereas with hard-coded
values, the plan looks a lot better."

There are two paths. If you pass in NULL for the parameters then you are selecting all records. Under those circumstances a Full Table Scan is the most efficient way of retrieving data. If you pass in values then indexed reads may be more efficient, because you're only selecting a small subset of the information.

When you formulate the query using bind variables the optimizer has to take a decision: should it presume that most of the time you'll pass in values or that you'll pass in nulls? Difficult. So look at it another way: is it more inefficient to do a full table scan when you only need to select a sub-set of records, or to do indexed reads when you need to select all records?

It seems as though the optimizer has plumped for full table scans as being the least inefficient operation to cover all eventualities.

Whereas when you hard code the values the Optimizer knows immediately that 10 IS NULL evaluates to FALSE, and so it can weigh the merits of using indexed reads for find the desired sub-set records.


So, what to do? As you say this query is only run once a month I think it would only require a small change to business processes to have separate queries: one for all organisations and one for a sub-set of organisations.


"Btw, removing the :R1 IS NULL clause
doesn't change the execution plan
much, which leaves me with the other
side of the OR condition, :R1 <=
org.no where NULL wouldn't make sense
anyway, as org.no is NOT NULL"

Okay, so the thing is you have a pair of bind variables which specify a range. Depending on the distribution of values, different ranges might suit different execution plans. That is, this range would (probably) suit an indexed range scan...

WHERE org.id BETWEEN 10 AND 11

...whereas this is likely to be more fitted to a full table scan...

WHERE org.id BETWEEN 10 AND 1199999

That is where Bind Variable Peeking comes into play.

(depending on distribution of values, of course).

囍笑 2024-10-30 00:09:51

由于查询计划实际上始终不同,这意味着优化器的基数估计由于某种原因而关闭。您能否从查询计划中确认优化器预期使用绑定变量时条件选择性不足?由于您使用的是 11.2,Oracle 应该使用 自适应光标共享,因此它不应该是绑定变量窥视问题(假设您在测试中使用不同的 NO 值多次调用带有绑定变量的版本。

是好计划的基数估计实际上是正确的吗?我知道您说过 NO 列上的统计信息是准确的,但我对您的常规统计收集过程可能不会更新的杂散直方图表示怀疑例如,

您始终可以在查询中使用提示来强制使用特定索引(尽管使用

但是,可以尝试进行一项额外的测试。是将 SQL 99 连接语法替换为 Oracle 的旧语法,即

SELECT <<something>>
  FROM <<some other table>> cust,
       organization org
 WHERE cust.org_id = org.id
   AND (    ((org_from IS NULL) OR (org_from <= org.no)) 
        AND ((org_to   IS NULL) OR (org_to   >= org.no)))

这显然不应该改变任何内容,但 SQL 99 语法存在解析器问题,因此需要检查。

Since the query plans are actually consistently different, that implies that the optimizer's cardinality estimates are off for some reason. Can you confirm from the query plans that the optimizer expects the conditions to be insufficiently selective when bind variables are used? Since you're using 11.2, Oracle should be using adaptive cursor sharing so it shouldn't be a bind variable peeking issue (assuming you are calling the version with bind variables many times with different NO values in your testing.

Are the cardinality estimates on the good plan actually correct? I know you said that the statistics on the NO column are accurate but I would be suspicious of a stray histogram that may not be updated by your regular statistics gathering process, for example.

You could always use a hint in the query to force a particular index to be used (though using a stored outline or optimizer plan stability would be preferable from a long-term maintenance perspective). Any of those options would be preferable to resorting to dynamic SQL.

One additional test to try, however, would be to replace the SQL 99 join syntax with Oracle's old syntax, i.e.

SELECT <<something>>
  FROM <<some other table>> cust,
       organization org
 WHERE cust.org_id = org.id
   AND (    ((org_from IS NULL) OR (org_from <= org.no)) 
        AND ((org_to   IS NULL) OR (org_to   >= org.no)))

That obviously shouldn't change anything, but there have been parser issues with the SQL 99 syntax so that's something to check.

暮光沉寂 2024-10-30 00:09:51

It smells like Bind Peeking, but I am only on Oracle 10, so I can't claim the same issue exists in 11.

把回忆走一遍 2024-10-30 00:09:51

这看起来很像需要自适应游标共享,并结合 SQLPlan 稳定性。
我认为发生的情况是 capture_sql_plan_baselines 参数为 trueuse_sql_plan_baselines 也是如此。如果这是真的,则会发生以下情况:

  1. 第一次启动查询时,它会被解析,并获得一个新计划。
  2. 第二次,该计划作为接受的计划存储在 sql_plan_baselines 中。
  3. 此查询的所有后续运行都使用此计划,无论绑定变量是什么。

如果自适应游标共享已经处于活动状态,优化器将生成一个新的/更好的计划,将其存储在 sql_plan_baselines 中,但无法使用它,直到有人接受这个较新的计划作为可接受的替代计划。检查dba_sql_plan_baselines并查看您的查询是否包含accepted = 'NO'且verified = null的条目
您可以使用 dbms_spm.evolve 来改进新计划,如果计划的性能至少比没有新计划的性能好 1.5 倍,则自动接受它。

我希望这有帮助。

This looks a lot like a need for Adaptive Cursor Sharing, combined with SQLPlan stability.
I think what is happening is that the capture_sql_plan_baselines parameter is true. And the same for use_sql_plan_baselines. If this is true, the following is happening:

  1. The first time that a query started it is parsed, it gets a new plan.
  2. The second time, this plan is stored in the sql_plan_baselines as an accepted plan.
  3. All following runs of this query use this plan, regardless of what the bind variables are.

If Adaptive Cursor Sharing is already active,the optimizer will generate a new/better plan, store it in the sql_plan_baselines but is not able to use it, until someone accepts this newer plan as an acceptable alternative plan. Check dba_sql_plan_baselines and see if your query has entries with accepted = 'NO' and verified = null
You can use dbms_spm.evolve to evolve the new plan and have it automatically accepted if the performance of the plan is at least 1,5 times better than without the new plan.

I hope this helps.

葬花如无物 2024-10-30 00:09:51

我将其添加为评论,但也会在这里提供。希望这不是过于简单化,看看详细的回复,我可能会误解确切的问题,但无论如何......

似乎您的组织表有列 no (org.no) 被定义为数字。在您的硬编码示例中,您使用数字进行比较。

JOIN organisations org
    ON (cust.org_id = org.id
   AND ((10 IS NULL) OR (10 <= org.no))
   AND ((20 IS NULL) OR (20 >= org.no)))

在您的过程中,您传入 varchar2

PROCEDURE DO_STUFF(
    org_from VARCHAR2 := NULL,
    org_to   VARCHAR2 := NULL)

因此要将 varchar2 与数字进行比较,Oracle 将必须进行转换,因此这可能会导致完整扫描。

解决方案:更改过程以传递数字

I added this as a comment, but will offer up here as well. Hope this isn't overly simplistic, and looking at the detailed responses I may be misunderstanding the exact problem, but anyway...

Seems your organisations table has column no (org.no) that is defined as a number. In your hardcoded example, you use numbers to do the compares.

JOIN organisations org
    ON (cust.org_id = org.id
   AND ((10 IS NULL) OR (10 <= org.no))
   AND ((20 IS NULL) OR (20 >= org.no)))

In your procedure, you are passing in varchar2:

PROCEDURE DO_STUFF(
    org_from VARCHAR2 := NULL,
    org_to   VARCHAR2 := NULL)

So to compare varchar2 to number, Oracle will have to do the conversions, so this may cause the full scans.

Solution: change proc to pass in numbers

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文