How to increase Oracle CBO cost estimates for hash joins, GROUP BYs and ORDER BYs without hints

Published 2024-07-26 09:57:28

It seems that on some of the servers that we have, the cost of hash joins, group by's and order by's is too low compared to the actual cost. I.e. often execution plans with index range scans outperform the former, but on explain plan the cost shows up as higher.

Some further notes:

  1. I already set optimizer_index_cost_adj to 20 and it's still not good enough. I do NOT want to increase the cost for pure full table scans, in fact I wouldn't mind the optimizer decreasing the cost.
  2. I've noticed that pga_aggregate_target makes an impact on CBO cost estimates, but I definitely do NOT want to lower this parameter as we have plenty of RAM.
  3. As opposed to using optimizer hints in individual queries, I want the settings to be global.
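
For reference, the parameters discussed in the notes above can be set instance-wide rather than hinted per query; a minimal sketch, where the values are illustrative, not recommendations:

```sql
-- Set index costing parameters globally (SCOPE = BOTH changes both the
-- spfile and the running instance). The values below are illustrative.
ALTER SYSTEM SET optimizer_index_cost_adj = 20 SCOPE = BOTH;

-- optimizer_index_caching (0-100) tells the CBO what percentage of index
-- blocks to assume are already cached, which lowers the cost of
-- index-driven nested-loop plans without touching full-scan costs.
ALTER SYSTEM SET optimizer_index_caching = 50 SCOPE = BOTH;
```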

Edit 1: I'm thinking about experimenting with dynamic sampling, but I don't have enough intimate knowledge to predict how this could affect the overall performance, i.e. how frequently the execution plans could change. I would definitely prefer something which is very stable; in fact, for some of our largest clients we have a policy of locking all the stats (which will change with Oracle 11g SQL Plan Management).
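
If dynamic sampling is worth a trial, it can be tested at session level first, and the stats-locking policy mentioned above is applied through `DBMS_STATS`; a sketch, with `APP_OWNER` as a placeholder schema name:

```sql
-- Trial dynamic sampling per session before changing it instance-wide
-- (levels 0-10; higher levels sample more blocks at hard-parse time).
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Locking all statistics for a schema, as in the policy described above.
-- APP_OWNER is a placeholder schema name.
BEGIN
  DBMS_STATS.LOCK_SCHEMA_STATS(ownname => 'APP_OWNER');
END;
/
```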

Answers (1)

风透绣罗衣 2024-08-02 09:57:28

Quite often, when execution plans with index range scans outperform those with full scans + sorts or hash joins but the CBO still picks the full scans, it's because the optimiser believes it will find more matching results than it actually gets in real life.

In other words, if the optimiser thinks it's going to get 1M rows from table A and 1000 rows from table B, it may very well choose full scans + sort merge or hash join; if, however, when it actually runs the query, it only gets 1 row from table A, an index range scan may very well be better.

I'd first look at some poorly performing queries and analyse the selectivity of the predicates, to determine whether the optimiser is making reasonable estimates of the number of rows for each table.
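
One way to check those estimates is to run a suspect query with the `GATHER_PLAN_STATISTICS` hint and compare estimated against actual row counts; the table and column names here are placeholders:

```sql
-- Run the suspect query with runtime row-source statistics enabled.
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
  FROM table_a a
  JOIN table_b b ON b.a_id = a.id
 WHERE a.status = 'OPEN';

-- Then compare E-Rows (the optimizer's estimate) with A-Rows (actual)
-- for each plan step; a large gap pinpoints the bad estimate.
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```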

EDIT:
You've mentioned that the cardinality estimates are incorrect. This is the root cause of your problems; the costing of hash joins and sorts is probably quite OK. In some cases the optimiser may be using wrong estimates because it doesn't know how correlated the data is. Histograms on some columns may help (if you haven't already got them), and in some cases you can create function-based indexes and gather statistics on the hidden columns to provide even better data to the optimiser.
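
A sketch of both suggestions, assuming a skewed `STATUS` column and a filter on `UPPER(CUSTOMER_NAME)`; all object names are placeholders:

```sql
-- Histogram on a skewed column so the CBO sees the value distribution
-- rather than assuming uniformity.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'ORDERS',
    method_opt => 'FOR COLUMNS STATUS SIZE 254');
END;
/

-- Function-based index: Oracle creates a hidden virtual column for the
-- expression, and gathering table stats afterwards gives the optimiser
-- selectivity data on UPPER(CUSTOMER_NAME) itself.
CREATE INDEX orders_cust_upper_ix ON orders (UPPER(customer_name));
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ORDERS');
END;
/
```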

At the end of the day, your trick of specifying the cardinalities of various tables in the queries may very well be required to get satisfactory performance.
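
That per-query trick typically looks like the following; the `CARDINALITY` hint is undocumented, and the alias and row count are illustrative:

```sql
-- Tell the CBO to assume 10 rows from alias A regardless of its own
-- estimate. Undocumented hint, so treat it as a last resort.
SELECT /*+ CARDINALITY(a 10) */ a.id, b.val
  FROM table_a a
  JOIN table_b b ON b.a_id = a.id
 WHERE a.status = 'RARE';
```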
