Oracle CBO 何时选择执行“合并连接笛卡尔”操作?手术?
有时,Oracle 似乎更喜欢使用 MERGE JOIN CARTESIAN 操作而不是常规的 MERGE JOIN 操作。了解数据并查看具体的执行计划,我可以发现此操作通常不是问题,因为连接的实体之一只能返回当前查询中的一条记录。
然而,由于历史原因,我们的 DBA 普遍厌恶笛卡尔积。
因此,我想更好地分析这些案例,并在我的论证中得到文档的支持。是否有关于查询转换和 CBO 的官方 Oracle 文档,我可以在其中了解 Oracle 更喜欢 MERGE JOIN CARTESIAN (或类似)操作的情况?
在本例中,我使用的是 Oracle 11g (11.2.0.2.0)
更新:
这些是类似的问题,但它们没有解释为什么或何时 Oracle 更喜欢 MJC
而不是常规的 MERGE JOIN
:
From time to time, Oracle seems to prefer a MERGE JOIN CARTESIAN
operation over a regular MERGE JOIN
. Knowing the data and looking at concrete execution plans, I can see that this operation is usually not a problem, as one of the joined entities can return only exactly one record in the query at hand.
However, for historic reasons, our DBAs have a general distaste for cartesian products.
So I'd like to better analyse those cases and be backed up with documentation in my argumentation. Is there any official Oracle documentation about query transformation and the CBO where I can understand the cases when Oracle prefers the MERGE JOIN CARTESIAN
(or similar) operation?
In this case, I'm Using Oracle 11g (11.2.0.2.0)
UPDATE:
These are similar questions, but they don't explain why or when Oracle prefers the MJC
over a regular MERGE JOIN
:
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
是的,一提到笛卡尔连接通常会让 DBA 心跳加速。由于缺少连接条件而导致的笛卡尔连接绝对是一个难以处理的痛苦 - 这些连接类型可能会“炸毁”临时空间并导致所有类型的警报响起。
我在 Oracle 的官方 11g 文档中没有找到有关此特定连接方法的任何内容,但我确实在其支持数据库中找到了大量有关该方法问题的文章。在过去的几周里,我已经追踪了其中的一些,这就是我发现的。
MJC 的来源是 CBO 优化。 MJC 是一种优化,当连接的结果集的基数较低时效果很好。当优化器未正确估计作为连接输入的一个或多个结果集的基数时,就会出现此问题。如果估计行数 = 1(或者是一个较小的数字),但结果集的实际行数很大,则优化器仍可能选择 MJC,从而导致次优计划。这是轻描淡写的说法。我遇到了这种情况的问题,查询运行了几天但没有完成。在让国会预算办公室重回正轨后,他们的运作只需几秒钟,而不是几小时或几天。
确定估计行与实际行是否相符的最佳方法是运行查询并查看其执行计划统计信息。您提到您使用的是 11g - 使用 SQL 监控功能。此功能的输出将显示执行计划的每个步骤花费了多少时间。它还会向您显示估计行数与实际行数。您正在寻找 MJC 输入的估计行数与实际行数之间的较大差异。
SQL 监控可通过 OEM/DB Control 进行,或者您也可以使用 API(搜索 DBMS_SQLTUNE.REPORT_SQL_MONITOR)。可以使用 GATHER_PLAN_STATISTICS 提示和查询来收集相同类型的信息,然后使用 DBMS_XPLAN 生成报告...详细信息为 在这里 来做到这一点。
那么如何摆脱它呢?尝试解决对象统计问题。一旦 CBO 知道它实际上正在处理数百、数千或数百万条记录作为连接的输入而不是“1”,它就应该选择更适合数据集的连接方法,而不是选择 MJC。说起来容易做起来难,已经写了关于这个主题的书籍,但至少检查一下基础知识 - 确保查询中涉及的所有表至少都有统计信息。如果您在 where 子句中应用了多列表达式,也可以利用补充统计信息。
如果您需要一把大锤子,有一些隐藏参数允许/禁止使用 MJC。它们可以在数据库级别、会话级别或查询级别(使用提示)实现。我将保留参数名称作为读者的练习,因为 Oracle 的官方立场是它们只能在支持人员的指导下使用。不要告诉他们,但在尝试获取对象统计信息进行合作失败后,我已经通过 OPT_PARAM 提示成功地消除了查询级别的 MJC。
Yes, the mention of cartesian joins usually makes a DBA's heart skip a beat. The cartesian joins caused by missing join conditions are definitely a pain to deal with - these are the types of joins that can "blow up" temp space and cause all types of alarms to go off.
I didn't find anything in Oracle's official 11g documentation on this particular join method but I did find plenty of articles about issues with it in their support DB. I have chased down a few of these in the past couple of weeks and here is what I found.
The source of the MJC is a CBO optimization. MJC is an optimization that works great when the cardinality of the result set being joined is low. The problem occurs when the Optimizer is not correctly estimating the cardinality of one or more of the result sets that are inputs to the join. If the estimated rows = 1 (or is a low number) but the actual rows for the result set is large then the optimizer may still choose a MJC resulting in a sub optimal plan. And that's an understatement. I have had issues with this happening and queries running for days and not finishing. After getting the CBO back on track they have run in seconds instead of hours or days.
The best way to find out if this Estimated Rows vs Actual Rows is the case is to run the query and view its execution plan statistics. You mentioned you are on 11g - use the SQL Monitoring feature. The output of this feature will show you how much time was spent on each step of your execution plan. It will also show you Estimated Rows vs Actual rows. You are looking for large discrepancies in Estimated Rows vs Actual Rows on the inputs for the MJC.
SQL Monitoring is available through OEM/DB Control, or you can use the API (search for DBMS_SQLTUNE.REPORT_SQL_MONITOR). The same sorts of info can be gathered using the GATHER_PLAN_STATISTICS hint with the query and then generating a report with DBMS_XPLAN... details are here to do that.
So how to get rid of it? Try to resolve object statistics issues. Once the CBO knows it is really dealing with hundreds, thousands, or millions of records as inputs to the join instead of '1' it should choose a join method more appropriate for the data set and not choose MJC. Easier said than done, books have been written on this topic but at least check out the basics - make sure all tables involved in the query at least have statistics. It may be possible to leverage Supplemental Statistics as well if you have multi-column expressions being applied in your where clause.
If you need a big hammer there are some hidden parameters that allow/disallow use of MJC. They can be implemented at a database level, at a session level, or query level (using hints). I'll leave the parameters names out as an exercise for the reader as Oracle's official stance is they should only be used under the direction of Support. Don't tell them but I have had some success eliminating MJC on the query level with an OPT_PARAM hint after attempts to get object statistics to cooperate failed.