了解 Oracle SQL Developer 中执行解释计划的结果

发布于 2024-07-21 14:11:27 字数 158 浏览 11 评论 0原文

我正在尝试优化查询，但不太理解从解释计划返回的一些信息。谁能告诉我 OPTIONS 和 COST 列的重要性？在“选项”列中，我只看到“完整”一词。在 COST 列中，我可以推断出较低的成本意味着更快的查询。但成本值到底代表什么以及可接受的阈值是多少？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

离去的眼神 2024-07-28 14:11:27

EXPLAIN PLAN 的输出是 Oracle 查询优化器的调试输出。 COST 是基于成本的优化器 (CBO) 的最终输出，其目的是选择应使用许多不同的可能计划中的哪一个来运行查询。 CBO 计算每个计划的相对成本，然后选择成本最低的计划。

（注意：在某些情况下，CBO 没有足够的时间来评估每个可能的计划；在这些情况下，它只会选择迄今为止发现的成本最低的计划）

一般来说，导致查询缓慢的最大因素之一是数量为查询服务而读取的行数（更准确地说是块），因此成本将部分基于优化器估计需要读取的行数。

例如，假设您有以下查询：（

SELECT emp_id FROM employees WHERE months_of_service = 6;

months_of_service 列对其具有 NOT NULL 约束，并对其具有普通索引。）

优化器可以在此处选择两个基本计划：

计划 1 ：读取“employees”表中的所有行，检查每行的谓词是否为 true (months_of_service=6)。
方案2：读取索引where months_of_service=6（这会产生一组ROWID），然后根据返回的ROWID访问表。

假设“employees”表有 1,000,000（100 万）行。让我们进一步想象，months_of_service 的值范围从 1 到 12，并且由于某种原因分布相当均匀。

计划 1 涉及 FULL SCAN，其成本将是读取员工表中所有行的成本，大约等于 1,000,000；但由于 Oracle 通常能够使用多块读取来读取块，因此实际成本会更低（取决于数据库的设置方式） - 例如，让我们假设多块读取计数为 10 - 计算出的成本完整扫描将为 1,000,000 / 10；总成本 = 100,000。

计划 2 涉及 INDEX RANGE SCAN 和按 ROWID 进行表查找，其成本将是扫描索引的成本加上按 ROWID 访问表的成本。我不会讨论索引范围扫描的成本是如何计算的，但让我们假设索引范围扫描的成本是每行 1；我们期望在 12 个案例中找到 1 个匹配项，因此索引扫描的成本为 1,000,000 / 12 = 83,333；加上访问表的成本（假设每次访问读取 1 个块，我们这里不能使用多块读取）= 83,333；总成本 = 166,666。

正如您所看到的，计划 1（完整扫描）的成本低于计划 2（索引扫描 + rowid 访问）的成本 - 这意味着 CBO 将选择完整扫描。

如果优化器在这里所做的假设是正确的，那么实际上计划 1 会比计划 2 更可取并且更有效 - 这反驳了完整扫描“总是不好”的神话。

如果优化器的目标是 FIRST_ROWS(n) 而不是 ALL_ROWS，结果将会大不相同 - 在这种情况下，优化器会倾向于计划 2，因为它通常会更快地返回前几行，但代价是整个查询的效率较低。

The output of EXPLAIN PLAN is a debug output from Oracle's query optimiser. The COST is the final output of the Cost-based optimiser (CBO), the purpose of which is to select which of the many different possible plans should be used to run the query. The CBO calculates a relative Cost for each plan, then picks the plan with the lowest cost.

(Note: in some cases the CBO does not have enough time to evaluate every possible plan; in these cases it just picks the plan with the lowest cost found so far)

In general, one of the biggest contributors to a slow query is the number of rows read to service the query (blocks, to be more precise), so the cost will be based in part on the number of rows the optimiser estimates will need to be read.

For example, lets say you have the following query:

SELECT emp_id FROM employees WHERE months_of_service = 6;

(The months_of_service column has a NOT NULL constraint on it and an ordinary index on it.)

There are two basic plans the optimiser might choose here:

Plan 1: Read all the rows from the "employees" table, for each, check if the predicate is true (months_of_service=6).
Plan 2: Read the index where months_of_service=6 (this results in a set of ROWIDs), then access the table based on the ROWIDs returned.

Let's imagine the "employees" table has 1,000,000 (1 million) rows. Let's further imagine that the values for months_of_service range from 1 to 12 and are fairly evenly distributed for some reason.

The cost of Plan 1, which involves a FULL SCAN, will be the cost of reading all the rows in the employees table, which is approximately equal to 1,000,000; but since Oracle will often be able to read the blocks using multi-block reads, the actual cost will be lower (depending on how your database is set up) - e.g. let's imagine the multi-block read count is 10 - the calculated cost of the full scan will be 1,000,000 / 10; Overal cost = 100,000.

The cost of Plan 2, which involves an INDEX RANGE SCAN and a table lookup by ROWID, will be the cost of scanning the index, plus the cost of accessing the table by ROWID. I won't go into how index range scans are costed but let's imagine the cost of the index range scan is 1 per row; we expect to find a match in 1 out of 12 cases, so the cost of the index scan is 1,000,000 / 12 = 83,333; plus the cost of accessing the table (assume 1 block read per access, we can't use multi-block reads here) = 83,333; Overall cost = 166,666.

As you can see, the cost of Plan 1 (full scan) is LESS than the cost of Plan 2 (index scan + access by rowid) - which means the CBO would choose the FULL scan.

If the assumptions made here by the optimiser are true, then in fact Plan 1 will be preferable and much more efficient than Plan 2 - which disproves the myth that FULL scans are "always bad".

The results would be quite different if the optimiser goal was FIRST_ROWS(n) instead of ALL_ROWS - in which case the optimiser would favour Plan 2 because it will often return the first few rows quicker, at the cost of being less efficient for the entire query.

回复收藏 0 原文

红玫瑰 2024-07-28 14:11:27

CBO 构建决策树，估计每个查询可用的每个可能执行路径的成本。成本由实例上设置的 CPU_cost 或 I/O_cost 参数设置。 CBO 会尽可能利用查询将使用的表和索引的现有统计信息来估计成本。您不应仅根据成本来调整查询。成本使您能够了解优化器为何执行其操作。无需成本，您就可以弄清楚为什么优化器选择了它所执行的计划。较低的成本并不意味着更快的查询。在某些情况下这是正确的，在某些情况下这是错误的。成本基于您的表格统计数据，如果它们错误，则成本将是错误的。

调整查询时，您应该查看每个步骤的基数和行数。它们有道理吗？优化器假设的基数是否正确？返回的行是否合理。如果提供的信息是错误的，那么优化器很可能没有做出正确决策所需的正确信息。这可能是由于表和索引以及 cpu 统计信息的统计信息过时或丢失。在调整查询时最好更新统计信息，以充分利用优化器。了解您的架构在调整时也有很大帮助。了解优化器何时选择了一个非常糟糕的决策，并通过一个小提示将其指向正确的路径可以节省大量时间。

回复收藏 0 原文

枯寂 2024-07-28 14:11:27

在最新的 Oracle 版本中，COST 表示优化器期望查询花费的时间，以单个块读取所需的时间量为单位表示。

因此，如果单个块读取需要 2 毫秒并且成本表示为“250”，则查询预计需要 500 毫秒才能完成。

优化器根据估计的单块和多块读取数量以及计划的 CPU 消耗来计算成本。后者通过在其他操作之前执行某些操作来尝试避免高 CPU 成本操作，从而在最小化成本方面非常有用。

这就提出了一个问题：优化器如何知道操作需要多长时间。最近的 Oracle 版本允许收集“系统统计信息”，这绝对不要与表或索引的统计信息混淆。系统统计数据是对硬件性能的测量，最重要的是：

单块读取需要多长时间
多块读取需要
多长时间多块读取有多大（由于表范围小于最大范围，通常与可能的最大值不同）以及其他原因）。
CPU性能

这些数字可能会根据系统的运行环境而有很大差异，并且可以为“白天OLTP”操作和“夜间批量报告”操作以及“月末报告”（如果您愿意）存储不同组的统计数据。

给定这些统计数据集，可以评估给定查询执行计划在不同操作环境中的成本，这可能会促进有时使用全表扫描或有时使用索引扫描。

成本并不完美，但优化器在每次发布时都能更好地进行自我监控，并且可以反馈实际成本与估计成本的比较，以便为未来做出更好的决策。这也使得预测变得更加困难。

请注意，成本不一定是挂钟时间，因为并行查询操作会消耗多个线程的总时间。

在旧版本的 Oracle 中，忽略了 CPU 操作的成本，并且根据 init 参数有效地固定了单块和多块读取的相对成本。

In recent Oracle versions the COST represent the amount of time that the optimiser expects the query to take, expressed in units of the amount of time required for a single block read.

So if a single block read takes 2ms and the cost is expressed as "250", the query could be expected to take 500ms to complete.

The optimiser calculates the cost based on the estimated number of single block and multiblock reads, and the CPU consumption of the plan. the latter can be very useful in minimising the cost by performing certain operations before others to try and avoid high CPU cost operations.

This raises the question of how the optimiser knows how long operations take. recent Oracle versions allow the collections of "system statistics", which are definitely not to be confused with statistics on tables or indexes. The system statistics are measurements of the performance of the hardware, mostly importantly:

How long a single block read takes
How long a multiblock read takes
How large a multiblock read is (often different to the maximum possible due to table extents being smaller than the maximum, and other reasons).
CPU performance

These numbers can vary greatly according to the operating environment of the system, and different sets of statistics can be stored for "daytime OLTP" operations and "nighttime batch reporting" operations, and for "end of month reporting" if you wish.

Given these sets of statistics, a given query execution plan can be evaluated for cost in different operating environments, which might promote use of full table scans at some times or index scans at others.

The cost is not perfect, but the optimiser gets better at self-monitoring with every release, and can feedback the actual cost in comparison to the estimated cost in order to make better decisions for the future. this also makes it rather more difficult to predict.

Note that the cost is not necessarily wall clock time, as parallel query operations consume a total amount of time across multiple threads.

In older versions of Oracle the cost of CPU operations was ignored, and the relative costs of single and multiblock reads were effectively fixed according to init parameters.

回复收藏 0 原文