了解 Oracle SQL Developer 中执行解释计划的结果
我正在尝试优化查询,但不太理解从解释计划返回的一些信息。 谁能告诉我 OPTIONS 和 COST 列的重要性? 在“选项”列中,我只看到“完整”一词。 在 COST 列中,我可以推断出较低的成本意味着更快的查询。 但成本值到底代表什么以及可接受的阈值是多少?
I'm trying to optimize a query but don't quite understand some of the information returned from Explain Plan. Can anyone tell me the significance of the OPTIONS and COST columns? In the OPTIONS column, I only see the word FULL. In the COST column, I can deduce that a lower cost means a faster query. But what exactly does the cost value represent and what is an acceptable threshold?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
EXPLAIN PLAN 的输出是 Oracle 查询优化器的调试输出。 COST 是基于成本的优化器 (CBO) 的最终输出,其目的是选择应使用许多不同的可能计划中的哪一个来运行查询。 CBO 计算每个计划的相对成本,然后选择成本最低的计划。
(注意:在某些情况下,CBO 没有足够的时间来评估每个可能的计划;在这些情况下,它只会选择迄今为止发现的成本最低的计划)
一般来说,导致查询缓慢的最大因素之一是数量为查询服务而读取的行数(更准确地说是块),因此成本将部分基于优化器估计需要读取的行数。
例如,假设您有以下查询:(
months_of_service
列对其具有 NOT NULL 约束,并对其具有普通索引。)优化器可以在此处选择两个基本计划:
months_of_service=6
)。months_of_service=6
(这会产生一组ROWID),然后根据返回的ROWID访问表。假设“employees”表有 1,000,000(100 万)行。 让我们进一步想象,months_of_service 的值范围从 1 到 12,并且由于某种原因分布相当均匀。
计划 1 涉及 FULL SCAN,其成本将是读取员工表中所有行的成本,大约等于 1,000,000; 但由于 Oracle 通常能够使用多块读取来读取块,因此实际成本会更低(取决于数据库的设置方式) - 例如,让我们假设多块读取计数为 10 - 计算出的成本完整扫描将为 1,000,000 / 10; 总成本 = 100,000。
计划 2 涉及 INDEX RANGE SCAN 和按 ROWID 进行表查找,其成本将是扫描索引的成本加上按 ROWID 访问表的成本。 我不会讨论索引范围扫描的成本是如何计算的,但让我们假设索引范围扫描的成本是每行 1; 我们期望在 12 个案例中找到 1 个匹配项,因此索引扫描的成本为 1,000,000 / 12 = 83,333; 加上访问表的成本(假设每次访问读取 1 个块,我们这里不能使用多块读取)= 83,333; 总成本 = 166,666。
正如您所看到的,计划 1(完整扫描)的成本低于计划 2(索引扫描 + rowid 访问)的成本 - 这意味着 CBO 将选择完整扫描。
如果优化器在这里所做的假设是正确的,那么实际上计划 1 会比计划 2 更可取并且更有效 - 这反驳了完整扫描“总是不好”的神话。
如果优化器的目标是 FIRST_ROWS(n) 而不是 ALL_ROWS,结果将会大不相同 - 在这种情况下,优化器会倾向于计划 2,因为它通常会更快地返回前几行,但代价是整个查询的效率较低。
The output of EXPLAIN PLAN is a debug output from Oracle's query optimiser. The COST is the final output of the Cost-based optimiser (CBO), the purpose of which is to select which of the many different possible plans should be used to run the query. The CBO calculates a relative Cost for each plan, then picks the plan with the lowest cost.
(Note: in some cases the CBO does not have enough time to evaluate every possible plan; in these cases it just picks the plan with the lowest cost found so far)
In general, one of the biggest contributors to a slow query is the number of rows read to service the query (blocks, to be more precise), so the cost will be based in part on the number of rows the optimiser estimates will need to be read.
For example, lets say you have the following query:
(The
months_of_service
column has a NOT NULL constraint on it and an ordinary index on it.)There are two basic plans the optimiser might choose here:
months_of_service=6
).months_of_service=6
(this results in a set of ROWIDs), then access the table based on the ROWIDs returned.Let's imagine the "employees" table has 1,000,000 (1 million) rows. Let's further imagine that the values for months_of_service range from 1 to 12 and are fairly evenly distributed for some reason.
The cost of Plan 1, which involves a FULL SCAN, will be the cost of reading all the rows in the employees table, which is approximately equal to 1,000,000; but since Oracle will often be able to read the blocks using multi-block reads, the actual cost will be lower (depending on how your database is set up) - e.g. let's imagine the multi-block read count is 10 - the calculated cost of the full scan will be 1,000,000 / 10; Overal cost = 100,000.
The cost of Plan 2, which involves an INDEX RANGE SCAN and a table lookup by ROWID, will be the cost of scanning the index, plus the cost of accessing the table by ROWID. I won't go into how index range scans are costed but let's imagine the cost of the index range scan is 1 per row; we expect to find a match in 1 out of 12 cases, so the cost of the index scan is 1,000,000 / 12 = 83,333; plus the cost of accessing the table (assume 1 block read per access, we can't use multi-block reads here) = 83,333; Overall cost = 166,666.
As you can see, the cost of Plan 1 (full scan) is LESS than the cost of Plan 2 (index scan + access by rowid) - which means the CBO would choose the FULL scan.
If the assumptions made here by the optimiser are true, then in fact Plan 1 will be preferable and much more efficient than Plan 2 - which disproves the myth that FULL scans are "always bad".
The results would be quite different if the optimiser goal was FIRST_ROWS(n) instead of ALL_ROWS - in which case the optimiser would favour Plan 2 because it will often return the first few rows quicker, at the cost of being less efficient for the entire query.
CBO 构建决策树,估计每个查询可用的每个可能执行路径的成本。 成本由实例上设置的 CPU_cost 或 I/O_cost 参数设置。 CBO 会尽可能利用查询将使用的表和索引的现有统计信息来估计成本。 您不应仅根据成本来调整查询。 成本使您能够了解优化器为何执行其操作。 无需成本,您就可以弄清楚为什么优化器选择了它所执行的计划。 较低的成本并不意味着更快的查询。 在某些情况下这是正确的,在某些情况下这是错误的。 成本基于您的表格统计数据,如果它们错误,则成本将是错误的。
调整查询时,您应该查看每个步骤的基数和行数。 它们有道理吗? 优化器假设的基数是否正确? 返回的行是否合理。 如果提供的信息是错误的,那么优化器很可能没有做出正确决策所需的正确信息。 这可能是由于表和索引以及 cpu 统计信息的统计信息过时或丢失。 在调整查询时最好更新统计信息,以充分利用优化器。 了解您的架构在调整时也有很大帮助。 了解优化器何时选择了一个非常糟糕的决策,并通过一个小提示将其指向正确的路径可以节省大量时间。
The CBO builds a decision tree, estimating the costs of each possible execution path available per query. The costs are set by the CPU_cost or I/O_cost parameter set on the instance. And the CBO estimates the costs, as best it can with the existing statistics of the tables and indexes that the query will use. You should not tune your query based on cost alone. Cost allows you to understand WHY the optimizer is doing what it does. Without cost you could figure out why the optimizer chose the plan it did. Lower cost does not mean a faster query. There are cases where this is true and there will be cases where this is wrong. Cost is based on your table stats and if they are wrong the cost is going to be wrong.
When tuning your query, you should take a look at the cardinality and the number of rows of each step. Do they make sense? Is the cardinality the optimizer is assuming correct? Is the rows being return reasonable. If the information present is wrong then its very likely the optimizer doesn't have the proper information it needs to make the right decision. This could be due to stale or missing statistics on the table and index as well as cpu-stats. Its best to have stats updated when tuning a query to get the most out of the optimizer. Knowing your schema is also of great help when tuning. Knowing when the optimizer chose a really bad decision and pointing it in the correct path with a small hint can save a load of time.
在最新的 Oracle 版本中,COST 表示优化器期望查询花费的时间,以单个块读取所需的时间量为单位表示。
因此,如果单个块读取需要 2 毫秒并且成本表示为“250”,则查询预计需要 500 毫秒才能完成。
优化器根据估计的单块和多块读取数量以及计划的 CPU 消耗来计算成本。 后者通过在其他操作之前执行某些操作来尝试避免高 CPU 成本操作,从而在最小化成本方面非常有用。
这就提出了一个问题:优化器如何知道操作需要多长时间。 最近的 Oracle 版本允许收集“系统统计信息”,这绝对不要与表或索引的统计信息混淆。 系统统计数据是对硬件性能的测量,最重要的是:
这些数字可能会根据系统的运行环境而有很大差异,并且可以为“白天OLTP”操作和“夜间批量报告”操作以及“月末报告”(如果您愿意)存储不同组的统计数据。
给定这些统计数据集,可以评估给定查询执行计划在不同操作环境中的成本,这可能会促进有时使用全表扫描或有时使用索引扫描。
成本并不完美,但优化器在每次发布时都能更好地进行自我监控,并且可以反馈实际成本与估计成本的比较,以便为未来做出更好的决策。 这也使得预测变得更加困难。
请注意,成本不一定是挂钟时间,因为并行查询操作会消耗多个线程的总时间。
在旧版本的 Oracle 中,忽略了 CPU 操作的成本,并且根据 init 参数有效地固定了单块和多块读取的相对成本。
In recent Oracle versions the COST represent the amount of time that the optimiser expects the query to take, expressed in units of the amount of time required for a single block read.
So if a single block read takes 2ms and the cost is expressed as "250", the query could be expected to take 500ms to complete.
The optimiser calculates the cost based on the estimated number of single block and multiblock reads, and the CPU consumption of the plan. the latter can be very useful in minimising the cost by performing certain operations before others to try and avoid high CPU cost operations.
This raises the question of how the optimiser knows how long operations take. recent Oracle versions allow the collections of "system statistics", which are definitely not to be confused with statistics on tables or indexes. The system statistics are measurements of the performance of the hardware, mostly importantly:
These numbers can vary greatly according to the operating environment of the system, and different sets of statistics can be stored for "daytime OLTP" operations and "nighttime batch reporting" operations, and for "end of month reporting" if you wish.
Given these sets of statistics, a given query execution plan can be evaluated for cost in different operating environments, which might promote use of full table scans at some times or index scans at others.
The cost is not perfect, but the optimiser gets better at self-monitoring with every release, and can feedback the actual cost in comparison to the estimated cost in order to make better decisions for the future. this also makes it rather more difficult to predict.
Note that the cost is not necessarily wall clock time, as parallel query operations consume a total amount of time across multiple threads.
In older versions of Oracle the cost of CPU operations was ignored, and the relative costs of single and multiblock reads were effectively fixed according to init parameters.
以下是在 Oracle 中使用 EXPLAIN PLAN 的参考:http: //download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm),以及有关此处找到的列的具体信息:http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan .htm#i18300
您提到的“FULL”向我表明查询正在执行全表扫描来查找您的数据。 在某些情况下,这是可以的,否则表明索引/查询编写不佳。
一般来说,通过解释计划,您希望确保您的查询正在使用键,这样 Oracle 就可以通过访问尽可能少的行数来找到您正在查找的数据。 最终,有时您对表的架构只能做到这一点。 如果成本仍然太高,您可能必须考虑调整架构布局以更加基于性能。
Here is a reference for using EXPLAIN PLAN with Oracle: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm), with specific information about the columns found here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm#i18300
Your mention of 'FULL' indicates to me that the query is doing a full-table scan to find your data. This is okay, in certain situations, otherwise an indicator of poor indexing / query writing.
Generally, with explain plans, you want to ensure your query is utilizing keys, thus Oracle can find the data you're looking for with accessing the least number of rows possible. Ultimately, you can sometime only get so far with the architecture of your tables. If the costs remain too high, you may have to think about adjusting the layout of your schema to be more performance based.
FULL 可能指的是全表扫描,这意味着没有使用任何索引。 这通常表明出现了问题,除非查询应该使用表中的所有行。
成本是一个数字,表示不同负载、处理器、内存、磁盘、IO 的总和,数字过高通常是不好的。 当移动到计划的根时,这些数字就会被累加起来,并且应该检查每个分支以找到瓶颈。
您可能还想查询 v$sql 和 v$session 以获取有关 SQL 语句的统计信息,这将包含所有类型的资源、计时和执行的详细指标。
FULL is probably referring to a full table scan, which means that no indexes are in use. This is usually indicating that something is wrong, unless the query is supposed to use all the rows in a table.
Cost is a number that signals the sum of the different loads, processor, memory, disk, IO, and high numbers are typically bad. The numbers are added up when moving to the root of the plan, and each branch should be examined to locate the bottlenecks.
You may also want to query v$sql and v$session to get statistics about SQL statements, and this will have detailed metrics for all kind of resources, timings and executions.