数据库的执行计划
问题1:
当我们执行查询时,每次执行查询时执行计划都会改变吗?
如果是,有任何性能影响吗?
如果不是,那么如果我们更改表中的某些内容,即添加索引,数据库如何知道可以使用某些内容来更改执行计划以加快执行速度?
问题2:
但是执行连接查询时的一般执行顺序是什么,特别是如果有很多连接(outer、inner、natural,如果有很多outer)。
Question 1:
When we execute a query, does the execution plan change for each and every time when executing the query?
If yes, any performance hit?
If no, then if we change something in the table, i.e adding an index, how does the databse know that there is something it can use to change the execution plan for faster execution?
QUestion 2:
But what is the general order of execution while executing a join query, especially if there are lot of joins ( outer, inner , natural , if many outer).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
缓存中最多有两个计划(一个是并行的,一个是非并行的)。然后该计划与每个用户的执行上下文一起使用。 更多信息在我的答案中
SQL 是声明性的。这意味着你告诉引擎你想要什么,优化器会制定出最佳计划(在合理范围内,可能需要两周时间才能制定出最佳计划)。这就是为什么您可以通过多种不同的方式重写查询以获得相同的答案。
与 RDBMS 的任何规则一样,也有例外。对于复杂的查询,优化器不会经历每个排列,因此 JOIN 顺序可能很重要:这取决于优化器何时决定它已经足够了......
You have at most two plans in cache (one parallel, one non-parallel). Then the plan is used with an Execution Context per user. More info in my answer here
SQL is declarative. This means you tell the engine what you want and the optimiser works out the best plan (within reason, it might take 2 weeks to work out the best one). This is why you can rewrite queries many different ways to get the same answer.
Like any rules about RDBMS, there are exceptions. For complex queries, the optimiser will not go through every permutation so the JOIN order can matter: it depends when the optimiser decides it's had enough...
(假设这里是 SQL Server,您没有指定...)
执行计划被缓存,并且根据您参数化查询的程度,可以重用。
当您修改基础表或索引时,SQL Server 知道这些内容与缓存的执行计划的修改日期,并且可以重新评估新条件的查询。更新统计数据时也是如此……有时实际数据驱动计划,而不仅仅是表/索引设计。
(Assuming SQL Server here, you didn't specify...)
Execution plans are cached, and to the degree you parameterize your queries, can be reused.
When you modify the underlying table or indexes, SQL Server knows the modification date of these things vs. the cached execution plan, and can re-evaluate the query for the new conditions. Same thing when statistics are updated... sometimes the actual data drives the plan, not just the table/index design.
连接不是按照基于内部与外部的顺序完成的,而是基于优化器认为将导致查询最快执行的顺序。数据库之间的详细信息会有所不同。但基本上,查询优化器尝试优化索引的使用。
假设您有以下查询:
现在假设您在 b.b_id 上有唯一索引,但在 a.some_number 上没有索引。
然后,查询优化器有两个选择:它可以对 b 执行全文件顺序读取,然后对每个 b 对 a 执行全文件顺序读取,以查找 b_id 和 some_number=42 上的匹配项。即读取a^b记录。或者它可以对 a 执行全文件顺序读取来查找 some_number=42,然后对于每个 a,它可以使用索引快速从 b 中查找具有匹配 b_id 的记录。即读取a*2条记录。显然第二个计划要好得多,所以它会选择这个。
随着添加更多表,计算变得更加复杂,但原理是相同的。使用在其他表中找到的值进行快速索引读取的连接将在稍后读取其他表后完成。无论如何都必须按顺序读取的表,或者读取基于常量而不是其他记录中的值的表,通常首先读取。
Joins are not done in an order based on inner vs outer, but rather based on what the optimizer thinks will result in the query being executed most quickly. The details will vary between databases. But basically, the query optimizer tries to optimize the use of indexes.
Suppose you had this query:
Now suppose that you have a unique index on b.b_id but no index on a.some_number.
The query optimizer then has two choices: It could do a full-file sequential read on b, and then for each b do a full-file sequential read on a looking for a match on b_id and some_number=42. That is read a^b records. Or it could do a full-file sequential read on a looking for some_number=42, then for each a it could use the index to quickly find the record from b with matching b_id. That is, read a*2 records. Well obviously the second plan is much better, so that's what it will choose.
As you add more tables the calculation becomes more complicated, but the principle is the same. Joins that result in a quick index read using values found in other tables will be done later, after the other tables have been read. Tables that must be read sequentially no matter what, or where the read is based on constants rather than values from other records, are generally read first.