At what cardinality does SQL Server switch to an index scan (versus a seek)?
Assuming that a table contains sufficient information to warrant an index seek, at what cardinality will SQL Server (or PostgreSQL) opt for an index scan?
The reason I ask is that I previously posted a question (link) in which two queries performed at the same speed, yet one made no attempt to use the index on the columns being processed. After SQL Server suggested I add a covering index that included the columns being queried (it suggested this for both queries), I started looking for reasons why it would make such a strange suggestion.
I experimented with making the indexes covering and composite, but both queries executed in the same amount of time (we're talking about 3 million rows).
Finally I concluded it was because of the ultra-high cardinality of the data: every row is unique. I deduced that this caused SQL Server to choose an index scan. However, the query specified "WHERE Col1 > ? AND Col2 < ?", so this is a little confusing.
My questions are:
- At what cardinality will an RDBMS always opt for an index scan?
- Can anyone explain why SQL Server would not use the index when the WHERE clause indicates it would make sense?
I have attached the execution plan.
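For reference, a minimal sketch of the kind of schema, covering index, and query shape being described (table, column, and parameter names here are placeholders, not the real ones):

```sql
-- Hypothetical table standing in for the one in question (about 3 million rows, every row unique).
CREATE TABLE dbo.Example
(
    Id   INT IDENTITY PRIMARY KEY,
    Col1 INT NOT NULL,
    Col2 INT NOT NULL,
    Col3 VARCHAR(50) NOT NULL   -- an extra column returned by the query
);

-- Composite index on the filtered columns, made covering by including Col3.
CREATE NONCLUSTERED INDEX IX_Example_Col1_Col2
    ON dbo.Example (Col1, Col2)
    INCLUDE (Col3);

-- The shape of the query in question; the parameter values are placeholders.
DECLARE @p1 INT = 100, @p2 INT = 2000000;

SELECT Col1, Col2, Col3
FROM dbo.Example
WHERE Col1 > @p1 AND Col2 < @p2;
```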
Comments (2)
In terms of SQL Server, this has been referred to as the tipping point, and Kimberly's blog post is a good read on it: http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx
The tipping point is a guideline of 25%-33% of the total number of pages within the table, expressed as rows, e.g. 10k data pages would give a tipping point of 2500-3333 rows. As guidelines go this is pretty good, and as good as you will get - remember the query plan engine is a black box, and whilst it will give you a query plan, it only says what it decided, not why.
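As a rough illustration of how you might estimate the tipping point for a given table, the sketch below reads the data-page count from sys.dm_db_partition_stats and applies the 25%-33% guideline; the table name is a placeholder:

```sql
-- Count the in-row data pages for a (hypothetical) table, then derive the
-- 25%-33% tipping-point range described above, expressed as rows.
SELECT
    SUM(ps.in_row_data_page_count)        AS data_pages,
    SUM(ps.in_row_data_page_count) * 0.25 AS tipping_point_low_rows,
    SUM(ps.in_row_data_page_count) * 0.33 AS tipping_point_high_rows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.[object_id] = OBJECT_ID(N'dbo.Example')
  AND ps.index_id IN (0, 1);   -- heap or clustered index only
```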
In terms of tipping a covering index though, that is not actually very easy: even with 100% of the data being selected, a covering index will still seek rather than scan in the majority of cases.
That makes sense if you consider that the cost optimizer doesn't assign any real cost to the index page hierarchy, and only costs the access to the leaf pages of the index. At that point, scanning or seeking 100% of a covering index is costed the same.
I found from my own experimentation (http://sqlfascination.com/2009/11/07/can-a-covering-nc-index-be-tipped) that using a BETWEEN clause would cause it to scan, but other WHERE clauses would not - from what I could tell it was purely down to the route taken through the query engine.
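A sketch of the kind of experiment being referred to, reusing the placeholder dbo.Example table and covering index from the question above; the two predicates are logically equivalent, so the point is to compare their actual execution plans:

```sql
DECLARE @low INT = 100, @high INT = 1000000;

-- Per the observation above, the BETWEEN form may tip the covering index
-- into a scan, while the explicit range form may keep a seek.
SELECT Col1, Col2, Col3
FROM dbo.Example
WHERE Col1 BETWEEN @low AND @high;

SELECT Col1, Col2, Col3
FROM dbo.Example
WHERE Col1 >= @low AND Col1 <= @high;
```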
In PostgreSQL, this is usually not a good question to ask, because the actual plan selection is more complicated. It depends on table size, memory settings, and other parts of the query. You will usually get a plain index scan only if you are selecting very few rows; above that, you will get a bitmap index scan up to, say, 40% selectivity in simple experiments.
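To see which plan PostgreSQL picks at a given selectivity, you can inspect it with EXPLAIN; the following is a throwaway test table, not anything from the original question, and the plans noted in the comments are what you would typically see, not guaranteed:

```sql
-- Build a small test table with an index, then compare the chosen plan
-- for a narrow range versus progressively wider ranges of the indexed column.
CREATE TABLE example (col1 integer, col2 integer);
INSERT INTO example
SELECT i, i FROM generate_series(1, 3000000) AS s(i);
CREATE INDEX example_col1_idx ON example (col1);
ANALYZE example;

EXPLAIN SELECT * FROM example WHERE col1 < 1000;     -- typically an Index Scan
EXPLAIN SELECT * FROM example WHERE col1 < 1000000;  -- often a Bitmap Index Scan
EXPLAIN SELECT * FROM example WHERE col1 < 2900000;  -- often a Seq Scan
```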