为什么在 SQL Server 中的索引列上执行 top(1) 速度很慢?
我对以下内容感到困惑。我有一个大约有 1000 万行的数据库,并且(在其他索引中)1 列(campaignid_int)是一个索引。
现在我有 700k 行,其中 Campaignid 确实是 3835
对于所有这些行,connectionid 都是相同的。
我只是想找出这个connectionid。
use messaging_db;
SELECT TOP (1) connectionid
FROM outgoing_messages WITH (NOLOCK)
WHERE (campaignid_int = 3835)
现在执行此查询大约需要 30 秒!
我(以我的小数据库知识)希望它会获取任何行,并返回该connectionid
如果我为只有 1 个条目的活动测试相同的查询,它会非常快。所以索引有效。
我将如何解决这个问题以及为什么这不起作用?
编辑:
estimated execution plan:
select (0%) - top (0%) - clustered index scan (100%)
I'm puzzled by the following. I have a DB with around 10 million rows, and (among other indices) on 1 column (campaignid_int) is an index.
Now I have 700k rows where the campaignid is indeed 3835
For all these rows, the connectionid is the same.
I just want to find out this connectionid.
use messaging_db;
SELECT TOP (1) connectionid
FROM outgoing_messages WITH (NOLOCK)
WHERE (campaignid_int = 3835)
Now this query takes approx 30 seconds to perform!
I (with my small db knowledge) would expect that it would take any of the rows, and return me that connectionid
If I test this same query for a campaign which only has 1 entry, it goes really fast. So the index works.
How would I tackle this and why does this not work?
edit:
estimated execution plan:
select (0%) - top (0%) - clustered index scan (100%)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
根据统计信息,您应该明确要求优化器使用您创建的索引而不是聚集索引。
我希望它能解决这个问题。
问候,
恩里克
Due to the statistics, you should explicitly ask the optimizer to use the index you've created instead of the clustered one.
I hope it will solve the issue.
Regards,
Enrique
我最近遇到了同样的问题,而且解决起来非常简单(至少在某些情况下)。
如果您在任何或某些已建立索引的列上添加 ORDER BY 子句,则应该可以解决该问题。这至少为我解决了这个问题。
I recently had the same issue and it's really quite simple to solve (at least in some cases).
If you add an
ORDER BY
-clause on any or some of the columns that's indexed it should be solved. That solved it for me at least.您没有在查询中指定
ORDER BY
子句,因此优化器不会收到关于应从中选择前 1 个的排序顺序的指示。 SQL Server 不会只获取随机行,它会按某些内容对行进行排序并获取前 1 个行,并且它可能会选择按次优的内容进行排序。我建议您添加一个ORDER BY x
子句,其中x
作为该表上的聚集键可能是最快的。这可能无法解决您的问题 - 事实上,我不确定我是否期望从您提供的统计数据中得到它 - 但是(a)它不会造成伤害,并且(b)您将能够解决这个问题出作为一个促成因素。
You aren't specifying an
ORDER BY
clause in your query, so the optimiser is not being instructed as to the sort order it should be selecting the top 1 from. SQL Server won't just take a random row, it will order the rows by something and take the top 1, and it may be choosing to order by something that is sub-optimal. I would suggest that you add anORDER BY x
clause, wherex
being the clustered key on that table will probably be the fastest.This may not solve your problem -- in fact I'm not sure I expect it to from the statistics you've given -- but (a) it won't hurt, and (b) you'll be able to rule this out as a contributing factor.
如果
campaignid_int
列未建立索引,请为其添加索引。这应该会加快查询速度。现在,我假设您需要在返回top(1)
行之前进行全表扫描以查找campaignid_int = 3835
的匹配项(在返回结果之前进行过滤)返回)。编辑:索引已经就位,但由于 SQL Server 执行聚集索引扫描,优化器忽略了该索引。这可能是由于(许多)重复行具有相同的
campaignid_int
值。您应该考虑以不同的方式建立索引或在不同的列上查询以获得您想要的connectionid
。If the
campaignid_int
column is not indexed, add an index to it. That should speed up the query. Right now I presume that you need to do a full table scan to find the matches forcampaignid_int = 3835
before thetop(1)
row is returned (filtering occurs before results are returned).EDIT: An index is already in place, but since SQL Server does a clustered index scan, the optimizer has ignored the index. This is probably due to (many) duplicate rows with the same
campaignid_int
value. You should consider indexing differently or query on a different column to get theconnectionid
you want.索引可能无用的原因有两个:
否则,优化器决定它也可以使用 PK/聚集索引对campaignid_int 进行过滤并获取connectionid,以避免对当前索引中的700k 行进行书签查找。
所以,我建议这个...
The index may be useless for 2 reasons:
Otherwise, the optimiser decides it may as well use the PK/clustered index to both filter on campaignid_int and get connectionid, to avoid a bookmark lookup on 700k rows from the current index.
So, I suggest this...
这并不能回答您的问题,但请尝试使用:
我也看到 top(x) 在某些情况下表现也非常糟糕。我确信它正在进行全表扫描。也许您在该特定列上的索引需要重建?不过,上述内容值得一试。
This doesn't answer your question, but try using:
I've seen top(x) perform very badly in certain situations as well. I'm sure it's doing a full table scan. Perhaps your index on that particular column needs to be rebuilt? The above is worth a try, however.
您的查询无法按您的预期工作,因为 Sql Server 保留有关索引的统计信息,并且在这种特殊情况下知道有很多标识符为 3835 的重复行,因此它认为只执行完整的操作会更有意义索引(或表)扫描。当您测试仅解析为一行的 ID 时,它会按预期使用索引,即执行索引查找(执行计划应验证此猜测)。
可能的解决方案?使索引复合,如果你有任何东西可以组成它,也就是说,例如用消息发送的日期组成它(如果我正确理解你的情况),然后从列表中选择具有指定 id 的前 1 个条目截止日期。虽然我不确定这是否会更好(一方面,复合索引占用更多空间) - 只是猜测。
编辑:我刚刚尝试了通过添加日期列来制作索引复合的建议。如果您执行此操作并在查询中指定
order by date
,则会按预期执行索引查找。Your query does not work as you expect, because Sql Server keeps statistics about your index and in this particular case knows that there are a lot of duplicate rows with the identifier 3835, hence it figures that it would make more sense to just do a full index (or table) scan. When you test for an ID which resolves to only one row, it uses the index as expected, i.e. performs an index seek (the execution plan should verify this guess).
Possible solutions ? Make the index composite, if you have anything to compose it with, that is, e.g. compose it with the date the message was sent (if I understand your case correctly) and then select the top 1 entry from the list with the specified id ordered by the date. Though I'm not sure whether this would be better (for one, a composite index takes up more space) - just a guess.
EDIT: I just tried out the suggestion of making the index composite by adding a date column. If you do that and specify
order by date
in your query, an index seek is performed as expected.抱歉,还不能发表评论,但这里的答案是,当 SQL Server 听到“Top 1”时,它不会理解人类相当于“给我找到的第一个”。 SQL Server 并没有像预期的那样“给我任何行”,而是获取所有找到的行中的第一行。
它知道的唯一时间是先获取所有行,然后丢弃其余行。非常彻底,但就你而言并不是很快。
正如其他人所说,主要问题是您的统计数据和索引的选择性。如果表中有另一个唯一字段(例如标识列),请首先尝试在 Campaignid_int 上建立组合索引,然后再尝试在唯一列上建立组合索引。由于您只查询campaignid_int,它必须是键的第一部分。
听起来值得一试,因为该索引应该具有更高的选择性,因此优化器可以比执行索引爬网更好地使用它。
Sorry, can't comment yet but the answer here is that SQL server is not going to understand the human equivalent of "Bring me the first one you find" when it hears "Top 1". Instead of the expected "Give me any row" SQL Server goes and fetches the first of all found rows.
Only time it knows that is after fetching all rows first, then discarding the rest. Very thorough but in your case not really fast.
Main issue as other said are your statistics and selectivity of your index. If you have another unique field in your table (like an identity column) then try an combined index on campaignid_int first, unique column second. As you only query on campaignid_int it has to be the first part of the key.
Sounds worth a try as this index should have a higher selectivity thus the optimizer can use this better than doing an index crawl.