为什么在 SQL Server 中的索引列上执行 top(1) 速度很慢?

发布于 2024-08-25 06:42:29 字数 607 浏览 2 评论 0原文

我对以下内容感到困惑。我有一个大约有 1000 万行的数据库,并且(在其他索引中)1 列(campaignid_int)是一个索引。

现在我有 700k 行,其中 Campaignid 确实是 3835

对于所有这些行,connectionid 都是相同的。

我只是想找出这个connectionid。

 use messaging_db;
 SELECT     TOP (1) connectionid
 FROM         outgoing_messages WITH (NOLOCK)
 WHERE     (campaignid_int = 3835)

现在执行此查询大约需要 30 秒!

我(以我的小数据库知识)希望它会获取任何行,并返回该connectionid

如果我为只有 1 个条目的活动测试相同的查询,它会非常快。所以索引有效。

我将如何解决这个问题以及为什么这不起作用?

编辑:

estimated execution plan:

select (0%) - top (0%) - clustered index scan (100%)

I'm puzzled by the following. I have a DB with around 10 million rows, and (among other indices) on 1 column (campaignid_int) is an index.

Now I have 700k rows where the campaignid is indeed 3835

For all these rows, the connectionid is the same.

I just want to find out this connectionid.

 use messaging_db;
 SELECT     TOP (1) connectionid
 FROM         outgoing_messages WITH (NOLOCK)
 WHERE     (campaignid_int = 3835)

Now this query takes approx 30 seconds to perform!

I (with my small db knowledge) would expect that it would take any of the rows, and return me that connectionid

If I test this same query for a campaign which only has 1 entry, it goes really fast. So the index works.

How would I tackle this and why does this not work?

edit:

estimated execution plan:

select (0%) - top (0%) - clustered index scan (100%)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

蒲公英的约定 2024-09-01 06:42:29

根据统计信息,您应该明确要求优化器使用您创建的索引而不是聚集索引。

SELECT  TOP (1) connectionid
FROM    outgoing_messages WITH (NOLOCK, index(idx_connectionid))
WHERE  (campaignid_int = 3835)

我希望它能解决这个问题。

问候,
恩里克

Due to the statistics, you should explicitly ask the optimizer to use the index you've created instead of the clustered one.

SELECT  TOP (1) connectionid
FROM    outgoing_messages WITH (NOLOCK, index(idx_connectionid))
WHERE  (campaignid_int = 3835)

I hope it will solve the issue.

Regards,
Enrique

栩栩如生 2024-09-01 06:42:29

我最近遇到了同样的问题,而且解决起来非常简单(至少在某些情况下)。

如果您在任何或某些已建立索引的列上添加 ORDER BY 子句,则应该可以解决该问题。这至少为我解决了这个问题。

I recently had the same issue and it's really quite simple to solve (at least in some cases).

If you add an ORDER BY-clause on any or some of the columns that's indexed it should be solved. That solved it for me at least.

扛起拖把扫天下 2024-09-01 06:42:29

您没有在查询中指定 ORDER BY 子句,因此优化器不会收到关于应从中选择前 1 个的排序顺序的指示。 SQL Server 不会只获取随机行,它会按某些内容对行进行排序并获取前 1 个行,并且它可能会选择按次优的内容进行排序。我建议您添加一个 ORDER BY x 子句,其中 x 作为该表上的聚集键可能是最快的。

这可能无法解决您的问题 - 事实上,我不确定我是否期望从您提供的统计数据中得到它 - 但是(a)它不会造成伤害,并且(b)您将能够解决这个问题出作为一个促成因素。

You aren't specifying an ORDER BY clause in your query, so the optimiser is not being instructed as to the sort order it should be selecting the top 1 from. SQL Server won't just take a random row, it will order the rows by something and take the top 1, and it may be choosing to order by something that is sub-optimal. I would suggest that you add an ORDER BY x clause, where x being the clustered key on that table will probably be the fastest.

This may not solve your problem -- in fact I'm not sure I expect it to from the statistics you've given -- but (a) it won't hurt, and (b) you'll be able to rule this out as a contributing factor.

假装不在乎 2024-09-01 06:42:29

如果 campaignid_int 列未建立索引,请为其添加索引。这应该会加快查询速度。现在,我假设您需要在返回 top(1) 行之前进行全表扫描以查找 campaignid_int = 3835 的匹配项(在返回结果之前进行过滤)返回)。

编辑:索引已经就位,但由于 SQL Server 执行聚集索引扫描,优化器忽略了该索引。这可能是由于(许多)重复行具有相同的 campaignid_int 值。您应该考虑以不同的方式建立索引或在不同的列上查询以获得您想要的connectionid

If the campaignid_int column is not indexed, add an index to it. That should speed up the query. Right now I presume that you need to do a full table scan to find the matches for campaignid_int = 3835 before the top(1) row is returned (filtering occurs before results are returned).

EDIT: An index is already in place, but since SQL Server does a clustered index scan, the optimizer has ignored the index. This is probably due to (many) duplicate rows with the same campaignid_int value. You should consider indexing differently or query on a different column to get the connectionid you want.

若言繁花未落 2024-09-01 06:42:29

索引可能无用的原因有两个:

  • 1000 万中的 700k 可能没有足够的选择性
  • 和/或
  • 需要包含 connectionid,因此整个查询只能使用一个索引

否则,优化器决定它也可以使用 PK/聚集索引对campaignid_int 进行过滤并获取connectionid,以避免对当前索引中的700k 行进行书签查找。

所以,我建议这个...

CREATE NONCLUSTERED INDEX IX_Foo ON MyTable (campaignid_int) INCLUDE (connectionid)

The index may be useless for 2 reasons:

  • 700k in 10 million may be not selective enough
  • and /or
  • connectionid needs included so the entire query can used only an index

Otherwise, the optimiser decides it may as well use the PK/clustered index to both filter on campaignid_int and get connectionid, to avoid a bookmark lookup on 700k rows from the current index.

So, I suggest this...

CREATE NONCLUSTERED INDEX IX_Foo ON MyTable (campaignid_int) INCLUDE (connectionid)
樱娆 2024-09-01 06:42:29

这并不能回答您的问题,但请尝试使用:

SET ROWCOUNT 1
SELECT     connectionid
 FROM         outgoing_messages WITH (NOLOCK)
 WHERE     (campaignid_int = 3835)

我也看到 top(x) 在某些情况下表现也非常糟糕。我确信它正在进行全表扫描。也许您在该特定列上的索引需要重建?不过,上述内容值得一试。

This doesn't answer your question, but try using:

SET ROWCOUNT 1
SELECT     connectionid
 FROM         outgoing_messages WITH (NOLOCK)
 WHERE     (campaignid_int = 3835)

I've seen top(x) perform very badly in certain situations as well. I'm sure it's doing a full table scan. Perhaps your index on that particular column needs to be rebuilt? The above is worth a try, however.

长不大的小祸害 2024-09-01 06:42:29

您的查询无法按您的预期工作,因为 Sql Server 保留有关索引的统计信息,并且在这种特殊情况下知道有很多标识符为 3835 的重复行,因此它认为只执行完整的操作会更有意义索引(或表)扫描。当您测试仅解析为一行的 ID 时,它会按预期使用索引,即执行索引查找(执行计划应验证此猜测)。

可能的解决方案?使索引复合,如果你有任何东西可以组成它,也就是说,例如用消息发送的日期组成它(如果我正确理解你的情况),然后从列表中选择具有指定 id 的前 1 个条目截止日期。虽然我不确定这是否会更好(一方面,复合索引占用更多空间) - 只是猜测。

编辑:我刚刚尝试了通过添加日期列来制作索引复合的建议。如果您执行此操作并在查询中指定order by date,则会按预期执行索引查找。

Your query does not work as you expect, because Sql Server keeps statistics about your index and in this particular case knows that there are a lot of duplicate rows with the identifier 3835, hence it figures that it would make more sense to just do a full index (or table) scan. When you test for an ID which resolves to only one row, it uses the index as expected, i.e. performs an index seek (the execution plan should verify this guess).

Possible solutions ? Make the index composite, if you have anything to compose it with, that is, e.g. compose it with the date the message was sent (if I understand your case correctly) and then select the top 1 entry from the list with the specified id ordered by the date. Though I'm not sure whether this would be better (for one, a composite index takes up more space) - just a guess.

EDIT: I just tried out the suggestion of making the index composite by adding a date column. If you do that and specify order by date in your query, an index seek is performed as expected.

骄兵必败 2024-09-01 06:42:29

但由于我指定了“top(1)”
意思是:给我任意行。为什么会这样
首先爬行 700k 行
退回一个? – 赖尼尔 30 分钟前

抱歉,还不能发表评论,但这里的答案是,当 SQL Server 听到“Top 1”时,它不会理解人类相当于“给我找到的第一个”。 SQL Server 并没有像预期的那样“给我任何行”,而是获取所有找到的行中的第一行。
它知道的唯一时间是先获取所有行,然后丢弃其余行。非常彻底,但就你而言并不是很快。

正如其他人所说,主要问题是您的统计数据和索引的选择性。如果表中有另一个唯一字段(例如标识列),请首先尝试在 Campaignid_int 上建立组合索引,然后再尝试在唯一列上建立组合索引。由于您只查询campaignid_int,它必须是键的第一部分。
听起来值得一试,因为该索引应该具有更高的选择性,因此优化器可以比执行索引爬网更好地使用它。

but since I'm specifying 'top(1)' it
means: give me any row. Why would it
first crawl through the 700k rows just
to return one? – reinier 30 mins ago

Sorry, can't comment yet but the answer here is that SQL server is not going to understand the human equivalent of "Bring me the first one you find" when it hears "Top 1". Instead of the expected "Give me any row" SQL Server goes and fetches the first of all found rows.
Only time it knows that is after fetching all rows first, then discarding the rest. Very thorough but in your case not really fast.

Main issue as other said are your statistics and selectivity of your index. If you have another unique field in your table (like an identity column) then try an combined index on campaignid_int first, unique column second. As you only query on campaignid_int it has to be the first part of the key.
Sounds worth a try as this index should have a higher selectivity thus the optimizer can use this better than doing an index crawl.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文