SQL查询执行缓慢(对于某些参数值)

发布于 2024-09-08 05:55:34 字数 1557 浏览 7 评论 0原文

我有一个包含多个表的 SQL Server 2005 数据库。其中一个表用于存储多个设备的时间戳和消息计数器,并具有以下列:

CREATE TABLE [dbo].[Timestamps] (
[Id] [uniqueidentifier] NOT NULL,
[MessageCounter] [bigint] NULL,
[TimeReceived] [bigint] NULL,
[DeviceTime] [bigint] NULL,
[DeviceId] [int] NULL
)

Id 是唯一主键 (Guid.Comb),我在两个 上都有索引DeviceIdMessageCounter 列。

我想要做的是找到特定设备的最后插入的行(具有最大 MessageCounter 的行)。

奇怪的是,查询设备号。 4(以及除 1 号之外的所有其他设备)几乎立即返回:

select top 1 * 
   from "Timestamps"
   where DeviceId = 4
   order by MessageCounter desc

但对设备号进行相同的查询。 1 需要很长时间才能完成:

select top 1 * 
   from "Timestamps"
   where DeviceId = 1 /* this is the only line changed */
   order by MessageCounter desc

最奇怪的是设备 1 的行数比设备 4 少得多:

select count(*) from "Timestamps" where DeviceId = 4
(returns 1,839,210)

select count(*) from "Timestamps" where DeviceId = 1
(returns 323,276).

有人知道我可能做错了什么吗?

[编辑]

从两个查询的执行计划中,可以清楚地看到设备 1(下图)在索引扫描中创建了更多的行:

设备 4(上)和设备 1(下)的执行计划 http://img295.imageshack.us/img295/5784/execplans。区别

在于,当我将索引扫描节点悬停在执行计划图上时:

Device 4 Actual Number of Rows: 1

Device 1 Actual Number of Rows: approx. 6,500,000

6,500,000 行是一个非常奇怪的数字,因为我的 select count(*) 查询为设备 1 返回大约 300,000 行!

I have a SQL Server 2005 database with several tables. One of the tables is used to store timestamps and message counters for several devices, and has the following columns:

CREATE TABLE [dbo].[Timestamps] (
[Id] [uniqueidentifier] NOT NULL,
[MessageCounter] [bigint] NULL,
[TimeReceived] [bigint] NULL,
[DeviceTime] [bigint] NULL,
[DeviceId] [int] NULL
)

Id is the unique primary key (Guid.Comb), and I have indexes on both DeviceId and MessageCounter columns.

What I want to do is find the last inserted row (the row with the largest MessageCounter) for a certain device.

The thing that is strange is that a query for device no. 4 (and all other devices except no.1) returns almost instantaneously:

select top 1 * 
   from "Timestamps"
   where DeviceId = 4
   order by MessageCounter desc

but the same query for device no. 1 takes forever to complete:

select top 1 * 
   from "Timestamps"
   where DeviceId = 1 /* this is the only line changed */
   order by MessageCounter desc

The strangest thing is that device 1 has much less rows than device 4:

select count(*) from "Timestamps" where DeviceId = 4
(returns 1,839,210)

select count(*) from "Timestamps" where DeviceId = 1
(returns 323,276).

Does anyone have a clue what I could be doing wrong?

[Edit]

From the execution plans for both queries, it is clearly visible that Device 1 (lower diagram) creates a much larger number of rows in Index scan:

Execution plans for device 4 (upper) and device 1 (lower) http://img295.imageshack.us/img295/5784/execplans.png

The difference is when I hover the Index Scan nodes on execution plan diagrams:

Device 4 Actual Number of Rows: 1

Device 1 Actual Number of Rows: approx. 6,500,000

6,500,000 rows is a very strange number, since my select count(*) query returns around 300,000 rows for device 1!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

温柔少女心 2024-09-15 05:55:34

尝试在 (DeviceId, MessageCounter DESC) 上创建索引。

另外,请尝试以下查询:

select * 
   from "Timestamps"
   where DeviceId = 1
   and MessageCounter = (SELECT MAX(MessageCounter) FROM "Timestamps" WHERE DeviceID = 1)

只是猜测:性能差异可能是因为 DeviceId = 1 分布在比 DeviceId = 4 更多的页面上。通过排序,我怀疑您正在挖掘所有匹配的页面,即使您最终只选择了顶行。

Try creating an index on (DeviceId, MessageCounter DESC).

Also, try this query:

select * 
   from "Timestamps"
   where DeviceId = 1
   and MessageCounter = (SELECT MAX(MessageCounter) FROM "Timestamps" WHERE DeviceID = 1)

Just guessing: The performance difference might be because DeviceId = 1 is spread across more pages than DeviceId = 4. By sorting, I suspect you are dredging up all matching pages, even if you end up selecting only the top row.

少女七分熟 2024-09-15 05:55:34

您确定统计数据是最新的吗?使用更新统计信息

UPDATE STATISTICS dbo.Timestamps

您如何运行查询?如果通过存储过程,可能您遇到 问题参数嗅探

Are you sure the statistics are up to date? Use UPDATE STATISTICS:

UPDATE STATISTICS dbo.Timestamps

How are you running the query? If via a stored procedure, maybe you're having an issue with parameter sniffing?

夜未央樱花落 2024-09-15 05:55:34

执行计划图表不是很有帮助,因为它们没有显示使用哪个索引。

最有用的信息来自以下查询,

select DeviceId, max(MessageCounter) from "Timestamps" group by DeviceId

我假设设备 2 到 4 的消息计数器是相对较高的数字。 MessageCounter 是一个相对较低的数字。

在这种情况下,SQL Server 如何执行查询:

服务器从高到低读取 MessageCounter 索引。对于每一行,服务器都会对聚集索引进行嵌套查找以比较设备 ID。

对于设备 2-4,这很快就会结束,因为服务器在设备 2-4 的 MessageCounter 索引中找到了一行。对于设备 1,服务器需要超过 600 万次查找操作,然后服务器才能找到设备 1 的第一行。

读取 deviceid 索引并查找聚集索引会更快。这应该在 323k 次搜索后停止。甚至很糟糕。

您应该有一个包含设备 ID 和 MessageCounter 的索引(正如 Marcelo Cantos 指出的那样)。

The execution plans diagramms are not very helpfull because they do not show which index are used.

The most helpfull informations comes from the following query

select DeviceId, max(MessageCounter) from "Timestamps" group by DeviceId

I assume the MessageCounter for Devices 2 to 4 are relative high numbers. The MessageCounter is a relative low number.

How does the SQL server executes the query in that case:

The server reads the MessageCounter index from high to low numbers. For every row the server make a nested seek into custered index to compare the device id.

For devices 2-4 this ends very soon, because the server finds a row in the MessageCounter Index for device 2-4. For device 1 the server needs more than 6 millions seek operations, before the server finds the first row for device 1.

It would be faster to read the deviceid index and seek into custered index. This should stops after 323k seeks. Even bad.

You should have an index that contains both the device ids and MessageCounter (as Marcelo Cantos pointed out).

夜深人未静 2024-09-15 05:55:34

我认为这一定会发生,因为如果您按 MessageCounter 降序排列记录,则必须先遍历 6,500,000 条记录,然后才能找到第一个带有 DeviceId=4 的记录,而对于其他 DeviceId 来说,有一个更好的分布,

我认为 DeviceId=4 谓词直到执行计划上的 Filter 运算符才发挥作用。

DeviceId, MessageCounter 上的复合索引可以解决此问题。但是,具有 DeviceId=4 的设备是否是不再记录新数据的旧设备?如果是这样,您也许可以将 DeviceId=4 记录提取到自己的表中并使用分区视图,以便该设备上的查询不会扫描大量不相关的记录。

下面已更正

另外,选择 Guid.Comb 作为聚集索引的原因是什么?我认为 DeviceId, MessageCounter 上的聚集索引在碎片和避免热点方面具有类似的特征,但更有用。

I presume that this must be happening because if you order the records by MessageCounter descending there are 6,500,000 that it has to plough through before it finds the first one with DeviceId=4 whereas for the other DeviceId's there is a much better spread

I presume that the DeviceId=4 predicate doesn't come into play until the Filter operator on the execution plan.

A composite index on DeviceId, MessageCounter would resolve this. But is the Device with DeviceId=4 a legacy device for which new data is no longer being recorded? If so you may be able to get away with extracting the DeviceId=4 records into a table of their own and using a partitioned View so that queries on that device don't scan a load of unrelated records.

Below Corrected

Also What is the reason for choosing Guid.Comb as a clustered index? I presume a clustered index on DeviceId, MessageCounter would have similar characteristics in terms of fragmentation and avoiding hot spots but be more useful.

琴流音 2024-09-15 05:55:34

我的第一个想法是,这可能是由于参数嗅探 - 本质上 SQL Server 在第一次运行查询时会提出一个计划,但该查询不能代表典型的工作负载。请参阅 http://www.sqlshare.com/solve -parameter-sniffing-by-using-local-variables_531.aspx

关于统计的建议很好,但我怀疑您需要查看这两个查询的查询计划。您可以在查询分析器中执行此操作 - 它是“执行”按钮右侧的三个按钮。尝试看看两个查询的计划有什么不同......

My first thought was that this might be due to parameter sniffing - essentially SQL Server comes up with a plan for the first time a query is run, but that query was unrepresentative of the typical workload. See http://www.sqlshare.com/solve-parameter-sniffing-by-using-local-variables_531.aspx

The advice about statistics is good, but I suspect you'll need to have a look at the query plans for both these queries. You can do this in Query Analyser - it's about three buttons to the right of the Execute button. Try to see what is different between the plans for both queries...

东京女 2024-09-15 05:55:34

这些查询是否像您发布的那样发送到 SQL Server

select top 1 * 
   from "Timestamps"
   where DeviceId = 4
   order by MessageCounter desc

,或者 NHibernate 是否使用参数化查询? (where deviceid = @deviceid 或类似的东西)?

这可能会解释它 - SQL Server 获取 DeviceId = 4 的参数化查询,提出一个适用于该参数值的执行计划,但在下一次执行时,对于 DeviceId = 1,它会出错,并且以某种方式从第一个查询对于第二种情况不再是最佳的。

您可以尝试以相反的顺序执行这两个查询吗?首先使用 DeviceId=1,然后使用 DeviceId=4 - 这会给你相同的结果吗?

Are those queries sent to SQL Server exactly like you posted them

select top 1 * 
   from "Timestamps"
   where DeviceId = 4
   order by MessageCounter desc

or did NHibernate use parametrized queries? (where deviceid = @deviceid or something like that)??

That might explain it - SQL Server gets the parametrized query for DeviceId = 4, comes up with an execution plan that works for that parameter value, but then on the next execution, for DeviceId = 1, it stumbles and somehow the execution plan from the first query isn't optimal for that second case anymore.

Can you try to execute those two queries in the reversed order?? First with DeviceId=1 and then with DeviceId=4 - does that give you the same results??

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文