当前位置：文江博客话题详情

azure sql-server azure-sql-database

SQL Azure 查询性能 - 即使经过调整的查询也非常慢

发布于 2024-11-26 19:40:40 字数 3584 浏览 0 评论 0 原文

下面是一个依赖两个非聚集索引的基本查询：

SELECT cc.categoryid, count(*) from company c
INNER JOIN companycategory cc on cc.companyid = c.id
WHERE c.placeid like 'ca_%'
GROUP BY cc.categoryid order by count(*) desc

当完全相同的数据库托管在 SQL Server 2008 上时，几乎在任何硬件上，都会返回 << 500 毫秒即使清除了缓存缓冲区：

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

...这仍然在传统 SQL 上约 1 秒内返回。

在 Azure 上，每次返回大约需要 3.5 秒。

一些文章似乎表明人们普遍对 SQL Azure 中的查询性能感到满意。然而，这是一个基本场景，其中“明显”的调整已经用尽，并且没有网络延迟问题可言。只是在处理大型表时真的很慢（companycategroy 有 120 万条记录，places 有 7500 条记录）。

数据库总大小不超过4GB。选择“Web”版本与“企业”版本似乎也没有太大区别。

我缺少什么？

这只是一个基本示例，随着更复杂的查询，它只会变得更糟，所有这些都是经过审查、调整并在本地表现良好。

这是执行计划：

  |--Sort(ORDER BY:([Expr1004] DESC))
       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
            |--Hash Match(Aggregate, HASH:([cc].[CategoryId]), RESIDUAL:([XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId] = [XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId]) DEFINE:([Expr1007]=COUNT(*)))
                 |--Hash Match(Inner Join, HASH:([c].[Id])=([cc].[CompanyId]))
                      |--Index Scan(OBJECT:([XX].[dbo].[Company].[IX_Company_PlaceId] AS [c]),  WHERE:([XX].[dbo].[Company].[PlaceId] as [c].[PlaceId] like N'ca_%'))
                      |--Index Scan(OBJECT:([XX].[dbo].[CompanyCategory].[IX_CompanyCategory_CompanyId] AS [cc]))

这是统计数据：

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(789 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CompanyCategory'. Scan count 1, logical reads 5183, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Company'. Scan count 1, logical reads 8710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 3328 ms,  elapsed time = 3299 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

索引定义如下：

CREATE NONCLUSTERED INDEX [IX_Company_PlaceId] ON [dbo].[Company] 
(
    [PlaceId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

CREATE NONCLUSTERED INDEX [IX_CompanyCategory_CompanyId] ON [dbo].[CompanyCategory] 
(
    [CompanyId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

ALTER TABLE [dbo].[Company] ADD  CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

原文

Here's a basic query that relies on two non-clustered indexes:

SELECT cc.categoryid, count(*) from company c
INNER JOIN companycategory cc on cc.companyid = c.id
WHERE c.placeid like 'ca_%'
GROUP BY cc.categoryid order by count(*) desc

When the exact same database is hosted on SQL Server 2008, on virtually any hardware, this returns < 500 ms. Even with the cache buffers cleared:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

... this still returns in ~1 second on traditional SQL.

On Azure, it takes approximately 3.5 seconds to return each time.

Some articles out there seem to suggest that people are generally happy with query performance in SQL Azure. And yet here's a basic scenario where 'obvious' tuning has been exhausted and there's no network latency issues to speak of. It's just really slow when working w/ large tables (companycategroy has 1.2M records, places has 7.5K).

The total database size is no more than 4GB. Selecting 'Web' edition vs. 'Enterprise' edition doesn't seem to make much of a difference either.

What am I missing?

This is only a basic example, it only gets worse with more sophisticated queries, all of have been reviewed, tuned, and perform well on-premise.

Here's the execution plan:

  |--Sort(ORDER BY:([Expr1004] DESC))
       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
            |--Hash Match(Aggregate, HASH:([cc].[CategoryId]), RESIDUAL:([XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId] = [XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId]) DEFINE:([Expr1007]=COUNT(*)))
                 |--Hash Match(Inner Join, HASH:([c].[Id])=([cc].[CompanyId]))
                      |--Index Scan(OBJECT:([XX].[dbo].[Company].[IX_Company_PlaceId] AS [c]),  WHERE:([XX].[dbo].[Company].[PlaceId] as [c].[PlaceId] like N'ca_%'))
                      |--Index Scan(OBJECT:([XX].[dbo].[CompanyCategory].[IX_CompanyCategory_CompanyId] AS [cc]))

And here are the stats:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(789 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CompanyCategory'. Scan count 1, logical reads 5183, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Company'. Scan count 1, logical reads 8710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 3328 ms,  elapsed time = 3299 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Index definitions are as follows:

CREATE NONCLUSTERED INDEX [IX_Company_PlaceId] ON [dbo].[Company] 
(
    [PlaceId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

CREATE NONCLUSTERED INDEX [IX_CompanyCategory_CompanyId] ON [dbo].[CompanyCategory] 
(
    [CompanyId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

ALTER TABLE [dbo].[Company] ADD  CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

兮子 2024-12-03 19:40:40

他们似乎为您的查询使用一个 CPU 核心，而在您的计算机上查询可能会并行化（查询使用的所有操作都会并行化）。

但是，出于某种原因，索引扫描用于 LIKE 谓词，而索引查找就足够了。

请尝试使用此显式条件而不是 LIKE：

c.placeid >= 'ca'
AND c.placeid < 'cb'

并查看它是否将计划更改为 IX_CompanyPlaceId 上的 Index Seek。

They seem to use one CPU core for your query while on your machine the query probably parallelizes (all operations used by the query do parallelize).

However, an index scan is used for the LIKE predicate for some reason while an index seek could suffice.

Please try using this explicit condition instead of LIKE:

c.placeid >= 'ca'
AND c.placeid < 'cb'

and see if it changes the plan to an Index Seek on IX_CompanyPlaceId.

回复收藏 0 原文

自此以后，行同陌路 2024-12-03 19:40:40

只需几件事：

Azure 上的统计信息是最新的吗？我对 120 万行表的哈希匹配有点警惕
Azure 有自动统计吗？如果没有，您的本地数据库可能有更多信息，SQL Azure 无法使用这些信息来选择最佳查询计划
索引 c.placeid 以获取有关它的一些统计信息
为什么是 c.placeid< /代码> 一个字符串？这是否会延续到 companyid 和 c.id ？我认为这就是为什么你有哈希匹配 - 尝试加入整数代理键。