SQL Azure 查询性能 - 即使经过调整的查询也非常慢

发布于 2024-11-26 19:40:40 字数 3584 浏览 0 评论 0 原文

下面是一个依赖两个非聚集索引的基本查询:

SELECT cc.categoryid, count(*) from company c
INNER JOIN companycategory cc on cc.companyid = c.id
WHERE c.placeid like 'ca_%'
GROUP BY cc.categoryid order by count(*) desc

当完全相同的数据库托管在 SQL Server 2008 上时,几乎在任何硬件上,都会返回 << 500 毫秒即使清除了缓存缓冲区:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

...这仍然在传统 SQL 上约 1 秒内返回。

在 Azure 上,每次返回大约需要 3.5 秒。

一些文章似乎表明人们普遍对 SQL Azure 中的查询性能感到满意。然而,这是一个基本场景,其中“明显”的调整已经用尽,并且没有网络延迟问题可言。只是在处理大型表时真的很慢(companycategroy 有 120 万条记录,places 有 7500 条记录)。

数据库总大小不超过4GB。选择“Web”版本与“企业”版本似乎也没有太大区别。

我缺少什么?

这只是一个基本示例,随着更复杂的查询,它只会变得更糟,所有这些都是 经过审查、调整并在本地表现良好。

这是执行计划:

  |--Sort(ORDER BY:([Expr1004] DESC))
       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
            |--Hash Match(Aggregate, HASH:([cc].[CategoryId]), RESIDUAL:([XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId] = [XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId]) DEFINE:([Expr1007]=COUNT(*)))
                 |--Hash Match(Inner Join, HASH:([c].[Id])=([cc].[CompanyId]))
                      |--Index Scan(OBJECT:([XX].[dbo].[Company].[IX_Company_PlaceId] AS [c]),  WHERE:([XX].[dbo].[Company].[PlaceId] as [c].[PlaceId] like N'ca_%'))
                      |--Index Scan(OBJECT:([XX].[dbo].[CompanyCategory].[IX_CompanyCategory_CompanyId] AS [cc]))

这是统计数据:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(789 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CompanyCategory'. Scan count 1, logical reads 5183, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Company'. Scan count 1, logical reads 8710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 3328 ms,  elapsed time = 3299 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

索引定义如下:

CREATE NONCLUSTERED INDEX [IX_Company_PlaceId] ON [dbo].[Company] 
(
    [PlaceId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

CREATE NONCLUSTERED INDEX [IX_CompanyCategory_CompanyId] ON [dbo].[CompanyCategory] 
(
    [CompanyId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

ALTER TABLE [dbo].[Company] ADD  CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

Here's a basic query that relies on two non-clustered indexes:

SELECT cc.categoryid, count(*) from company c
INNER JOIN companycategory cc on cc.companyid = c.id
WHERE c.placeid like 'ca_%'
GROUP BY cc.categoryid order by count(*) desc

When the exact same database is hosted on SQL Server 2008, on virtually any hardware, this returns < 500 ms. Even with the cache buffers cleared:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

... this still returns in ~1 second on traditional SQL.

On Azure, it takes approximately 3.5 seconds to return each time.

Some articles out there seem to suggest that people are generally happy with query performance in SQL Azure. And yet here's a basic scenario where 'obvious' tuning has been exhausted and there's no network latency issues to speak of. It's just really slow when working w/ large tables (companycategroy has 1.2M records, places has 7.5K).

The total database size is no more than 4GB. Selecting 'Web' edition vs. 'Enterprise' edition doesn't seem to make much of a difference either.

What am I missing?

This is only a basic example, it only gets worse with more sophisticated queries, all of have been reviewed, tuned, and perform well on-premise.

Here's the execution plan:

  |--Sort(ORDER BY:([Expr1004] DESC))
       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
            |--Hash Match(Aggregate, HASH:([cc].[CategoryId]), RESIDUAL:([XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId] = [XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId]) DEFINE:([Expr1007]=COUNT(*)))
                 |--Hash Match(Inner Join, HASH:([c].[Id])=([cc].[CompanyId]))
                      |--Index Scan(OBJECT:([XX].[dbo].[Company].[IX_Company_PlaceId] AS [c]),  WHERE:([XX].[dbo].[Company].[PlaceId] as [c].[PlaceId] like N'ca_%'))
                      |--Index Scan(OBJECT:([XX].[dbo].[CompanyCategory].[IX_CompanyCategory_CompanyId] AS [cc]))

And here are the stats:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(789 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CompanyCategory'. Scan count 1, logical reads 5183, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Company'. Scan count 1, logical reads 8710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 3328 ms,  elapsed time = 3299 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Index definitions are as follows:

CREATE NONCLUSTERED INDEX [IX_Company_PlaceId] ON [dbo].[Company] 
(
    [PlaceId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

CREATE NONCLUSTERED INDEX [IX_CompanyCategory_CompanyId] ON [dbo].[CompanyCategory] 
(
    [CompanyId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

ALTER TABLE [dbo].[Company] ADD  CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

兮子 2024-12-03 19:40:40

他们似乎为您的查询使用一个 CPU 核心,而在您的计算机上查询可能会并行化(查询使用的所有操作都会并行化)。

但是,出于某种原因,索引扫描用于 LIKE 谓词,而索引查找就足够了。

请尝试使用此显式条件而不是 LIKE

c.placeid >= 'ca'
AND c.placeid < 'cb'

并查看它是否将计划更改为 IX_CompanyPlaceId 上的 Index Seek

They seem to use one CPU core for your query while on your machine the query probably parallelizes (all operations used by the query do parallelize).

However, an index scan is used for the LIKE predicate for some reason while an index seek could suffice.

Please try using this explicit condition instead of LIKE:

c.placeid >= 'ca'
AND c.placeid < 'cb'

and see if it changes the plan to an Index Seek on IX_CompanyPlaceId.

自此以后,行同陌路 2024-12-03 19:40:40

只需几件事:

  • Azure 上的统计信息是最新的吗?我对 120 万行表的哈希匹配有点警惕
  • Azure 有自动统计吗?如果没有,您的本地数据库可能有更多信息,SQL Azure 无法使用这些信息来选择最佳查询计划
  • 索引 c.placeid 以获取有关它的一些统计信息
  • 为什么是 c.placeid< /代码> 一个字符串?这是否会延续到 companyidc.id ?我认为这就是为什么你有哈希匹配 - 尝试加入整数代理键。

Just a few things:

  • Are stats up to date on Azure? I'm a bit wary of that Hash Match for a 1.2M row table
  • Does Azure have auto stats? If not, your local database might have a lot more information that SQL Azure can't use to pick an optimal query plan
  • Index c.placeid for some statistics on it
  • Why is c.placeid a string? Does this follow through to companyid and c.id? I think this is why you have the Hash Match - try joining on integer surrogate keys instead.
绿阴红影里的.如风往事 2024-12-03 19:40:40

我在 Azure SQL 数据库索引维护上发布此链接,因为索引维护仍然需要帮助。

https://blogs.msdn.microsoft.com/azuresqldbsupport/2016/07/03/how-to-maintain-azure-sql-indexes-and-statistics/

我们使用 Runbook 来执行我们在不同弹性池上的 350 多个数据库来执行索引维护。希望其他人发现这些信息和我们一样有帮助。

I am posting this link on Azure SQL Database index maintenance since index maintenance still needs a helping hand.

https://blogs.msdn.microsoft.com/azuresqldbsupport/2016/07/03/how-to-maintain-azure-sql-indexes-and-statistics/

We use a runbook to execute across our 350+ databases on different elastic pools to perform the index maintenance. Hope others find the information as helpful as we did.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文