帮助提高 SQL Server 2008 CTE 的性能
我有一个 SQL Server 2008 CTE,它负责返回某个位置的热门评论。
CTE 包装在 UDF(表值)中,并连接到 LocationId 字段,因此我可以获得每个位置的最高评论。
基数:
1位置有0-多个帖子位置
1 帖子位置有 1 条帖子
1 帖子有 1 评论
这是 UDF:
CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(
)
RETURNS TABLE
AS
RETURN
(
WITH [RankedLocations] AS
(
SELECT PL.LocationId,
R.Rating,
P.PostID,
P.UniqueUri,
P.Content,
ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
From dbo.PostLocations As PL
INNER JOIN dbo.Posts As P
ON P.PostId = PL.PostId
INNER JOIN dbo.Reviews As R
ON R.PostId = P.PostId
WHERE R.ReviewTypeId <> 5
AND P.Content IS NOT NULL
)
SELECT LocationId, Rating, PostID, UniqueUri, Content
FROM RankedLocations
WHERE ScoreRank = 1
)
这是我如何使用它的示例:
select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r
on l.LocationId = r.LocationId
上面的查询需要 15 秒 执行,这是不可接受的。如果没有加入 UDF,则需要 0 秒。
关于如何改进它有什么想法吗?
如果我查看执行计划,SORT 占据了执行成本的 98%。此操作的 IO/子树成本约为 300。
我希望执行计划能给我一个提示,告诉我可以创建一个索引来提高成本,但我什么也没得到。
有什么想法吗?
I have a SQL Server 2008 CTE which is responsible for returning the Top Review for a location.
The CTE is wrapped in a UDF (Table-Valued), and joined on the LocationId field, so i can get the top review for each location.
Cardinalities:
1 Location has 0-many PostLocations
1 PostLocation has 1 Post
1 Post has 1 Review
Here's the UDF:
CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(
)
RETURNS TABLE
AS
RETURN
(
WITH [RankedLocations] AS
(
SELECT PL.LocationId,
R.Rating,
P.PostID,
P.UniqueUri,
P.Content,
ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
From dbo.PostLocations As PL
INNER JOIN dbo.Posts As P
ON P.PostId = PL.PostId
INNER JOIN dbo.Reviews As R
ON R.PostId = P.PostId
WHERE R.ReviewTypeId <> 5
AND P.Content IS NOT NULL
)
SELECT LocationId, Rating, PostID, UniqueUri, Content
FROM RankedLocations
WHERE ScoreRank = 1
)
Here's an example of how i'm using it:
select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r
on l.LocationId = r.LocationId
The above query is taking 15 seconds to execute, which is unacceptable. Without the join to the UDF it's takes 0 seconds.
Any ideas on how i can improve it?
If i look at the execution plan, it's the SORT that is taking up 98% of the execution cost. The IO/subtree cost of this operation is ~300.
I was hoping the execution plan would give me a hint to an index i could create to improve the cost, but i get nothing.
Any ideas?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
所以我发现了性能问题,这不是 CTE,而是我使用它的方式。
我有几个查找表,其中一个专门针对位置类型(街道 = 7、城市 = 5 等)。
因此,为了保持 SQL 流畅和一致(并避免硬编码的幻数),我创建了一个包装标量函数,它根据字符串返回等效值,例如:
该函数非常简单,只是一系列 CASE 语句。
但是,我是这样使用我的 CTE 的:
所以我什至没有在 CTE 本身上使用它,而是将它用作位置表上的过滤器。
如果我更改上述内容以硬编码该值(例如 7),则过程执行时间从 13 秒降至 2 秒。
我不明白,但它解决了问题。我注意到当过程执行得很差时,查询计划中的“SORT”操作的估计行数 = 32,000 - 这基本上是系统中的每个帖子。
在我的更改之后,估计行数为 1(理应如此)。
确实是奇怪的活动。
So i found the performance problem, and it wasn't the CTE, it was how i was using it.
I have several lookup tables, one in particular for Location Type (Street = 7, City = 5, etc).
So to keep my SQL fluent and consistent (and avoid hard-coded magic numbers), i created a wrapper scalar function which returns the evuivalent value based on the string, e.g:
The function is extremely simple, just a series of CASE statements.
But, i was using my CTE like this:
So i wasn't even using it on the CTE itself, i was using it as a filter on the Locations table.
If i change the above to hardcode the value (e.g 7), the procedure execution time drops from 13 seconds to 2 secs.
I don't get it, but it solved the problem. I was noticing when the procedure was performing badly, the "SORT" operation in the query plan had an estimated number of rows = 32,000 - which is basically every post in the system.
After my changes, the estimated number of rows is 1 (as it should be).
Bizarre activity indeed.
如果您的表值函数不需要参数,请考虑使用
VIEW
而不是 UDF。可能它解决了性能问题。If your table valued function dont need parameters consider to use a
VIEW
instead a UDF. Probably it solves the performance problem.要将 CTE 和 UDF 转换为 VIEW:
修改 OP 的示例查询以使用新的 VIEW:
To convert the CTE and the UDF into a VIEW:
OP's sample query revised to use the new VIEW: