帮助提高 SQL Server 2008 CTE 的性能

发布于 2024-10-18 06:03:25 字数 1719 浏览 12 评论 0原文

我有一个 SQL Server 2008 CTE，它负责返回某个位置的热门评论。

CTE 包装在 UDF（表值）中，并连接到 LocationId 字段，因此我可以获得每个位置的最高评论。

基数：

1位置有0-多个帖子位置
1 帖子位置有 1 条帖子
1 帖子有 1 评论

这是 UDF：

CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(   

)
RETURNS TABLE 
AS
RETURN 
(
    WITH [RankedLocations] AS
    (
        SELECT      PL.LocationId, 
                    R.Rating, 
                    P.PostID, 
                    P.UniqueUri, 
                    P.Content, 
                    ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank

        From        dbo.PostLocations As PL
        INNER JOIN  dbo.Posts As P
        ON          P.PostId = PL.PostId
        INNER JOIN  dbo.Reviews As R
        ON          R.PostId = P.PostId

        WHERE       R.ReviewTypeId <> 5
        AND         P.Content IS NOT NULL
    )

    SELECT  LocationId, Rating, PostID, UniqueUri, Content
    FROM    RankedLocations
    WHERE   ScoreRank = 1
)

这是我如何使用它的示例：

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r 
on l.LocationId = r.LocationId

上面的查询需要 15 秒 执行，这是不可接受的。如果没有加入 UDF，则需要 0 秒。

关于如何改进它有什么想法吗？

如果我查看执行计划，SORT 占据了执行成本的 98%。此操作的 IO/子树成本约为 300。

我希望执行计划能给我一个提示，告诉我可以创建一个索引来提高成本，但我什么也没得到。

有什么想法吗？

原文

I have a SQL Server 2008 CTE which is responsible for returning the Top Review for a location.

The CTE is wrapped in a UDF (Table-Valued), and joined on the LocationId field, so i can get the top review for each location.

Cardinalities:

1 Location has 0-many PostLocations

1 PostLocation has 1 Post

1 Post has 1 Review

Here's the UDF:

CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(   

)
RETURNS TABLE 
AS
RETURN 
(
    WITH [RankedLocations] AS
    (
        SELECT      PL.LocationId, 
                    R.Rating, 
                    P.PostID, 
                    P.UniqueUri, 
                    P.Content, 
                    ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank

        From        dbo.PostLocations As PL
        INNER JOIN  dbo.Posts As P
        ON          P.PostId = PL.PostId
        INNER JOIN  dbo.Reviews As R
        ON          R.PostId = P.PostId

        WHERE       R.ReviewTypeId <> 5
        AND         P.Content IS NOT NULL
    )

    SELECT  LocationId, Rating, PostID, UniqueUri, Content
    FROM    RankedLocations
    WHERE   ScoreRank = 1
)

Here's an example of how i'm using it:

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r 
on l.LocationId = r.LocationId

The above query is taking 15 seconds to execute, which is unacceptable. Without the join to the UDF it's takes 0 seconds.

Any ideas on how i can improve it?

If i look at the execution plan, it's the SORT that is taking up 98% of the execution cost. The IO/subtree cost of this operation is ~300.

I was hoping the execution plan would give me a hint to an index i could create to improve the cost, but i get nothing.

Any ideas?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

夢归不見 2024-10-25 06:03:25

所以我发现了性能问题，这不是 CTE，而是我使用它的方式。

我有几个查找表，其中一个专门针对位置类型（街道 = 7、城市 = 5 等）。

因此，为了保持 SQL 流畅和一致（并避免硬编码的幻数），我创建了一个包装标量函数，它根据字符串返回等效值，例如：

DECLARE @Street_LocationType = [dbo].[ToLocationTypeId]('Street')

该函数非常简单，只是一系列 CASE 语句。

但是，我是这样使用我的 CTE 的：

SELECT      a.LocationId, b.Content, b.UniqueUri
FROM        [dbo].[Locations] a
INNER JOIN  dbo.Review_HighestRated_Aggregated_ByLocation()  b -- UDF with CTE
ON          a.LocationId = b.LocationId        
WHERE       a.LocationTypeId = @Street_LocationType

所以我什至没有在 CTE 本身上使用它，而是将它用作位置表上的过滤器。

如果我更改上述内容以硬编码该值（例如 7），则过程执行时间从 13 秒降至 2 秒。

我不明白，但它解决了问题。我注意到当过程执行得很差时，查询计划中的“SORT”操作的估计行数 = 32,000 - 这基本上是系统中的每个帖子。

在我的更改之后，估计行数为 1（理应如此）。

确实是奇怪的活动。

So i found the performance problem, and it wasn't the CTE, it was how i was using it.

I have several lookup tables, one in particular for Location Type (Street = 7, City = 5, etc).

So to keep my SQL fluent and consistent (and avoid hard-coded magic numbers), i created a wrapper scalar function which returns the evuivalent value based on the string, e.g:

DECLARE @Street_LocationType = [dbo].[ToLocationTypeId]('Street')

The function is extremely simple, just a series of CASE statements.

But, i was using my CTE like this:

SELECT      a.LocationId, b.Content, b.UniqueUri
FROM        [dbo].[Locations] a
INNER JOIN  dbo.Review_HighestRated_Aggregated_ByLocation()  b -- UDF with CTE
ON          a.LocationId = b.LocationId        
WHERE       a.LocationTypeId = @Street_LocationType

So i wasn't even using it on the CTE itself, i was using it as a filter on the Locations table.

If i change the above to hardcode the value (e.g 7), the procedure execution time drops from 13 seconds to 2 secs.

I don't get it, but it solved the problem. I was noticing when the procedure was performing badly, the "SORT" operation in the query plan had an estimated number of rows = 32,000 - which is basically every post in the system.

After my changes, the estimated number of rows is 1 (as it should be).

Bizarre activity indeed.

回复收藏 0 原文

愛放△進行李 2024-10-25 06:03:25

如果您的表值函数不需要参数，请考虑使用 VIEW 而不是 UDF。可能它解决了性能问题。

回复收藏 0 原文

一杯敬自由 2024-10-25 06:03:25

要将 CTE 和 UDF 转换为 VIEW：

DROP FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
GO

CREATE VIEW Review_HighestRated_Aggregated_ByLocation
AS
SELECT  LocationId, Rating, PostID, UniqueUri, Content
FROM
(
    SELECT      PL.LocationId, 
                R.Rating, 
                P.PostID, 
                P.UniqueUri, 
                P.Content, 
                ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
    From        dbo.PostLocations As PL
    INNER JOIN  dbo.Posts As P
    ON          P.PostId = PL.PostId
    INNER JOIN  dbo.Reviews As R
    ON          R.PostId = P.PostId
    WHERE       R.ReviewTypeId <> 5
    AND         P.Content IS NOT NULL
) RankedLocations
WHERE   ScoreRank = 1

GO

修改 OP 的示例查询以使用新的 VIEW：

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join Review_HighestRated_Aggregated_ByLocation r 
on l.LocationId = r.LocationId

To convert the CTE and the UDF into a VIEW:

DROP FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
GO

CREATE VIEW Review_HighestRated_Aggregated_ByLocation
AS
SELECT  LocationId, Rating, PostID, UniqueUri, Content
FROM
(
    SELECT      PL.LocationId, 
                R.Rating, 
                P.PostID, 
                P.UniqueUri, 
                P.Content, 
                ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
    From        dbo.PostLocations As PL
    INNER JOIN  dbo.Posts As P
    ON          P.PostId = PL.PostId
    INNER JOIN  dbo.Reviews As R
    ON          R.PostId = P.PostId
    WHERE       R.ReviewTypeId <> 5
    AND         P.Content IS NOT NULL
) RankedLocations
WHERE   ScoreRank = 1

GO

OP's sample query revised to use the new VIEW:

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join Review_HighestRated_Aggregated_ByLocation r 
on l.LocationId = r.LocationId

回复收藏 0 原文

~没有更多了~