帮助提高 SQL Server 2008 CTE 的性能

发布于 2024-10-18 06:03:25 字数 1719 浏览 2 评论 0原文

我有一个 SQL Server 2008 CTE,它负责返回某个位置的热门评论

CTE 包装在 UDF(表值)中,并连接到 LocationId 字段,因此我可以获得每个位置的最高评论。

基数

1位置有0-多个帖子位置
1 帖子位置有 1 条帖子
1 帖子有 1 评论

这是 UDF:

CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(   

)
RETURNS TABLE 
AS
RETURN 
(
    WITH [RankedLocations] AS
    (
        SELECT      PL.LocationId, 
                    R.Rating, 
                    P.PostID, 
                    P.UniqueUri, 
                    P.Content, 
                    ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank

        From        dbo.PostLocations As PL
        INNER JOIN  dbo.Posts As P
        ON          P.PostId = PL.PostId
        INNER JOIN  dbo.Reviews As R
        ON          R.PostId = P.PostId

        WHERE       R.ReviewTypeId <> 5
        AND         P.Content IS NOT NULL
    )

    SELECT  LocationId, Rating, PostID, UniqueUri, Content
    FROM    RankedLocations
    WHERE   ScoreRank = 1
)

这是我如何使用它的示例:

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r 
on l.LocationId = r.LocationId

上面的查询需要 15 秒 执行,这是不可接受的。如果没有加入 UDF,则需要 0 秒。

关于如何改进它有什么想法吗?

如果我查看执行计划,SORT 占据了执行成本的 98%。此操作的 IO/子树成本约为 300。

我希望执行计划能给我一个提示,告诉我可以创建一个索引来提高成本,但我什么也没得到。

有什么想法吗?

I have a SQL Server 2008 CTE which is responsible for returning the Top Review for a location.

The CTE is wrapped in a UDF (Table-Valued), and joined on the LocationId field, so i can get the top review for each location.

Cardinalities:

1 Location has 0-many PostLocations

1 PostLocation has 1 Post

1 Post has 1 Review

Here's the UDF:

CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(   

)
RETURNS TABLE 
AS
RETURN 
(
    WITH [RankedLocations] AS
    (
        SELECT      PL.LocationId, 
                    R.Rating, 
                    P.PostID, 
                    P.UniqueUri, 
                    P.Content, 
                    ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank

        From        dbo.PostLocations As PL
        INNER JOIN  dbo.Posts As P
        ON          P.PostId = PL.PostId
        INNER JOIN  dbo.Reviews As R
        ON          R.PostId = P.PostId

        WHERE       R.ReviewTypeId <> 5
        AND         P.Content IS NOT NULL
    )

    SELECT  LocationId, Rating, PostID, UniqueUri, Content
    FROM    RankedLocations
    WHERE   ScoreRank = 1
)

Here's an example of how i'm using it:

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r 
on l.LocationId = r.LocationId

The above query is taking 15 seconds to execute, which is unacceptable. Without the join to the UDF it's takes 0 seconds.

Any ideas on how i can improve it?

If i look at the execution plan, it's the SORT that is taking up 98% of the execution cost. The IO/subtree cost of this operation is ~300.

I was hoping the execution plan would give me a hint to an index i could create to improve the cost, but i get nothing.

Any ideas?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

夢归不見 2024-10-25 06:03:25

所以我发现了性能问题,这不是 CTE,而是我使用它的方式。

我有几个查找表,其中一个专门针对位置类型(街道 = 7、城市 = 5 等)。

因此,为了保持 SQL 流畅和一致(并避免硬编码的幻数),我创建了一个包装标量函数,它根据字符串返回等效值,例如:

DECLARE @Street_LocationType = [dbo].[ToLocationTypeId]('Street')

该函数非常简单,只是一系列 CASE 语句。

但是,我是这样使用我的 CTE 的:

SELECT      a.LocationId, b.Content, b.UniqueUri
FROM        [dbo].[Locations] a
INNER JOIN  dbo.Review_HighestRated_Aggregated_ByLocation()  b -- UDF with CTE
ON          a.LocationId = b.LocationId        
WHERE       a.LocationTypeId = @Street_LocationType

所以我什至没有在 CTE 本身上使用它,而是将它用作位置表上的过滤器。

如果我更改上述内容以硬编码该值(例如 7),则过程执行时间从 13 秒降至 2 秒。

我不明白,但它解决了问题。我注意到当过程执行得很差时,查询计划中的“SORT”操作的估计行数 = 32,000 - 这基本上是系统中的每个帖子。

在我的更改之后,估计行数为 1(理应如此)。

确实是奇怪的活动。

So i found the performance problem, and it wasn't the CTE, it was how i was using it.

I have several lookup tables, one in particular for Location Type (Street = 7, City = 5, etc).

So to keep my SQL fluent and consistent (and avoid hard-coded magic numbers), i created a wrapper scalar function which returns the evuivalent value based on the string, e.g:

DECLARE @Street_LocationType = [dbo].[ToLocationTypeId]('Street')

The function is extremely simple, just a series of CASE statements.

But, i was using my CTE like this:

SELECT      a.LocationId, b.Content, b.UniqueUri
FROM        [dbo].[Locations] a
INNER JOIN  dbo.Review_HighestRated_Aggregated_ByLocation()  b -- UDF with CTE
ON          a.LocationId = b.LocationId        
WHERE       a.LocationTypeId = @Street_LocationType

So i wasn't even using it on the CTE itself, i was using it as a filter on the Locations table.

If i change the above to hardcode the value (e.g 7), the procedure execution time drops from 13 seconds to 2 secs.

I don't get it, but it solved the problem. I was noticing when the procedure was performing badly, the "SORT" operation in the query plan had an estimated number of rows = 32,000 - which is basically every post in the system.

After my changes, the estimated number of rows is 1 (as it should be).

Bizarre activity indeed.

愛放△進行李 2024-10-25 06:03:25

如果您的表值函数不需要参数,请考虑使用 VIEW 而不是 UDF。可能它解决了性能问题。

If your table valued function dont need parameters consider to use a VIEW instead a UDF. Probably it solves the performance problem.

一杯敬自由 2024-10-25 06:03:25

要将 CTE 和 UDF 转换为 VIEW:

DROP FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
GO

CREATE VIEW Review_HighestRated_Aggregated_ByLocation
AS
SELECT  LocationId, Rating, PostID, UniqueUri, Content
FROM
(
    SELECT      PL.LocationId, 
                R.Rating, 
                P.PostID, 
                P.UniqueUri, 
                P.Content, 
                ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
    From        dbo.PostLocations As PL
    INNER JOIN  dbo.Posts As P
    ON          P.PostId = PL.PostId
    INNER JOIN  dbo.Reviews As R
    ON          R.PostId = P.PostId
    WHERE       R.ReviewTypeId <> 5
    AND         P.Content IS NOT NULL
) RankedLocations
WHERE   ScoreRank = 1

GO

修改 OP 的示例查询以使用新的 VIEW:

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join Review_HighestRated_Aggregated_ByLocation r 
on l.LocationId = r.LocationId

To convert the CTE and the UDF into a VIEW:

DROP FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
GO

CREATE VIEW Review_HighestRated_Aggregated_ByLocation
AS
SELECT  LocationId, Rating, PostID, UniqueUri, Content
FROM
(
    SELECT      PL.LocationId, 
                R.Rating, 
                P.PostID, 
                P.UniqueUri, 
                P.Content, 
                ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
    From        dbo.PostLocations As PL
    INNER JOIN  dbo.Posts As P
    ON          P.PostId = PL.PostId
    INNER JOIN  dbo.Reviews As R
    ON          R.PostId = P.PostId
    WHERE       R.ReviewTypeId <> 5
    AND         P.Content IS NOT NULL
) RankedLocations
WHERE   ScoreRank = 1

GO

OP's sample query revised to use the new VIEW:

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join Review_HighestRated_Aggregated_ByLocation r 
on l.LocationId = r.LocationId
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文