请帮我解决这个查询(sql server 2008)

发布于 2024-08-18 09:42:26 字数 3239 浏览 5 评论 0原文

ALTER PROCEDURE ReadNews

 @CategoryID INT,
 @Culture TINYINT = NULL,
 @StartDate DATETIME = NULL,
 @EndDate DATETIME = NULL,
 @Start BIGINT, -- for paging
 @Count BIGINT -- for paging

AS
BEGIN
  SET NOCOUNT ON;  

  --ItemType for news is 0
  ;WITH Paging AS
  (
   SELECT news.ID,
     news.Title,
     news.Description,
     news.Date,
     news.Url,
     news.Vote,
     news.ResourceTitle,
     news.UserID,

     ROW_NUMBER() OVER(ORDER BY news.rank DESC) AS RowNumber, TotalCount = COUNT(*) OVER()

   FROM dbo.News news
   JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
   WHERE itemCat.ItemType = 0 -- news item 
     AND itemCat.CategoryID = @CategoryID
     AND (
       (@StartDate IS NULL OR news.Date >= @StartDate) AND 
       (@EndDate IS NULL OR news.Date <= @EndDate)
      )
     AND news.Culture = @Culture
     and news.[status] = 1

  )  
  SELECT * FROM Paging WHERE RowNumber >= @Start AND RowNumber <= (@Start + @Count - 1)
  OPTION (OPTIMIZE FOR (@CategoryID  UNKNOWN, @Culture UNKNOWN))
END  

以下是 NewsItemCategory 表的结构:

CREATE TABLE [dbo].[News](
 [ID] [bigint] NOT NULL,
 [Url] [varchar](300) NULL,
 [Title] [nvarchar](300) NULL,
 [Description] [nvarchar](3000) NULL,
 [Date] [datetime] NULL,
 [Rank] [smallint] NULL,
 [Vote] [smallint] NULL,
 [Culture] [tinyint] NULL,
 [ResourceTitle] [nvarchar](200) NULL,
 [Status] [tinyint] NULL

 CONSTRAINT [PK_News] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [ItemCategory](
 [ID] [bigint] IDENTITY(1,1) NOT NULL,
 [ItemID] [bigint] NOT NULL,
 [ItemType] [tinyint] NOT NULL,
 [CategoryID] [int] NOT NULL,
 CONSTRAINT [PK_ItemCategory] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

此查询读取特定类别(体育、政治等)的新闻。 @Culture参数指定新闻的语言,如0(英语)、1(法语)等。 ItemCategory 表将新闻记录与一个或多个类别相关联。 ItemCategory 表中的 ItemType 列指定 itemID 的类型。目前,我们只有 ItemType 0 表示 ItemID 引用 News 表中的记录。

目前,我在 ItemCategory 表上有以下索引:

CREATE NONCLUSTERED INDEX [IX_ItemCategory_ItemType_CategoryID__ItemID] ON [ItemCategory] 
(
 [ItemType] ASC,
 [CategoryID] ASC
)
INCLUDE ( [ItemID])

以及 News 表有以下索引(由查询分析器建议):

CREATE NONCLUSTERED INDEX [_dta_index_News_8_1734000549__K1_K7_K13_K15] ON [dbo].[News] 
(
 [ID] ASC,
 [Date] ASC,
 [Culture] ASC,
 [Status] ASC
)

使用这些索引,当我执行查询时,查询将在不到一秒的时间内执行对于某些参数,对于其他参数(例如不同的@Culture 或@CategoryID)可能最多需要2 分钟!我已使用 OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN) 来防止对 @CategoryID 和 @Culture 参数进行参数嗅探,但似乎不起作用对于一些参数。

目前,News 表中约有 2,870,000 条记录,ItemCategory 表中约有 4,740,000 条记录。

现在,我非常感谢有关如何优化此查询或其索引的任何建议。

更新: 执行计划:
在此处输入图像描述
(在此图像中,ItemNetwork 就是我所说的 ItemCategory它们是相同的)

ALTER PROCEDURE ReadNews

 @CategoryID INT,
 @Culture TINYINT = NULL,
 @StartDate DATETIME = NULL,
 @EndDate DATETIME = NULL,
 @Start BIGINT, -- for paging
 @Count BIGINT -- for paging

AS
BEGIN
  SET NOCOUNT ON;  

  --ItemType for news is 0
  ;WITH Paging AS
  (
   SELECT news.ID,
     news.Title,
     news.Description,
     news.Date,
     news.Url,
     news.Vote,
     news.ResourceTitle,
     news.UserID,

     ROW_NUMBER() OVER(ORDER BY news.rank DESC) AS RowNumber, TotalCount = COUNT(*) OVER()

   FROM dbo.News news
   JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
   WHERE itemCat.ItemType = 0 -- news item 
     AND itemCat.CategoryID = @CategoryID
     AND (
       (@StartDate IS NULL OR news.Date >= @StartDate) AND 
       (@EndDate IS NULL OR news.Date <= @EndDate)
      )
     AND news.Culture = @Culture
     and news.[status] = 1

  )  
  SELECT * FROM Paging WHERE RowNumber >= @Start AND RowNumber <= (@Start + @Count - 1)
  OPTION (OPTIMIZE FOR (@CategoryID  UNKNOWN, @Culture UNKNOWN))
END  

Here is the structure of News and ItemCategory tables:

CREATE TABLE [dbo].[News](
 [ID] [bigint] NOT NULL,
 [Url] [varchar](300) NULL,
 [Title] [nvarchar](300) NULL,
 [Description] [nvarchar](3000) NULL,
 [Date] [datetime] NULL,
 [Rank] [smallint] NULL,
 [Vote] [smallint] NULL,
 [Culture] [tinyint] NULL,
 [ResourceTitle] [nvarchar](200) NULL,
 [Status] [tinyint] NULL

 CONSTRAINT [PK_News] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [ItemCategory](
 [ID] [bigint] IDENTITY(1,1) NOT NULL,
 [ItemID] [bigint] NOT NULL,
 [ItemType] [tinyint] NOT NULL,
 [CategoryID] [int] NOT NULL,
 CONSTRAINT [PK_ItemCategory] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

This query reads news of a specific category (sport, politics, ...).
@Culture parameter specifies the language of news, like 0 (english), 1 (french), etc.
ItemCategory table relates a news record to one or more categories.
ItemType column in ItemCategory table specifies which type of itemID is there. for now, we have only ItemType 0 indicating that ItemID refers to a record in News table.

Currently, I have the following index on ItemCategory table:

CREATE NONCLUSTERED INDEX [IX_ItemCategory_ItemType_CategoryID__ItemID] ON [ItemCategory] 
(
 [ItemType] ASC,
 [CategoryID] ASC
)
INCLUDE ( [ItemID])

and the following index for News table (suggested by query analyzer):

CREATE NONCLUSTERED INDEX [_dta_index_News_8_1734000549__K1_K7_K13_K15] ON [dbo].[News] 
(
 [ID] ASC,
 [Date] ASC,
 [Culture] ASC,
 [Status] ASC
)

With these indexes, when I execute the query, the query executes in less than a second for some parameters, and for another parameters (e.g. different @Culture or @CategoryID) may take up to 2 minutes! I have used OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN) to prevent parameter sniffing for @CategoryID and @Culture parameters but seems not working for some parameters.

There are currently around 2,870,000 records in News table and 4,740,000 in ItemCategory table.

Now I greatly appreciate any advice on how to optimize this query or its indexes.

update:
execution plan:
enter image description here
(in this image, ItemNetwork is what I referred to as ItemCategory. they are the same)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

桜花祭 2024-08-25 09:42:26

您是否看过一些内置的 SQL 工具来帮助您解决此问题:

即从管理工作室菜单:

  • “查询”->“显示估计执行计划”
  • “查询”->“包括实际执行计划
  • ”工具'->'数据库引擎优化顾问'

Have you had a look at some of the inbuilt SQL tools to help you with this:

I.e. from the management studio menu:

  • 'Query'->'Display Estimated Execution Plan'
  • 'Query'->'Include Actual Execution Plan'
  • 'Tools'->'Database Engine Tuning Advisor'
赠我空喜 2024-08-25 09:42:26

OPTION OPTIMIZE 子句不应该是内部 SQL 的一部分,而不是 CTE 上 SELECT 的一部分吗?

Shouldn't the OPTION OPTIMIZE clause be part of the inner SQL, rather than of the SELECT on the CTE?

梦回旧景 2024-08-25 09:42:26

您应该查看对新闻表中的文化字段以及项目类别表中的itemid 和categoryid 字段进行索引。您可能不需要所有这些索引 - 我会一次尝试一个,然后组合起来,直到找到有效的方法。您现有的索引似乎对您的查询没有太大帮助。

You should look at indexing the culture field in the news table, and the itemid and categoryid fields in the item category table. You may not need all these indexes - I would try them one at a time, then in combination until you find something that works. Your existing indexes do not seem to help your query very much.

五里雾 2024-08-25 09:42:26

确实需要查看查询计划 - 值得注意的一件事是,您将 News 的聚集索引放在 News.ID 上,但它不是标识字段,而是 ItemCategory 表的 FK,这将导致新闻表上出现一些碎片随着时间的推移,所以它不太理想。

我怀疑根本问题是您的分页导致表扫描。

更新:

这些排序花费了您计划中 68% 的查询执行时间,这是有道理的,其中一种排序至少必须支持您正在使用的基于 news.rank desc 的排名函数,但是您没有可以原生支持该排名的索引。

获得一个索引来支持这一点会很有趣,您可以首先在 news.rank 上尝试一个简单的 NC 索引,SQL 可能会选择连接索引并避免排序,但这需要一些实验。

Really need to see the query plan - one thing of note is you put the clustered index for News on News.ID, but it is not an identity field but the FK for the ItemCategory table, this will result in some fragmentation on the news table over time, so it less than ideal.

I suspect the underlying problem is your paging is causing the table to scan.

Updated:

Those Sort's are costing you 68% of the query execution time from the plan, and that makes sense, one of those sorts at least must be to support the ranking function you are using that is based on news.rank desc, but you have no index that can support that ranking natively.

Getting an index in to support that will be interesting, you can try a simple NC index on news.rank first off, SQL may chose to join indexes and avoid the sort, but it will take some experimentation.

云柯 2024-08-25 09:42:26

尝试对 ItemCategory 表使用 itemId、categoryId 上的非聚集索引,并在 News 表上使用 Rank、Culture 上的非聚集索引。

Try using for ItemCategory table nonclustered index on itemId,categoryId and on News table also nonclustered index on Rank,Culture.

人心善变 2024-08-25 09:42:26

我终于想出了以下索引,它们运行良好,并且存储过程在不到一秒的时间内执行。我刚刚从查询中删除了 TotalCount = COUNT(*) OVER() ,但找不到任何好的索引。也许我编写一个单独的存储过程来计算记录总数。我什至可能决定使用 Twitter 和 Facebook 中的“更多”按钮,而不使用分页按钮。

对于新闻表:

CREATE NONCLUSTERED INDEX [IX_News_Rank_Culture_Status_Date] ON [dbo].[News] 
(
    [Rank] DESC,
    [Culture] ASC,
    [Status] ASC,
    [Date] ASC
)

对于 ItemNetwork 表:

CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_NetworkID] ON ItemNetwork
(
    [ItemID] ASC,
    [NetworkID] ASC
)

我只是不知道 ItemNetwork 是否需要 ID 列上的聚集索引。我从不使用 ID 列从此表中检索记录。您认为在 (ItemID, NetworkID) 列上建立聚集索引更好吗?

I have finally come up with the following indexes which are working great and the stored procedure executes in less than a second. I have just removed TotalCount = COUNT(*) OVER() from the query and I couldn't find any good index for that. Maybe I write a separate stored procedure to calculate the total number of records. I may even decide to use a "more" button like in Twitter and Facebook without pagination buttons.

for news table:

CREATE NONCLUSTERED INDEX [IX_News_Rank_Culture_Status_Date] ON [dbo].[News] 
(
    [Rank] DESC,
    [Culture] ASC,
    [Status] ASC,
    [Date] ASC
)

for ItemNetwork table:

CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_NetworkID] ON ItemNetwork
(
    [ItemID] ASC,
    [NetworkID] ASC
)

I just don't know whether ItemNetwork needs a clustered index on ID column. I am never retrieving a record from this table using the ID column. Do you think it's better to have a clustered index on (ItemID, NetworkID) columns?

自我难过 2024-08-25 09:42:26

请尝试更改

FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

FROM dbo.News news
HASH JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

,或者

FROM dbo.News news
LOOP JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

我真的不知道您的数据中有什么,但该表的连接可能是瓶颈。

Please try to change

FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

to

FROM dbo.News news
HASH JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

or

FROM dbo.News news
LOOP JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

I don't really know what is in your data, but the joining of this tables may be a bottleneck.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文