递归SQL查询加速非索引查询

发布于 2024-09-27 11:29:04 字数 1022 浏览 2 评论 0原文

这个问题很大程度上是出于好奇心，因为我确实有一个有效的查询（它只是比我想要的时间长一点）。

我有一个包含 400 万行的表。该表上的唯一索引是自动递增的 BigInt ID。该查询正在查找其中一列中的不同值，但仅返回 1 天。不幸的是，计算的 ReportDate 列不是 DateTime 类型，甚至不是 BigInt，而是 YYYYMMDD 格式的 char(8)。所以查询速度有点慢。

  SELECT Category 
    FROM Reports 
   where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112) 
GROUP BY Category

注意，上面语句中的日期转换只是将其转换为YYYYMMDD格式进行比较。

我想知道是否有一种方法可以优化此查询，因为我知道我唯一感兴趣的数据位于表的“底部”。我正在考虑某种递归 SELECT 函数，它逐渐增长一个可用于最终查询的临时表。

例如，在 psuedo-sql 中：

N = 128
TemporaryTable = SELECT TOP {N} * 
                   FROM Reports 
               ORDER BY ID DESC 

/* Once we hit a date < Today, we can stop */
if(TemporaryTable does not contain ReportDate < Today) 
  N = N**2
  Repeat Select

/* We now have a smallish table to do our query */
  SELECT Category 
    FROM TemproaryTable 
   where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112) 
GROUP BY Category

这有意义吗？这样的事情可能吗？

这是在 MS SQL Server 2008 上。

原文

This question is largely driven by curiosity, as I do have a working query (it just takes a little longer than I would like).

I have a table with 4 million rows. The only index on this table is an auto-increment BigInt ID. The query is looking for distinct values in one of the columns, but only going back 1 day. Unfortunately, the ReportDate column that is evaluated is not of the DateTime type, or even a BigInt, but is char(8) in the format of YYYYMMDD. So the query is a bit slow.

  SELECT Category 
    FROM Reports 
   where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112) 
GROUP BY Category

Note that the date converstion in the above statement is simply converting it to a YYYYMMDD format for comparison.

I was wondering if there was a way to optimize this query based on the fact that I know that the only data I am interested in is at the "bottom" of the table. I was thinking of some sort of recursive SELECT function which gradually grew a temporary table that could be used for the final query.

For example, in psuedo-sql:

N = 128
TemporaryTable = SELECT TOP {N} * 
                   FROM Reports 
               ORDER BY ID DESC 

/* Once we hit a date < Today, we can stop */
if(TemporaryTable does not contain ReportDate < Today) 
  N = N**2
  Repeat Select

/* We now have a smallish table to do our query */
  SELECT Category 
    FROM TemproaryTable 
   where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112) 
GROUP BY Category

Does that make sense? Is something like that possible?

This is on MS SQL Server 2008.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

囚我心虐我身 2024-10-04 11:29:04

我可能建议您不需要转换以 YYYYMMDD 格式存储为字符数据的 Date ；该格式本质上是可以单独排序的。我会将您的日期转换为该格式的输出。

另外，您编写转换的方式是转换每个单独行的当前日期时间，因此即使为整个查询存储该值可以加快速度......但我认为只需转换您正在搜索的 char 格式的日期会有所帮助。

当然，我还建议您创建需要创建的索引...但这不是您问的问题：P

回复收藏 0 原文

抠脚大汉 2024-10-04 11:29:04

为什么不直接创建您需要的索引呢？

create index idx_Reports_ReportDate 
    on Reports(ReportDate, Category)

Why not just create the index you need?

create index idx_Reports_ReportDate 
    on Reports(ReportDate, Category)

回复收藏 0 原文

流年里的时光 2024-10-04 11:29:04

不，这没有道理。优化此查询的唯一方法是为其建立覆盖索引：

CREATE INDEX ndxReportDateCategory ON Reports (ReportDate, Category);

更新

考虑到您无法修改架构的评论，那么您应该修改架构。如果你仍然不能，那么答案仍然适用：解决方案是有一个索引。

最后，为了更直接地回答你的问题，如果你的 ID 和 ReportData 之间有很强的相关性：你寻找的 ID 是 ReportDate 小于你要查找的日期的最大 ID：

SELECT MAX(Id) 
FROM Reports
WHERE ReportDate < 'YYYYMMDD';

这将对ID 索引并停在所需日期之前的第一个 ID（即不会扫描整个表）。然后，您可以根据找到的最大 ID 过滤您的报告。

No, that doesn't make sense. The only way to optimize this query is to have a covering index for it:

CREATE INDEX ndxReportDateCategory ON Reports (ReportDate, Category);

Update

Considering your comment that you cannot modify the schema, then you should modify the schema. If you still can't, then the answer still applies: the solution is to have an index.

And finally, to answer more directly your question, if you have a strong correlation between ID and ReportData: the ID you seek is the biggest one that has a ReportDate smaller than the date you're after:

SELECT MAX(Id) 
FROM Reports
WHERE ReportDate < 'YYYYMMDD';

This will do a reverse scan on the ID index and stop at the first ID that is previous to your desired date (ie. will not scan the entire table). You can then filter your reports base don this found max Id.

回复收藏 0 原文

痕至 2024-10-04 11:29:04

我认为您会发现 Rob Farley 博客上有关 SARGability 的讨论与您的帖子主题相关，是非常有趣的阅读内容。

http://blogs.lobsterpot.com .au/2010/01/22/sargable-functions-in-sql-server/

一种不需要修改现有列数据类型的有趣替代方法是利用计算列。

alter table REPORTS
add castAsDate as CAST(ReportDate as date)

create index rf_so2 on REPORTS(castAsDate) include (ReportDate)

I think you will find the discussion on SARGability, on Rob Farley's Blog to be very interesting reading in relation to your post topic.

http://blogs.lobsterpot.com.au/2010/01/22/sargable-functions-in-sql-server/

An interesting alternative approach that does not require you to modify the existing column data type would be to leverage computed columns.

alter table REPORTS
add castAsDate as CAST(ReportDate as date)

create index rf_so2 on REPORTS(castAsDate) include (ReportDate)

回复收藏 0 原文

苍暮颜 2024-10-04 11:29:04

我偶尔用来进入与您的索引类似的日志表的查询模式之一是通过子查询进行限制：

DECLARE @ReportDate varchar(8)
SET @ReportDate = Convert(varchar(8), GetDate(), 112)

SELECT *
FROM
(
SELECT top 20000 *
FROM Reports
ORDER BY ID desc
) sub
WHERE sub.ReportDate = @ReportDate

20k/4M = 0.5% 的表被读取。

这是一个循环解决方案。注意：可能希望将 ID 主键和 Reportdate 在临时表中建立索引。

DECLARE @ReportDate varchar(8)
SET @ReportDate = Convert(varchar(8), GetDate(), 112)

DECLARE @CurrentDate varchar(8), MinKey bigint


SELECT top 2000 * INTO #MyTable
FROM Reports ORDER BY ID desc

SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
FROM #MyTable

WHILE @ReportDate <= @CurrentDate
BEGIN

  SELECT top 2000 * INTO #MyTable
  FROM Reports WHERE ID < @MinKey ORDER BY ID desc

  SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
  FROM #MyTable

END

SELECT * FROM #MyTable
WHERE ReportDate = @ReportDate


DROP TABLE #MyTable

One of the query patterns I occasionally use to get into a log table with similiar indexing to yours is to limit by subquery:

DECLARE @ReportDate varchar(8)
SET @ReportDate = Convert(varchar(8), GetDate(), 112)

SELECT *
FROM
(
SELECT top 20000 *
FROM Reports
ORDER BY ID desc
) sub
WHERE sub.ReportDate = @ReportDate

20k/4M = 0.5% of the table is read.

Here's a loop solution. Note: might want to make ID primary key and Reportdate indexed in the temp table.

DECLARE @ReportDate varchar(8)
SET @ReportDate = Convert(varchar(8), GetDate(), 112)

DECLARE @CurrentDate varchar(8), MinKey bigint


SELECT top 2000 * INTO #MyTable
FROM Reports ORDER BY ID desc

SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
FROM #MyTable

WHILE @ReportDate <= @CurrentDate
BEGIN

  SELECT top 2000 * INTO #MyTable
  FROM Reports WHERE ID < @MinKey ORDER BY ID desc

  SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID)
  FROM #MyTable

END

SELECT * FROM #MyTable
WHERE ReportDate = @ReportDate


DROP TABLE #MyTable

回复收藏 0 原文

~没有更多了~