递归SQL查询加速非索引查询
这个问题很大程度上是出于好奇心,因为我确实有一个有效的查询(它只是比我想要的时间长一点)。
我有一个包含 400 万行的表。该表上的唯一索引是自动递增的 BigInt ID。该查询正在查找其中一列中的不同值,但仅返回 1 天。不幸的是,计算的 ReportDate 列不是 DateTime 类型,甚至不是 BigInt,而是 YYYYMMDD 格式的 char(8)。所以查询速度有点慢。
SELECT Category
FROM Reports
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
注意,上面语句中的日期转换只是将其转换为YYYYMMDD格式进行比较。
我想知道是否有一种方法可以优化此查询,因为我知道我唯一感兴趣的数据位于表的“底部”。我正在考虑某种递归 SELECT 函数,它逐渐增长一个可用于最终查询的临时表。
例如,在 psuedo-sql 中:
N = 128
TemporaryTable = SELECT TOP {N} *
FROM Reports
ORDER BY ID DESC
/* Once we hit a date < Today, we can stop */
if(TemporaryTable does not contain ReportDate < Today)
N = N**2
Repeat Select
/* We now have a smallish table to do our query */
SELECT Category
FROM TemproaryTable
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
这有意义吗?这样的事情可能吗?
这是在 MS SQL Server 2008 上。
This question is largely driven by curiosity, as I do have a working query (it just takes a little longer than I would like).
I have a table with 4 million rows. The only index on this table is an auto-increment BigInt ID. The query is looking for distinct values in one of the columns, but only going back 1 day. Unfortunately, the ReportDate column that is evaluated is not of the DateTime type, or even a BigInt, but is char(8) in the format of YYYYMMDD. So the query is a bit slow.
SELECT Category
FROM Reports
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Note that the date converstion in the above statement is simply converting it to a YYYYMMDD format for comparison.
I was wondering if there was a way to optimize this query based on the fact that I know that the only data I am interested in is at the "bottom" of the table. I was thinking of some sort of recursive SELECT function which gradually grew a temporary table that could be used for the final query.
For example, in psuedo-sql:
N = 128
TemporaryTable = SELECT TOP {N} *
FROM Reports
ORDER BY ID DESC
/* Once we hit a date < Today, we can stop */
if(TemporaryTable does not contain ReportDate < Today)
N = N**2
Repeat Select
/* We now have a smallish table to do our query */
SELECT Category
FROM TemproaryTable
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Does that make sense? Is something like that possible?
This is on MS SQL Server 2008.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我可能建议您不需要转换以 YYYYMMDD 格式存储为字符数据的
Date
;该格式本质上是可以单独排序的。我会将您的日期转换为该格式的输出。另外,您编写转换的方式是转换每个单独行的当前日期时间,因此即使为整个查询存储该值可以加快速度......但我认为只需转换您正在搜索的 char 格式的日期会有所帮助。
当然,我还建议您创建需要创建的索引...但这不是您问的问题:P
I might suggest you do not need to convert the
Date
that is stored as char data in YYYYMMDD format; That format is inherently sortable all by itself. I would instead convert your date to output in that format.Also, the way you have the conversion written, it is converting the current DateTime for every individual row, so even storing that value for the whole query could speed things up... but I think just converting the date you are searching for to that format of char would help.
I would also suggest getting the index(es) you need created, of course... but that's not the question you asked :P
为什么不直接创建您需要的索引呢?
Why not just create the index you need?
不,这没有道理。优化此查询的唯一方法是为其建立覆盖索引:
更新
考虑到您无法修改架构的评论,那么您应该修改架构。如果你仍然不能,那么答案仍然适用:解决方案是有一个索引。
最后,为了更直接地回答你的问题,如果你的 ID 和 ReportData 之间有很强的相关性:你寻找的 ID 是 ReportDate 小于你要查找的日期的最大 ID:
这将对ID 索引并停在所需日期之前的第一个 ID(即不会扫描整个表)。然后,您可以根据找到的最大 ID 过滤您的报告。
No, that doesn't make sense. The only way to optimize this query is to have a covering index for it:
Update
Considering your comment that you cannot modify the schema, then you should modify the schema. If you still can't, then the answer still applies: the solution is to have an index.
And finally, to answer more directly your question, if you have a strong correlation between ID and ReportData: the ID you seek is the biggest one that has a ReportDate smaller than the date you're after:
This will do a reverse scan on the ID index and stop at the first ID that is previous to your desired date (ie. will not scan the entire table). You can then filter your reports base don this found max Id.
我认为您会发现 Rob Farley 博客上有关 SARGability 的讨论与您的帖子主题相关,是非常有趣的阅读内容。
http://blogs.lobsterpot.com .au/2010/01/22/sargable-functions-in-sql-server/
一种不需要修改现有列数据类型的有趣替代方法是利用计算列。
I think you will find the discussion on SARGability, on Rob Farley's Blog to be very interesting reading in relation to your post topic.
http://blogs.lobsterpot.com.au/2010/01/22/sargable-functions-in-sql-server/
An interesting alternative approach that does not require you to modify the existing column data type would be to leverage computed columns.
我偶尔用来进入与您的索引类似的日志表的查询模式之一是通过子查询进行限制:
20k/4M = 0.5% 的表被读取。
这是一个循环解决方案。注意:可能希望将 ID 主键和 Reportdate 在临时表中建立索引。
One of the query patterns I occasionally use to get into a log table with similiar indexing to yours is to limit by subquery:
20k/4M = 0.5% of the table is read.
Here's a loop solution. Note: might want to make ID primary key and Reportdate indexed in the temp table.