Query performance for conditions on date ranges
Here's a table:
CREATE TABLE Meetings
(
ID int PRIMARY KEY IDENTITY(1,1),
StartDate DateTime NOT NULL,
EndDate DateTime NULL,
Field1 varchar(50),
Field2 varchar(50),
Field3 varchar(50),
Field4 varchar(50)
)
There are several thousand rows. The date ranges vary in size (from a couple of days up to 50 years).
Here's a query:
DECLARE @ApplicableDate DateTime = GETDATE()  -- or whichever date you are interested in
SELECT ID, StartDate, EndDate, Field1, Field2, Field3, Field4
FROM Meetings
WHERE StartDate <= @ApplicableDate AND
(EndDate IS NULL OR @ApplicableDate <= EndDate)
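The corrected predicate can be sanity-checked with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere), ISO-8601 text instead of DateTime, and a few hypothetical rows. The WHERE clause is the same as in the query, including the NULL handling for open-ended meetings.

```python
import sqlite3


def meetings_on(conn, applicable_date):
    # Same predicate as the T-SQL query: started on or before the date, and
    # either open-ended (NULL EndDate) or ending on or after the date.
    sql = """
        SELECT ID
        FROM Meetings
        WHERE StartDate <= ? AND (EndDate IS NULL OR ? <= EndDate)
        ORDER BY ID
    """
    return [row[0] for row in conn.execute(sql, (applicable_date, applicable_date))]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Meetings (
        ID        INTEGER PRIMARY KEY,
        StartDate TEXT NOT NULL,
        EndDate   TEXT NULL
    )
""")

# Hypothetical sample rows; ISO-8601 strings compare chronologically as text.
conn.executemany(
    "INSERT INTO Meetings (ID, StartDate, EndDate) VALUES (?, ?, ?)",
    [
        (1, "2000-01-01", "2050-01-01"),  # long range that covers the date
        (2, "2020-03-01", "2020-03-05"),  # short range that has already ended
        (3, "2019-06-01", None),          # open-ended, covers the date
        (4, "2030-01-01", None),          # starts after the date
    ],
)

print(meetings_on(conn, "2024-05-15"))  # [1, 3]
```

Note that both ends are inclusive, so a meeting ending exactly on @ApplicableDate is still returned.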
Since the date ranges can be large, a large portion of the table might be returned (20%-50% of the rows).
The query represents the rows I want in a simple way, but the performance is pretty bad. It does a clustered index scan, no matter what indexes I add. I've tried:
- StartDate
- StartDate, EndDate
How can I improve the performance of this query?
I've reviewed the answers for this question and this one too. Those solutions aren't helpful in my situation - I don't really want to muck with the business' data by creating a separate table of dates to turn the query into an equality query (what happens when the end date is modified, or is null?), or by morphing the data to fit in a spatial index.
Still, I'm open to possible modifications to the data structure (particularly if they do not add rows and do not use strange data types).
Answers (2)
I'm assuming you're on SQL Server, based on your syntax.
Make your primary key on ID a non-clustered index.
Create a clustered index on the StartDate column.
Try your query as is. Even though the data is probably stored similarly to what you had with the clustered PK, now the query engine will know in advance that the data is clustered by the start date.
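In SQL Server this means declaring the key as PRIMARY KEY NONCLUSTERED and then running CREATE CLUSTERED INDEX on StartDate. A minimal sketch of the same idea in SQLite (standing in for SQL Server, an assumption made only so the example is runnable): SQLite has no CLUSTERED keyword, but a WITHOUT ROWID table stores its rows in primary-key order. Making (StartDate, ID) the primary key therefore clusters the data by start date, while UNIQUE on ID plays the role of the non-clustered primary key. The sample rows and the reduced column list (only Field1) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# WITHOUT ROWID: rows are stored in PRIMARY KEY order (SQLite's clustered index).
# The clustering key is (StartDate, ID) so it stays unique; UNIQUE on ID acts as
# the separate, non-clustered index on ID.
conn.execute("""
    CREATE TABLE Meetings (
        ID        INTEGER NOT NULL UNIQUE,
        StartDate TEXT    NOT NULL,
        EndDate   TEXT,
        Field1    TEXT,
        PRIMARY KEY (StartDate, ID)
    ) WITHOUT ROWID
""")

# Hypothetical sample data; ISO-8601 strings compare chronologically as text.
conn.executemany(
    "INSERT INTO Meetings (ID, StartDate, EndDate, Field1) VALUES (?, ?, ?, ?)",
    [
        (1, "2000-01-01", "2050-01-01", "long range"),
        (2, "2020-03-01", "2020-03-05", "short range"),
        (3, "2019-06-01", None, "open-ended"),
        (4, "2030-01-01", None, "future"),
    ],
)

QUERY = """
    SELECT ID, StartDate, EndDate, Field1
    FROM Meetings
    WHERE StartDate <= :d AND (EndDate IS NULL OR :d <= EndDate)
"""


def meetings_on(d):
    # The same query as in the question, unchanged; only the storage order differs.
    return conn.execute(QUERY, {"d": d}).fetchall()


# Show how SQLite plans the query; the exact wording differs between versions.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, {"d": "2024-05-15"}):
    print(plan_row[-1])

print(sorted(row[0] for row in meetings_on("2024-05-15")))  # [1, 3]
```

On recent SQLite versions the plan typically shows a search on the primary key bounded by StartDate rather than a full scan. SQL Server's optimizer makes its own decision, and it may still prefer a scan when 20%-50% of the rows qualify.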
If the query returns 20%-50% of the records, then a scan is often the best option. With an index, you first have to find the entry in the index, which contains the address of the row in the table; then you have to fetch the page containing that row from disk, and rows that are adjacent in the index may be spread all over the disk.
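To make the scan-versus-seek trade-off concrete, here is a rough page-count model. It is a deliberately pessimistic sketch, not how SQL Server's optimizer actually costs plans: it assumes every matching row needs its own random data-page read after the index lookup (no caching). The table size, rows per page, index depth, and index rows per page are all hypothetical numbers.

```python
import math


def scan_pages(total_rows: int, rows_per_page: int) -> int:
    # A full clustered-index scan reads every data page exactly once.
    return math.ceil(total_rows / rows_per_page)


def seek_lookup_pages(matching_rows: int, index_depth: int = 2,
                      index_rows_per_page: int = 400) -> int:
    # Walk down the non-clustered index, read the leaf pages holding the matches,
    # then (pessimistically) one separate data-page read per matching row.
    leaf_pages = math.ceil(matching_rows / index_rows_per_page)
    return index_depth + leaf_pages + matching_rows


TOTAL_ROWS = 5000    # hypothetical: "several thousand rows"
ROWS_PER_PAGE = 40   # hypothetical: depends on row width and page size

for pct in (1, 5, 20, 50):
    matches = TOTAL_ROWS * pct // 100
    print(pct, scan_pages(TOTAL_ROWS, ROWS_PER_PAGE), seek_lookup_pages(matches))

# prints:
# 1 125 53
# 5 125 253
# 20 125 1005
# 50 125 2509
```

With these made-up numbers the index only wins below roughly 2.5% of the rows. At the 20%-50% the question mentions, reading all 125 pages once is far cheaper than a thousand or more individual lookups.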
If you really need that many records and performance is bad, then maybe check the following: