Query performance for a condition on a date range

Posted on 2024-09-12 21:40:45

Here's a table:

CREATE TABLE Meetings
(
  ID int PRIMARY KEY IDENTITY(1,1),
  StartDate DateTime NOT NULL,
  EndDate DateTime NULL,
  Field1 varchar(50),
  Field2 varchar(50),
  Field3 varchar(50),
  Field4 varchar(50)
)

There are several thousand rows. The date ranges vary in size (from a couple of days up to 50 years).

Here's a query:

DECLARE @ApplicableDate DateTime

SELECT ID, StartDate, EndDate, Field1, Field2, Field3, Field4
FROM Meetings
WHERE StartDate <= @ApplicableDate AND
  (EndDate IS NULL OR @ApplicableDate <= EndDate)

Since the date ranges can be large, a large portion of the table might be returned (20%-50% of the rows).

The query represents the rows I want in a simple way, but the performance is pretty bad. It does a clustered index scan, no matter what indexes I add. I've tried:

  • StartDate
  • StartDate, EndDate

How can I improve the performance of this query?


I've reviewed the answers to this question and this one too. Those solutions aren't helpful in my situation: I don't really want to muck with the business's data by creating a separate table of dates to turn the query into an equality query (what happens when the end date is modified, or null?), or by morphing the data to fit a spatial index.

Still, I'm open to possible modifications to the data structure (particularly if they do not add rows and do not use strange data types).

Comments (2)

千笙结 2024-09-19 21:40:45

I'm assuming you're on SQL Server, so the syntax below is T-SQL.

Make your primary key on ID a non-clustered index.

ID int PRIMARY KEY NONCLUSTERED IDENTITY(1,1),

Create a clustered index on the StartDate column.

CREATE CLUSTERED INDEX ix_Meetings_StartDate
ON Meetings (StartDate)

Try your query as is. Even though the data is probably stored similarly to what you had with the clustered PK, now the query engine will know in advance that the data is clustered by the start date.
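
Put together, the two changes above might look like the following. This is only a sketch for a freshly created table: the constraint name PK_Meetings and the sample value assigned to @ApplicableDate are assumptions for illustration, not something from the question. On an existing table you would have to drop and re-create the primary key constraint instead.

```sql
-- Sketch: the Meetings table from the question, with the primary key
-- nonclustered and the clustered index on StartDate instead.
CREATE TABLE Meetings
(
  ID int IDENTITY(1,1) NOT NULL,
  StartDate datetime NOT NULL,
  EndDate datetime NULL,
  Field1 varchar(50),
  Field2 varchar(50),
  Field3 varchar(50),
  Field4 varchar(50),
  -- PK_Meetings is a hypothetical constraint name
  CONSTRAINT PK_Meetings PRIMARY KEY NONCLUSTERED (ID)
);

-- Rows are now physically ordered by StartDate
CREATE CLUSTERED INDEX ix_Meetings_StartDate
ON Meetings (StartDate);

-- Example value only; the question leaves @ApplicableDate unset
DECLARE @ApplicableDate datetime = '20240101';

SELECT ID, StartDate, EndDate, Field1, Field2, Field3, Field4
FROM Meetings
WHERE StartDate <= @ApplicableDate
  AND (EndDate IS NULL OR @ApplicableDate <= EndDate);
```

Whether the optimizer then chooses a range seek on StartDate instead of a full scan depends on the data and the date supplied, so it is worth checking the actual execution plan.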

咿呀咿呀哟 2024-09-19 21:40:45

If the query returns 20%-50% of the records, then a scan is often the best option. With an index, you always have to find the entry in the index, which contains the address of the record in the table, and then fetch the page containing that record from disk, at the risk that records adjacent in the index are spread all over the disk.

If you really need that many records and performance is bad, then maybe check the following:

  • Is the disk speed an issue?
  • Is it the network bandwidth?
  • Are you restricted in RAM/Cache?
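
One rough way to start on those checks, assuming SQL Server as in the other answer, is to turn on the I/O and timing statistics around the query and read the output in the messages pane. The sample value for @ApplicableDate is an assumption for illustration.

```sql
-- Sketch: report page reads and timings for the query from the question.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Example value only; the question leaves @ApplicableDate unset
DECLARE @ApplicableDate datetime = '20240101';

SELECT ID, StartDate, EndDate, Field1, Field2, Field3, Field4
FROM Meetings
WHERE StartDate <= @ApplicableDate
  AND (EndDate IS NULL OR @ApplicableDate <= EndDate);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The I/O output lists logical reads, physical reads and read-ahead reads for the table; the timing output lists CPU time and elapsed time. As a loose guide, many physical reads suggest the pages were not in cache and had to come from disk, while an elapsed time far above the CPU time with few physical reads may mean the time is going elsewhere, for example into sending the rows to the client.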