WHERE 子句中的子查询查询始终超时
我有以下查询(为清楚起见略有修改):
CREATE PROCEDURE Kctc.CaseTasks_GetCaseTasks
@CaseNumber int
... other parameters
,@ChangedBefore datetime
,@ChangedAfter datetime
AS
SELECT Kctc.CaseTasks.CaseTaskId
...blah blah blah
FROM Kctc.CaseTasks
... some joins here
WHERE
... some normal where clauses
AND
(
(@ChangedAfter IS NULL AND @ChangedBefore IS NULL)
OR
EXISTS (SELECT *
FROM Kctc.FieldChanges
WHERE Kctc.FieldChanges.RecordId = Kctc.CaseTasks.CaseTaskId AND
Kctc.FieldChanges.TableName = 'CaseTasks' AND
Kctc.FieldChanges.DateOfChange BETWEEN
ISNULL(@ChangedAfter, '2000/01/01') AND
ISNULL(@ChangedBefore, '2050/01/01'))
)
每当用户指定 @ChangedBefore
或 @ChangedAfter
的值时,此查询就会超时,因此调用子查询。
子查询检查名为 FieldChanges
的表中是否存在记录(它有效地记录 CaseTasks
表中每个字段的更改)。
查询 FieldChanges
效率不高,因为它涉及对未索引的文本字段 TableName
进行过滤。而且我知道子查询本质上是低效的。
所以我的一般问题是,有没有办法重新设计查询以使其性能更好?
当存在多个关联的 FieldChanges 时(即保留 EXISTS 语义),我想不出一种方法将子查询表示为联接,同时仍然只返回一个 CaseTask 行。我尚未对 FieldChanges
表的 TableName
字段建立索引,因为我对对文本字段建立索引犹豫不决。
那我该怎么办呢?
I have the following query (slightly amended for clarity):
CREATE PROCEDURE Kctc.CaseTasks_GetCaseTasks
@CaseNumber int
... other parameters
,@ChangedBefore datetime
,@ChangedAfter datetime
AS
SELECT Kctc.CaseTasks.CaseTaskId
...blah blah blah
FROM Kctc.CaseTasks
... some joins here
WHERE
... some normal where clauses
AND
(
(@ChangedAfter IS NULL AND @ChangedBefore IS NULL)
OR
EXISTS (SELECT *
FROM Kctc.FieldChanges
WHERE Kctc.FieldChanges.RecordId = Kctc.CaseTasks.CaseTaskId AND
Kctc.FieldChanges.TableName = 'CaseTasks' AND
Kctc.FieldChanges.DateOfChange BETWEEN
ISNULL(@ChangedAfter, '2000/01/01') AND
ISNULL(@ChangedBefore, '2050/01/01'))
)
This query times out whenever the user specifies values for @ChangedBefore
or @ChangedAfter
, therefore invoking the subquery.
The subquery checks for the existence of a record in the table called FieldChanges
(which effectively records changes to every field in the CaseTasks
table).
Querying FieldChanges
isn't very efficient because it involves filtering on the text field TableName
which isn't indexed. And I know that subqueries are inherently inefficient.
So my question in general is, is there a way to redesign the query so that it performs better?
I can't think of a way to express the subquery as a join while still returning just one CaseTask
row when there are multiple associated FieldChanges
(i.e. preserving the EXISTS semantic). I haven't yet indexed the TableName
field of the FieldChanges
table because I'm hesitant about indexing text fields.
So what should I do?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
作为第一步,您可以尝试在表 Kctc.FieldChanges 的 RecordId、TableName 和 DateOfChange 字段上放置一个索引(一个包含所有三个字段的索引),看看是否有帮助。
分享并享受。
As a first cut you might try putting an index on the table Kctc.FieldChanges on the RecordId, TableName, and DateOfChange fields (one single index with all three fields) and see if that helps.
Share and enjoy.
我的第一直觉是限制结果集
更改,
然后查看 where 子句中字段的索引
编辑:关于 TOP 1 - 可能不会带来那么多好处,但不应该造成伤害,并且可能有助于避免表扫描。使用单个字段而不是 * 应该只返回该列(我假设它不是 NULL 值列)
其他想法:声明并设置本地值而不是多次处理的 ISNULL 事物:
与之前执行相同的操作,然后使用
另一件事:检查 WHERE xxx AND 的序列 - 使用最有可能的候选者来隔离序列中的第一个,无论是哪个条件,这样它就可以更快地退出。如果是RecordId,则查找,如果TableName更好,则先使用它。如果一列也有索引,则认为其他条件都相同。
My first instinct is to limit the result set
changed to
then look at the index on the fields in the where clause
EDIT: Regarding the TOP 1 - likely not giving that much of a benefit, but should not hurt, and might help avoid a table scan. Using a single field instead of * should only return that column (I assume it is NOT a NULL value column here)
Additional thoughts: Declare and Set a local value instead of the ISNULL thing which gets processed multiple times:
do same with before, then use
ONE MORE THING: Check the sequence of the WHERE xxx AND - use the MOST LIKELY candidate to isolate FIRST in the sequence, whichever condition that is, so it can get out faster. If that is the RecordId, then find, if the TableName is better, use that first instead. If one column also has an index already consider that one all else being equal.
这不会是一个“好的”解决方案,但它可能比现在发生的情况更好:
具体取决于查询计划是什么 - 如果它重复执行子查询,这个公式可能会帮助。
This is not going to be a "good" solution, but it might be better than what's happening right now:
Depends on exactly what the query plan is - if it was performing the subquery repeatedly, this formulation might help.
将
SET ARITHABORT ON
添加到存储过程使其在不到 1 秒的时间内执行。我不知道这意味着什么。大概是“别再胡言乱语了”。
Adding
SET ARITHABORT ON
to the stored procedure got it to execute in less than 1 second.I have no idea what it means. Presumably 'stop arsing about'.