SQL查询执行快捷方式OR逻辑?
我有三个表:
SmallTable
(id int, flag1 bit, flag2 bit)
JoinTable
(SmallTableID int, BigTableID int)
BigTable
(id int, text1 nvarchar(100), otherstuff...)
SmallTable
最多有几十条记录。 BigTable
有几百万,实际上是一个将该数据库中的表与同一服务器上另一个数据库中的表 UNIONS 的视图。
这是连接逻辑:
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
平均连接大小是几千个结果。显示的所有内容均已编入索引。
对于大多数 SmallTable
记录,flag1
和 flag2
设置为 1
,因此实际上甚至不需要访问BigTable.text1 上的索引,但 SQL Server 无论如何都会这样做,从而导致昂贵的索引扫描和嵌套循环。
有没有更好的方法来提示 SQL Server,如果 flag1
和 flag2
都设置为 1
,它甚至不需要查找在text1
?
实际上,如果我可以在这些情况下完全避免连接到 BigTable(JoinTable 是托管的,因此这不会产生问题),这将使这个关键查询更快。
I have three tables:
SmallTable
(id int, flag1 bit, flag2 bit)
JoinTable
(SmallTableID int, BigTableID int)
BigTable
(id int, text1 nvarchar(100), otherstuff...)
SmallTable
has, at most, a few dozen records. BigTable
has a few million, and is actually a view that UNIONS a table in this database with a table in another database on the same server.
Here's the join logic:
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
Average joined size is a few thousand results. Everything shown is indexed.
For most SmallTable
records, flag1
and flag2
are set to 1
, so there's really no need to even access the index on BigTable.text1, but SQL Server does anyway, leading to a costly Indexed Scan and Nested Loop.
Is there a better way to hint to SQL Server that, if flag1
and flag2
are both set to 1
, it shouldn't even bother looking at text1
?
Actually, if I can avoid the join to BigTable completely in these cases (JoinTable is managed, so this wouldn't create an issue), that would make this key query even faster.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
SQL 布尔计算不保证运算符短路。请参阅 关于 SQL Server 布尔运算符短路 一个清晰的例子展示了假设操作符短路如何导致正确性问题和运行时错误。
另一方面,我的链接中的示例显示了 SQL Server 的作用:提供 SQL 可以使用的访问路径。因此,与所有 SQL 性能问题和疑问一样,真正的问题不在于 SQL 文本的表达方式,而在于存储的设计。 IE。查询优化器可以使用哪些索引来满足您的查询?
SQL Boolean evaluation does NOT guarantee operator short-circuit. See On SQL Server boolean operator short-circuit for a clear example showing how assuming operator short circuit can lead to correctness issues and run-time errors.
On the other hand the very example in my link shows what does work for SQL Server: providing an access path that SQL can use. So, as with all SQL performance problems and questions, the real problem is not in the way the SQL text is expressed, but in the design of your storage. Ie. what indexes has the query optimizer at its disposal to satisfy your query?
不幸的是,我不相信 SQL Server 会短路这样的情况。
所以我建议做两个查询并将它们联合在一起。第一个查询使用 s.flag1=1 和 s.flag2=1 WHERE 条件,第二个查询使用 s.flag1<>1 和 s.flag2>>1 条件连接到 BigTable。
关于此事的这篇文章值得一读,其中包括以下要点:
更新:
这篇文章也是这是一本有趣的读物,包含一些关于该主题的好链接,包括与 SQL Server 查询处理器团队的开发经理的 technet 聊天,其中简要提到优化器确实允许短路评估。我从各种文章中得到的总体印象是“是的,优化器可以发现短路的机会,但你不应该依赖它,也不能强迫它”。因此,我认为 UNION 方法可能是您最好的选择。如果它没有提出一个利用捷径机会的计划,那么这将取决于基于成本的优化器认为它找到了一个不这样做的合理计划(这将取决于索引、统计信息等) 。
I don't believe SQL Server will short-circuit conditions like that unfortunately.
SO I'd suggest doing 2 queries and UNION them together. First query with s.flag1=1 and s.flag2=1 WHERE conditions, and the second query doing the join on to BigTable with the s.flag1<>1 a s.flag2<>1 conditions.
This article on the matter is worth a read, and includes the bottom line:
Update:
This article is also an interesting read and contains some good links on this topic, including a technet chat with the development manager for the SQL Server Query Processor team which briefly mentions that the optimizer does allow short-circuit evaluation. The overall impression I get from various articles is "yes, the optimizer can spot the opportunity to short circuit but you shouldn't rely on it and you can't force it". Hence, I think the UNION approach may be your best bet. If it's not coming up with a plan that takes advantage of an opportunity to short cut, that would be down to the cost-based optimizer thinking it's found a reasonable plan that does not do it (this would be down to indexes, statistics etc).
它并不优雅,但它应该可以工作......
It's not elegant, but it should work...
SQL Server 通常会获取子查询提示(尽管可以随意丢弃它):
SQL Server usually grabs the subquery hint (though it's free to discard it):
不知道如果没有测试数据,这是否会更快...但听起来可能会
发生请让我知道发生了什么
此外,您可以通过仅返回此查询的唯一 id 然后使用结果来加快速度以获得所有其余数据。
编辑
这样的东西?
No idea if this will be faster without test data... but it sounds like it might
Please let me know what happens
Also, you might be able to speed this up by just returning just a unique id for this query and then using the result of that to get all the rest of the data.
edit
something like this?
希望这能起作用 - 小心围绕
aggregates
的case
语句中的快捷逻辑,但是......Hope this works - careful of shortcut logic in
case
statements aroundaggregates
but...