SQL optimization and disjunctive normal form
I was writing a query in Visual Studio 2010 (that is, I opened Server Explorer, right-clicked the server, and chose New Query). The query includes the condition
A AND B AND C AND D AND E AND F AND (G OR H)
which is in conjunctive normal form (CNF). When I ran the query (attached to MSSQL Server 2008), it changed the text to
A AND B AND C AND D AND E AND F AND G OR
A AND B AND C AND D AND E AND F AND H
which is in disjunctive normal form (DNF).
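As a sanity check that the rewrite does not change the meaning of the condition, here is a minimal Python sketch that compares the two forms over every combination of truth values. It models SQL's three-valued logic (TRUE, FALSE, UNKNOWN for NULL comparisons) by ordering the values so that AND is `min` and OR is `max`. The helper names (`sql_and`, `sql_or`, `cnf`, `dnf`) are made up for this illustration.

```python
from itertools import product

# SQL truth values, ordered so that AND behaves like min() and OR like max().
FALSE, UNKNOWN, TRUE = 0, 1, 2

def sql_and(*values):
    return min(values)

def sql_or(*values):
    return max(values)

def cnf(a, b, c, d, e, f, g, h):
    # A AND B AND C AND D AND E AND F AND (G OR H)
    return sql_and(a, b, c, d, e, f, sql_or(g, h))

def dnf(a, b, c, d, e, f, g, h):
    # (A AND ... AND F AND G) OR (A AND ... AND F AND H)
    return sql_or(
        sql_and(a, b, c, d, e, f, g),
        sql_and(a, b, c, d, e, f, h),
    )

# Enumerate all 3**8 assignments and collect any where the forms disagree.
mismatches = [
    row
    for row in product((FALSE, UNKNOWN, TRUE), repeat=8)
    if cnf(*row) != dnf(*row)
]
print(f"{len(mismatches)} mismatching assignments out of {3 ** 8}")
# prints "0 mismatching assignments out of 6561"
```

So the two texts filter the same rows; any difference would be in how the engine executes them, not in the result.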
From the little I found online, it seems that DNF allows the SQL engine to evaluate each conjunction separately and union the results at the end.
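To make "evaluate each conjunction separately and union the results" concrete, here is a small sketch using Python's built-in `sqlite3` module. SQLite stands in for SQL Server only to show the equivalence of results; it says nothing about what SQL Server's optimizer actually does. The table `t` and its columns are hypothetical, with a single column `a` standing in for the repeated conditions A through F.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, g INT, h INT)")

# Insert every 0/1 combination of (a, g, h); ids are assigned 1..8 in order.
conn.executemany(
    "INSERT INTO t (a, g, h) VALUES (?, ?, ?)",
    [(a, g, h) for a in (0, 1) for g in (0, 1) for h in (0, 1)],
)

# Single query with the OR inside the conjunction.
or_rows = conn.execute(
    "SELECT id FROM t WHERE a = 1 AND (g = 1 OR h = 1) ORDER BY id"
).fetchall()

# One query per conjunction, combined with UNION (which removes duplicates,
# so a row matching both g = 1 and h = 1 is returned only once).
union_rows = conn.execute(
    "SELECT id FROM t WHERE a = 1 AND g = 1 "
    "UNION "
    "SELECT id FROM t WHERE a = 1 AND h = 1 "
    "ORDER BY id"
).fetchall()

print(or_rows == union_rows)
```

Note that the repeated condition (`a = 1` here) has to be checked in both branches of the union, which is the source of the concern about duplicated work.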
However, for a condition like this one, with so many repeated terms, does DNF actually provide an advantage over CNF? If it doesn't, how can I force the optimizer to take the condition as written? If it does, should I write the query in my application code in CNF because it's shorter and neater, or in DNF because it saves the optimizer time?
Comments (2)
I don't know about the relative advantages of DNF/CNF in this situation, or even how to force the optimizer in this fashion.
Generally speaking, you don't want to force the optimizer to take the optimization you currently perceive as best over the one it will generate (there are exceptions, but they are rare). This is largely because the "best" optimization may change over time as a side effect of other actions, such as adding an index. If you force the optimizer to adopt a particular optimization, you lock it into that path even if a new one would perform better.
Given that, you should write the query in the form that is easiest to read and maintain (CNF) and let the optimizer change it if necessary. This is the whole point of SQL being a declarative language: the optimizer is free to rearrange things as needed.
Off the top of my head, I wonder about the indexing on G and H. If G were indexed but H weren't, maybe the disjunctive form would make more sense.
In any case, you can run the performance analyzer yourself to see the net difference in performance.
Besides that, here is some research you might be able to access if you want to dive in: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=842265&abstractAccess=no&userType=inst
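The suggestion above to measure the difference yourself can be sketched roughly as follows. This uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in, so the plans it prints are SQLite's and will not match SQL Server's. On SQL Server you would instead compare the actual execution plans, or the output of `SET STATISTICS TIME ON` and `SET STATISTICS IO ON`, for the two forms. The table `t`, the index `idx_t_g`, and the `= 1` predicates are all assumptions made up for this example, with G indexed and H not, as in the scenario above.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, "
    "a INT, b INT, c INT, d INT, e INT, f INT, g INT, h INT)"
)
# Hypothetical scenario: g is indexed, h is not.
conn.execute("CREATE INDEX idx_t_g ON t (g)")

# One row for every 0/1 combination of the eight columns (256 rows).
conn.executemany(
    "INSERT INTO t (a, b, c, d, e, f, g, h) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    product((0, 1), repeat=8),
)

COMMON = "a = 1 AND b = 1 AND c = 1 AND d = 1 AND e = 1 AND f = 1"
CNF_WHERE = f"{COMMON} AND (g = 1 OR h = 1)"
DNF_WHERE = f"({COMMON} AND g = 1) OR ({COMMON} AND h = 1)"

def run(where):
    """Return the matching ids and the plan SQLite reports for this WHERE clause."""
    sql = f"SELECT id FROM t WHERE {where} ORDER BY id"
    rows = conn.execute(sql).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return rows, plan

cnf_rows, cnf_plan = run(CNF_WHERE)
dnf_rows, dnf_plan = run(DNF_WHERE)

print("same rows:", cnf_rows == dnf_rows)
print("CNF plan:", cnf_plan)
print("DNF plan:", dnf_plan)
```

If both forms produce the same plan on your real server and data, the choice between them is purely one of readability.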