Is there a canonical form for database queries?
Say I want to make an "Optimized query generator". Basically, a SQL query optimizer that is a lot better than what can be built into an SQL server, given its time/space limitations. It would take a query and DB stats as input and generate an SQL query tailored for the target system, one that the server will quickly optimize into a nearly ideal plan.
How much of SQL would need to be supported? Is there a subset of SQL that is flexible enough to easily describe most useful queries but enough smaller than full SQL to make it worth trimming it down to? Also is there a better way to describe queries if you don't need to stick "close to the machine"?
I'm not thinking of a program that you would process existing SQL through, but rather a tool for creating new SQL from. It wouldn't actually need to take SQL as input, as long as the input language is able to describe the requirements of the query.
I guess another form of the question would be: are there any parts of SQL that are only there for performance and never improve readability/understandability?
As someone pointed out, doing this would require "tons of product-specific knowledge", and that knowledge (e.g. nested subqueries vs. alternatives, what kind of indexes should be used, that sort of thing) is exactly what the tool would be intended to encapsulate, so that the user doesn't need to learn it.
note: I am not interested in generating actual query plans as that is the DBMS's job and can't be done from SQL anyway. I'm interested in a system that can automate the job of making good SQL for a given DBMS from an input that doesn't need to be tuned for that DBMS.
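To make the idea of "an input that doesn't need to be tuned for that DBMS" concrete, here is a minimal sketch, assuming a hypothetical `QuerySpec` description and a hypothetical `render` function (neither is an existing tool). It only handles one product-specific difference that is well established: row limiting is spelled `LIMIT n` in SQLite/MySQL/PostgreSQL, `TOP (n)` in SQL Server, and `FETCH FIRST n ROWS ONLY` in standard SQL.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical set of targets this sketch knows how to render for.
DIALECTS = {"sqlite", "mysql", "postgresql", "sqlserver", "standard"}


@dataclass
class QuerySpec:
    """Hypothetical dialect-neutral description of what a query must return."""
    table: str
    columns: list[str]
    filters: list[str] = field(default_factory=list)   # boolean SQL expressions, ANDed together
    order_by: list[str] = field(default_factory=list)  # e.g. "total DESC"
    limit: int | None = None                           # max rows to return, if any


def render(spec: QuerySpec, dialect: str) -> str:
    """Render the spec as SQL text for one target dialect.

    Only row limiting differs between dialects here; a real generator would
    also have to choose subquery shapes, join formulations, hints, and so on.
    """
    if dialect not in DIALECTS:
        raise ValueError(f"unknown dialect: {dialect}")
    top = f"TOP ({spec.limit}) " if spec.limit is not None and dialect == "sqlserver" else ""
    sql = f"SELECT {top}{', '.join(spec.columns)} FROM {spec.table}"
    if spec.filters:
        sql += " WHERE " + " AND ".join(spec.filters)
    if spec.order_by:
        sql += " ORDER BY " + ", ".join(spec.order_by)
    if spec.limit is not None:
        if dialect in ("sqlite", "mysql", "postgresql"):
            sql += f" LIMIT {spec.limit}"
        elif dialect == "standard":  # SQL:2008 syntax, also accepted by e.g. Oracle 12c+ and DB2
            sql += f" FETCH FIRST {spec.limit} ROWS ONLY"
    return sql


spec = QuerySpec("orders", ["id", "total"], ["status = 'open'"], ["total DESC"], 10)
for d in ("sqlite", "sqlserver", "standard"):
    print(d, "->", render(spec, d))
```

The point of the design is that `QuerySpec` says nothing about the target product; all product-specific knowledge lives in `render`, which is where the hard part of the proposed tool would go.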
Answers (7)
This is a very old question by now, and I agree with most of the other answers that it is perhaps a bit misguided. But there is something to it. Have you read Gulutzan and Pelzer's "SQL Performance Tuning" (Addison-Wesley, 2003)? It compares a number of DBMSs and how equivalent but differently formulated queries affect execution time. In other words, it shows which idiosyncrasies and bugs exist in the various query optimizers.
For example, they found that in most systems a WHERE clause such as
WHERE column1 = 'A' AND column2 = 'B'
will be evaluated from left to right, but from right to left in Oracle (under certain conditions, and in the particular version of Oracle that was current when they wrote the book). Therefore, the condition least likely to be true should be put last in Oracle, but first in most other systems.
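A generator could encode that kind of finding as a per-target rule. This is a minimal sketch, assuming the selectivity estimates (fraction of rows for which each predicate is true) come from the DB stats, and assuming a per-target flag for evaluation direction; `order_conjuncts` is a hypothetical helper. SQL does not guarantee the evaluation order of `AND` operands, so an engine is free to ignore the textual order anyway.

```python
def order_conjuncts(predicates, evaluates_left_to_right=True):
    """Join AND-ed predicates so the one least likely to be true is evaluated first.

    predicates: list of (sql_text, selectivity) pairs, where selectivity is the
    estimated fraction of rows (0.0-1.0) for which the predicate is true.
    """
    # Lowest selectivity first: it rejects the most rows, so short-circuiting skips more work.
    ordered = sorted(predicates, key=lambda p: p[1])
    if not evaluates_left_to_right:
        # Engine evaluates right to left, so the most selective predicate goes last in the text.
        ordered.reverse()
    return " AND ".join(sql for sql, _ in ordered)


preds = [("column1 = 'A'", 0.5), ("column2 = 'B'", 0.01)]
print("WHERE " + order_conjuncts(preds))                                 # left-to-right engines
print("WHERE " + order_conjuncts(preds, evaluates_left_to_right=False))  # right-to-left engines
```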
I'm surprised to hear you describe SQL as "close to the machine". SQL itself is declarative rather than procedural, and one of the interesting aspects of relational databases is the freedom implementers have to innovate, since SQL itself dictates very little about how the queries should be executed.
I think for sheer utility, it would be very difficult to improve on SQL. I'm not saying it's the perfect language, but it is the lingua franca of relational (and even some non-relational) databases.
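Because SQL is declarative, several differently shaped queries can ask for exactly the same result, and choosing among such shapes is all a SQL-generating tool can really do. The sketch below uses SQLite through Python's `sqlite3` module and a made-up two-table schema; the three formulations are equivalent here because `customers.id` is a primary key, so `DISTINCT` cannot merge two different customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 3, 2.0);
""")

# Three formulations of "ids of customers who have at least one order".
FORMULATIONS = {
    "in_subquery": "SELECT id FROM customers WHERE id IN (SELECT customer_id FROM orders)",
    "exists": "SELECT c.id FROM customers c "
              "WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)",
    "distinct_join": "SELECT DISTINCT c.id FROM customers c JOIN orders o ON o.customer_id = c.id",
}

# Same answer every time; which one runs fastest is up to the engine's optimizer.
results = {label: sorted(row[0] for row in conn.execute(sql)) for label, sql in FORMULATIONS.items()}
for label, ids in results.items():
    print(label, ids)
```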
Bramha, I'm not sure you know what you are asking. SQL optimization isn't simply a matter of making sure that query components are in the right order. You seem to recognize that you'll need intimate knowledge of the indices, data page layouts, and so on, but you'd still be left with just reordering query clauses unless you gain the appropriate "hooks" into the SQL Server query processor. Because that is what MS does: it essentially "compiles" queries down to a deeper, more fundamental level to optimize the data access.
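From the outside you can at least observe that compilation step. SQL Server's internals aren't reachable from a short script, so this sketch uses SQLite as a stand-in: `EXPLAIN QUERY PLAN` shows that the same SQL text is compiled into a full scan before an index exists and into an index search afterwards. The table and index names are made up, and the exact wording of the plan text varies between SQLite versions, so it is only checked loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [(i % 50, f"row{i}") for i in range(1000)])


def plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT b FROM t WHERE a = ?"
before = plan(conn, query, (7,))  # no usable index yet: expect a table scan
conn.execute("CREATE INDEX idx_t_a ON t (a)")
after = plan(conn, query, (7,))   # same SQL text, now compiled to an index search
print(before)
print(after)
```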
Umm... there are (I think, too lazy to google it) nine relational operators (scan, jump, hash-merge, etc.) that are used to construct the execution plan of a SQL query. The choice of operators is based on the usage statistics of the target database tables, the available indices, etc.
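To illustrate why statistics drive the operator choice, here is a deliberately simplified cost model, not any vendor's: a full scan costs one read per data page, and an index seek costs about three index-page reads plus one random page read per matching row. Those constants, `TableStats`, and `choose_access_path` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    rows: int           # total rows in the table
    rows_per_page: int  # rows stored per data page


def choose_access_path(stats: TableStats, selectivity: float, has_index: bool) -> str:
    """Pick 'full scan' or 'index seek' by comparing rough page-read estimates (toy model)."""
    scan_cost = stats.rows / stats.rows_per_page  # read every data page once
    if not has_index:
        return "full scan"
    matching = stats.rows * selectivity
    seek_cost = 3 + matching  # ~3 index levels + one (unclustered) data page per matching row
    return "index seek" if seek_cost < scan_cost else "full scan"


stats = TableStats(rows=1_000_000, rows_per_page=100)
print(choose_access_path(stats, selectivity=0.0001, has_index=True))  # few matches
print(choose_access_path(stats, selectivity=0.05, has_index=True))    # many matches
```

The same query text gets a different operator depending only on the estimated selectivity, which is why the planner, not the query author, is usually in the best position to decide.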
It sounds like you're trying to recreate what the query planner already does...?
EDIT:
I am very confused by this question; it looks like reinventing the wheel but with no wagon to mount it on!?
You might find the patterns in "SQL Queries for Mere Mortals" useful as they work through a structured canonical format starting with English descriptions.
It's available online at Safari if you want to take a quick peek.
Is your intent to write this for a single specific database engine? If not, I suspect that you'll have a rather difficult time of this. Optimization of database queries relies heavily on the exact specifics of the engine's implementation and internals, as well as on the tables, indexes, primary/foreign key relations, type and distribution of data, etc. The actual logic of creating an optimized query would likely have very little overlap between different database engines. (For that matter, at least for MySQL, the table type would make a huge difference to optimizations.) Each release of each supported DB engine may have significantly different characteristics as well; keep in mind that if you're generating SQL, you need to be able to predict how the engine's own optimizer/query planner will handle the SQL you've generated.
The thing is, query optimization relies only weakly on relational theory, and very heavily on detailed knowledge of the DB's guts and the data being held. Even if you're able to extract the DB's metadata, I suspect that you'll have a difficult time producing a better query plan than the DB itself would -- and if you're not getting the DB's metadata, then your cause is hopeless.
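Extracting that metadata is itself product-specific. As one example, SQLite's `ANALYZE` command fills the `sqlite_stat1` table, whose `stat` column for an index starts with the row count followed by the average number of rows per distinct key; other engines expose similar data differently (PostgreSQL, for instance, through the `pg_stats` view). The `index_stats` helper below is hypothetical and only reads the first two numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER)")
conn.executemany("INSERT INTO t (a) VALUES (?)", [(i % 10,) for i in range(100)])
conn.execute("CREATE INDEX idx_t_a ON t (a)")
conn.execute("ANALYZE")  # gathers statistics into the sqlite_stat1 table


def index_stats(conn, index_name):
    """Return (row_count, avg_rows_per_distinct_key) for an index, or None if not analyzed."""
    row = conn.execute("SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)).fetchone()
    if row is None:
        return None
    total, per_key = row[0].split()[:2]
    return int(total), int(per_key)


print(index_stats(conn, "idx_t_a"))  # 100 rows, 10 distinct values of a
```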
Good luck - you've chosen to compete with such companies as Microsoft and Oracle, who live or die by how well their query optimizers do exactly what you propose. The first and primary way to compare one database product with another is with benchmark testing, where the same query workload is applied to each of them, timing measurements are taken, and the winner in most cases is determined by speed of execution.
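The comparison idea can be shown in miniature: apply the same workload repeatedly and compare timings. This sketch times a made-up two-query workload against an in-memory SQLite database and reports the median per query; it is only an illustration of the method, not a substitute for standardized vendor benchmarks, and the numbers it prints depend entirely on the machine.

```python
import sqlite3
import statistics
import time


def benchmark(conn, queries, repeats=5):
    """Run each query `repeats` times and return the median wall-clock seconds per query."""
    medians = {}
    for label, sql in queries.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            conn.execute(sql).fetchall()  # fetch everything so the whole result is produced
            timings.append(time.perf_counter() - start)
        medians[label] = statistics.median(timings)
    return medians


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER)")
conn.executemany("INSERT INTO t (a) VALUES (?)", [(i % 100,) for i in range(10_000)])
workload = {
    "filter": "SELECT COUNT(*) FROM t WHERE a = 42",
    "group": "SELECT a, COUNT(*) FROM t GROUP BY a",
}
for label, seconds in benchmark(conn, workload).items():
    print(f"{label}: {seconds * 1000:.3f} ms")
```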
The world will be impressed if you can do significantly better than the publisher on any of these benchmarks, using their products. At least you'll have a solid career opportunity with whichever one(s) you use.