如何证明在 SQL 中使用子选择查询会降低服务器性能
我的工作之一是维护我们的数据库,通常我们在获取报告和使用该基础工作时会遇到性能不足的问题。
当我开始查看 ERP 发送到数据库的查询时,我看到主查询中有很多完全不必要的子选择查询。
由于我不是我们使用的程序的创建者开发人员的成员,因此当我批评他们的代码和工作时,他们不太喜欢。假设他们不把我的评论视为严肃的陈述。 所以我问你几个关于 SQL 中子选择的问题 子
选择是否比左外连接花费更多时间?
是否存在任何博客、文章或任何我子选择的内容建议不要使用的内容?
我如何证明如果我们避免在查询中使用 subselesct,查询将会更快?
我们的数据库服务器是MSSQL2005
One of my jobs it to maintain our database, usually we have troubles with lack of performance while getting reports and working whit that base.
When I start looking at queries which our ERP sending to database I see a lot of totally needlessly subselect queries inside main queries.
As I am not member of developers which is creator of program we using, they do not like much when I criticize they code and job. Let say they do not taking my review as serious statements.
So I asking you few questions about subselect in SQL
Does subselect is taking a lot of more time then left outer joins?
Does exists any blog, article or anything where I subselect is recommended not to use ?
How I can prove that if we avoid subselesct in query that query is going to be faster ?
Our database server is MSSQL2005
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
根据我自己的经验,这两种方法都是有效的,例如 EXISTS 子选择可以通过提前中断来避免大量处理。
但大多数时候,带有大量子查询的查询是由开发人员完成的,他们并不真正理解 SQL,并使用他们经典的过程程序员思维方式来进行查询。然后他们甚至不考虑连接,并进行一些糟糕的查询。所以我更喜欢连接,并且总是检查子查询。老实说,我跟踪慢速查询,并且我对包含子选择的慢速查询的第一次尝试是尝试进行连接。工作时间很多。
但是没有任何规则可以确定子选择是坏的或者比连接慢,只是糟糕的 sql 程序员经常做子选择:-)
From my own experience both methods can be valid, as for example an EXISTS subselect can avoid a lot of treatment with an early break.
Buts most of the time queries with a lot of subselect are done by devs which do not really understand SQL and use their classic-procedural-programmer way of thinking on queries. Then they don't even think about joins, and makes some awfull queries. So I prefer joins, and I always check subqueries. To be completly honnest I track slow queries, and my first try on slow queries containing subselects is trying to do joins. Works a lot of time.
But there's no rules which can establish that subselect are bad or slower than joins, it's just that bad sql programmer often do subselects :-)
这取决于子选择和左外连接。
一般来说,这个结构:
比这个更有效:
请参见此处:
我会避开那些盲目建议避免子选择的博客。
它们的实现是有原因的,不管你信不信,开发人员已经付出了一些努力来优化它们。
编写一个不带子选择的查询,这样运行速度更快。
如果您在这里发布您的查询,我们可能能够对其进行改进。然而,带有子选择的版本可能会更快。
This depends on the subselect and left outer joins.
Generally, this construct:
is more efficient than this:
See here:
I would steer clear of the blogs which blindly recommend to avoid subselects.
They are implemented for a reason and, believe it or not, the developers have put some effort into optimizing them.
Write a query without the subselects which runs faster.
If you post your query here we possibly will be able to improve it. However, a version with the subselects may turn out to be faster.
尝试重写一些查询以消除子选择和比较运行时间。
分享并享受。
Try rewriting some of the queries to elminate the sub-select and compare runtimes.
Share and enjoy.
“展示,而非讲述”- 检查并比较使用 SQL Profiler 识别的查询的查询计划。特别要注意表扫描和书签查找(您希望尽可能频繁地看到索引查找)。查询计划的“拟合优度”取决于最新的统计信息、定义的索引以及整体查询工作负载。
执行计划基础知识
了解更复杂的查询计划
使用 SQL Server Profiler(2005 版)
在 SQL Server Management Studio (SSMS) 中运行查询并打开查询 -> 包含实际执行计划 (CTRL+M)
认为自己很幸运,它们只是子选择(在某些情况下,优化器将生成等效的“连接”)计划')而不是相关子查询!
识别执行大量逻辑读取的查询,使用您喜欢的技术重新编写它,然后通过比较显示其执行的逻辑读取数量。
这是一个提示。要获取执行的逻辑读取总数,请将有问题的查询包装为:
运行查询,然后切换到结果窗格中的消息选项卡。
如果您有兴趣了解更多信息,没有比 SQL Server 2008 查询性能调优更好的书了Distilled,涵盖了监控、解释和修复性能问题的基本技术。
"Show, Don't Tell" - Examine and compare the query plans of the queries identified using SQL Profiler. Particularly look out for table scans and bookmark lookups (you want to see index seeks as often as possible). The 'goodness of fit' of query plans depends on up-to-date statistics, what indexes are defined, the holistic query workload.
Execution Plan Basics
Understanding More Complex Query Plans
Using SQL Server Profiler (2005 Version)
Run the queries in SQL Server Management Studio (SSMS) and turn on Query->Include Actual Execution Plan (CTRL+M)
Think yourself lucky they're only subselects (which in some cases the optimiser will produce equivalent 'join plans') and not correlated sub-queries!
Identify a query that is performing a high number of logical reads, re-write it using your preferred technique and then show how few logicals reads it does by comparison.
Here's a tip. To get the total number of logical reads performed, wrap a query in question with:
Run your query, and switch to the messages tab in the results pane.
If you are interested in learning more, there is no better book than SQL Server 2008 Query Performance Tuning Distilled, which covers the essential techniques for monitoring, interpreting and fixing performance issues.
您可以做的一件事是加载 SQL Profiler 并向它们显示子查询的成本(就 CPU 周期、读取和写入而言)。很难与冷酷无情的统计数据争论。
我还会检查这些查询的查询计划,以确保使用适当的索引,并将表/索引扫描保持在最低限度。
一般来说,如果使用正确并且适当的索引到位,我不会说子查询不好。
One thing you can do is to load SQL Profiler and show them the cost (in terms of CPU cycles, reads and writes) of the sub-queries. It's tough to argue with cold, hard statistics.
I would also check the query plan for these queries to make sure appropriate indexes are being used, and table/index scans are being held to a minimum.
In general, I wouldn't say sub-queries are bad, if used correctly and the appropriate indexes are in place.
我对 MSSQL 不太熟悉,因为我们在大多数应用程序中都使用 postrgesql。但是,应该存在类似“EXPLAIN”的内容,它会向您显示查询的执行计划。在那里,您应该能够看到查询将生成的各个步骤,以便检索所需的数据。
如果您看到有大量的表扫描或循环连接而没有任何索引使用,那么这绝对是查询执行缓慢的提示。使用这样的工具,您应该能够比较两个查询(一个带有连接,另一个没有)
很难说明哪种方法更好,因为它实际上很大程度上取决于优化器在各种情况下可以采用的索引根据 DBMS,优化器可能能够将子查询隐式重写为连接查询并执行它。
如果你真的想展示哪个更好,你必须执行两者并测量时间、CPU 使用率等。
更新:
可能是 MSSQL 的这个 -->QueryPlan
I'm not very familiar with MSSQL, as we are using postrgesql in most of our applications. However there should exist something like "EXPLAIN" which shows you the execution plan for the query. There you should be able to see the various steps that a query will produce in order to retrieve the needed data.
If you see there a lot of table scans or loop join without any index usage it is definitely a hint for a slow query execution. With such a tool you should be able to compare the two queries (one with the join, the other without)
It is difficult to state which is the better ways, because it really highly depends on the indexes the optimizer can take in the various cases and depending on the DBMS the optimizer may be able to implicitly rewrite a subquery-query into a join-query and execute it.
If you really want to show which is better you have to execute both and measure the time, cpu-usage and so on.
UPDATE:
Probably it is this one for MSSQL -->QueryPlan