SQL Server存储过程查询优化
我有一个为报告生成数据的应用程序,如下所示:
age < 30 | age >=30 | asian | hispanic
-----------------------------------------------------------------
clients in prog A | | |
-----------------------------------------------------------------
clients in prog B | | |
-----------------------------------------------------------------
number clients | | |
-----------------------------------------------------------------
number children | | |
查询有时非常非常长,我想优化它们。
我没有服务器上运行查询分析器的权限(而且我读到,最好不使用它的建议)。最长的存储过程大约需要 35 秒才能执行。
仔细阅读,高查询优化要避免的事情是:
- Select *
- exists
- distinct
- Cursors
- having
我对手头的任务有几个问题:
- 通过将 Select * 更改为 Select colA, colB ... ,我会看到有多大的差异?真的值得这么麻烦吗?
- 如果存在(...),我该如何优化? if( Select Count(query ) > 0 ) 是一个很好的优化吗?
- 如果我真的要返回表中的所有列,可以使用 Select * 吗?
我不想发布这些查询,因为它们又长又糟糕,但是您还可以提供哪些其他建议?我正在尝试尽可能使用可重用的函数和临时表,以减轻我的大脑和服务器上的压力。
I've got an application that generates data for reports that look like:
age < 30 | age >=30 | asian | hispanic
-----------------------------------------------------------------
clients in prog A | | |
-----------------------------------------------------------------
clients in prog B | | |
-----------------------------------------------------------------
number clients | | |
-----------------------------------------------------------------
number children | | |
The queries are sometimes very very long, and I'd like to optimize them.
I don't have permissions on the server to run the query analyzer (and I read that it's often better not to use it's suggestions). The longest sprocs take ~35 seconds to execute.
Reading around, the things to avoid for high query optimization are :
- Select *
- exists
- distinct
- cursors
- having
I have a few questions about the task at hand:
- how much of a difference am I looking at by changing Select * into Select colA, colB ... ? Is it really worth the trouble?
- how can I optimize if exists( ... )? Is if( Select Count(query ) > 0 ) a good optimization?
- If I am really going to return all of the columns in a table, is it okay to use Select * ?
I don't want to post these queries because they are so long and terrible, but what other suggestions might you be able to offer? I'm trying to use re-usable functions and temporary tables wherever possible to ease the strain both on my brain and on the server.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
1) 通过将 Select * 更改为 Select colA, colB ... ,我看到了多少差异?真的值得这么麻烦吗?
这可能会产生很大的差异 - 通常指定您想要的字段并且仅指定这些字段始终是一种很好的做法。即,如果您执行 SELECT * 返回 50 个字段,而您只需要其中 2 个字段,并且这 2 个字段包含在合适的索引中,则可以从索引提供所有数据,而无需从索引中查找其余数据数据页。所以这要好得多。
2)如果存在(...),我该如何优化? if( Select Count(query ) > 0 ) 是一个很好的优化吗?
不...SELECT COUNT() 更糟糕。 EXISTS 是执行此类操作的最高效方法,因为它经过优化,一旦找到第一个匹配记录就停止检查。而 COUNT() 将继续下去,直到找到所有不必要的数据。说实话,我根本不会将游标归入坏阵营中的“EXISTS”。
3)如果我真的要返回表中的所有列,可以使用 Select * 吗?
好吧,如果你真的想要所有这些,那就没有那么重要了。假设如果您想在将来添加更多列,那么您还希望返回这些列,如果现有代码突然更改,则可能会破坏现有代码。
1) how much of a difference am I looking at by changing Select * into Select colA, colB ... ? Is it really worth the trouble?
That can make quite a big difference - it's always good practice generally to specify the fields you want and ONLY those fields. i.e. if you do a SELECT * to return 50 fields when you only need 2 of them, and those 2 fields are included in a suitable index then all the data can be provided from the index without having to look up the rest of the data from the data pages. So this is much better.
2) how can I optimize if exists( ... )? Is if( Select Count(query ) > 0 ) a good optimization?
No...SELECT COUNT() is worse. EXISTS is the most performant way to do this kind of thing as it is optimised to stop checking as soon as it finds the first matching record. Whereas COUNT() will keep going til it's found them all which is unnecessary. I wouldn't be classing "EXISTS" in the bad camp with cursors at all tbh.
3) If I am really going to return all of the columns in a table, is it okay to use Select *?
Well, if you truly want them all then it doesn't matter as much. That assumes if you want to add more columns in future then you also want those to also be returned which could break existing code if it suddenly changes.
您可以在此处发布查询
吗?这里只是一些指针,因为您没有显示任何代码
,一般来说,exists 比 count(*) 更快,因为exists 在找到匹配项时返回,其中 count() 将继续,直到到达末尾结果集
select col1, col2 比 select * 更好,因为如果列位于非聚集索引中,则基表/聚集索引甚至不会被触及,既然您已在索引中包含列,则更是如此。如果您仅返回所需的列,您还将使用更少的带宽
如果后来有人向表中添加 4 列怎么办?现在您还将返回这 4 列
Can you post the query
here are just some pointers because you are not showing any code
in general exists is faster then count(*) because exists returns the moment it found a match where count() will continue until it has reached the end of the result set
select col1, col2 is better than select * because if the columns are in a non clustered index then the base table/clustered index won't even be touched, this is even more true now that you have included columns in indexes. you will also use less bandwith if you return only the columns that you need
what if someone adds 4 columns to the table later on? Now you will be returning those 4 columns also
假设您指的是用户定义的函数,它们并不总是有利于性能。寻求缓解大脑的压力可能会增加服务器的压力。纯标量的(即它们获取一个值,操作它并返回另一个值)应该没问题,但是扫描表的那些在直接在存储过程中使用其逻辑时通常可以运行得更快。例如,扫描表 X 查找值 Y 的出现次数并返回计数的函数将比包含可一次性计算每个值计数的连接的 SQL 语句运行得更慢(因为重复调用它)。
您还应该检查相关源表上是否有索引以及它们是否正在使用。
Assuming you mean user-defined functions, they're not always good for performance. Seeking to ease the strain on your brain can come at the expense increasing the strain on the server. Ones that are purely scalar (ie they take a value, manipulate it and return another value) should be fine, but ones that scan tables can usually run quicker when their logic is used in the stored procedure directly. As an example, a function that scans Table X for occurrences of value Y and returns a count will run slower (because of the repeated calls to it) than a SQL statement containing a join that can do every value's count in one go.
You should also check if there are indices on the relevant source tables and whether they are being used.
从 Select * 更改为 Select column1,column2,.... 不会给您带来太大好处。但是,您应该这样做,因为这是很好的编码。如果将来有人更改了列顺序或列数,可能会导致您的报告损坏,具体取决于报告的构建方式。
另一种方法怎么样?如果您能够在表上添加非聚集索引,我建议您进行研究。具体来说,查看现有子查询并查看Where 部分中的列是否有索引。如果它们不这样做,那么每次exists返回false时,您都会进行表扫描,并且即使它返回true,您也可能每次都进行表扫描(这取决于值所在的位置)。非聚集索引将允许子查询快速找到表中的任何结果。有时您必须使用低效的查询,但如果您可以通过索引优化表结构,那么对速度的影响就会小得多。
另外,对于您的 Exists 子查询,您是否最多只能得到 1 个结果?如果是这样,那么您可能想尝试对表进行左连接。如果您不在连接左侧和右侧的列集上建立索引,这可能不会有帮助,但如果您这样做,那应该会非常有帮助,因为您基本上会扫描右侧表一次,而不是每行扫描一次。
You will not get much of a benefit from changing from Select * to Select column1,column2,.... However, you should do it because it is good coding. If someone changed the column order or number of columns in the future it could cause your reports to break depending on how they are built.
How about another approach? If you are able to add non-clustered indexes on your tables I would suggest looking into that. Specifically, look at your exist sub-queries and see if the columns that are in the Where section have an index on them. If they do not then you will be doing a table scan every time the exists returns false and you could be doing up to a table scan every time even if it returns true (it depends on where the value is at). The non-clustered indexes will allow the sub-queries to quickly find any results in your table. Sometimes you have to use inefficient queries but if you can optimize your table structure through indexes then it makes much less of an impact on your speed.
Also, for your Exists sub-queries is it ever the case that you will have at most 1 result? If so then you might want to try doing a left join to the table. That probably won't help if you do not an index on both the column sets on the left and right of your join but if you do it should be pretty helpful as you would basically scan your right hand table 1 time instead of once per row.
对于计数,最有效的形式是 SELECT Count(1) FROM table。 (或者 0 或 123 或任何简单的常量值)。
为了便于管理,您也应该更改为 SELECT field1, field2,..。 SELECT * 速度较慢,以后当代码、视图或表(或更多)发生更改时,您可能会遇到问题。
For counting, the most effective form is SELECT Count(1) FROM table. (Or 0 or 123 or any simple constant value).
You should change to SELECT field1, field2,.. for manageability, too. SELECT * is slower, and later you may run into problems when code, views or tables (or more of them) change.