使用游标的优点和缺点(在 SQL Server 中)
我在这里问了一个问题在OLTP数据库(SQL服务器)中使用游标
人们回应说永远不应该使用光标。
我觉得游标是非常强大的工具,应该被使用(我不认为微软支持糟糕的开发人员使用游标)。假设你有一个表,其中一行中的列的值取决于同一列的值在前一行。如果它是一次性后端进程,您不认为使用游标是一个可以接受的选择吗?
我突然想到了一些场景,在这些场景中我觉得使用游标应该没有什么可耻的。如果你们有其他感觉,请告诉我。
- 用于清理不良数据的一次性后端进程,可在几分钟内完成执行。
- 在很长一段时间内运行一次的批处理过程(例如每年一次)。
如果在上述场景中,其他进程没有明显的压力,那么花费额外的时间编写代码来避免游标不是不合理的吗?换句话说,在某些情况下,开发人员的时间比几乎对其他任何事情都没有影响的流程的性能更重要。
在我看来,在这些情况下,您应该认真尝试避免使用游标。
- 从可以经常调用的网站调用的存储过程。
- 每天运行多次并消耗大量资源的 SQL 作业。
我认为在没有分析手头的任务并实际权衡替代方案的情况下做出“永远不应该使用游标”之类的一般性声明是非常肤浅的。
请让我知道您的想法。
I asked a question here Using cursor in OLTP databases (SQL server)
where people responded saying cursors should never be used.
I feel cursors are very powerful tools that are meant to be used (I don't think Microsoft supports cursors for bad developers).Suppose you have a table where the value of a column in a row is dependent on the value of the same column in the previous row. If it is a one time back end process, don't you think using a cursor would be an acceptable choice?
Off the top of my head I can think of a couple of scenarios where I feel there should be no shame in using cursors. Please let me know if you guys feel otherwise.
- A one time back end process to clean bad data which completes execution within a few minutes.
- Batch processes that run once in a long period of time (something like once a year).
If in the above scenarios, there is no visible strain on the other processes, wouldn't it be unreasonable to spend extra hours writing code to avoid cursors? In other words in certain cases the developer's time is more important than the performance of a process that has almost no impact on anything else.
In my opinion these would be some scenarios where you should seriously try to avoid using a cursor.
- A stored proc called from a website that can get called very often.
- A SQL job that would run multiple times a day and consume a lot of resources.
I think its very superficial to make a general statement like "cursors should never be used" without analyzing the task at hand and actually weighing it against the alternatives.
Please let me know of your thoughts.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
在某些情况下,游标实际上比基于集合的等效游标表现得更好。人们总是会想到运行总计 - 查找 Itzik 对此的描述(并忽略任何涉及 SQL Server 2012 的内容,它添加了新的窗口函数,在这种情况下可以为游标提供运行机会)。
人们使用游标的一大问题是它们执行缓慢、使用临时存储等。部分原因是默认语法是带有各种低效默认选项的全局游标。下次当你使用光标做一些不需要做诸如 UPDATE...WHERE CURRENT OF 之类的事情(我在整个职业生涯中都能够避免这样做)时,给它通过比较这两个语法选项,可以公平地对待:
事实上,第一个版本代表了未记录的存储过程 sp_MSforeachdb 中的一个错误,如果任何数据库的状态在执行期间发生变化,它就会跳过数据库。随后,我编写了自己版本的存储过程(请参阅 此处)既修复了错误(只需使用上面语法的后一个版本),又添加了几个参数来控制选择哪些数据库。
很多人认为方法论不是游标,因为它没有说
DECLARE CURSOR
。我见过人们认为 while 循环比光标更快(我希望我已经在这里消除了)或使用FOR XML PATH
执行组串联并不执行隐藏光标操作。很多情况下看计划就可以看出真相。在很多情况下,基于集合的游标更合适。但是有很多有效的用例,其中基于集合的等效项编写起来要复杂得多,优化器要为两者生成计划,或者不可能生成计划(例如,循环遍历表以更新统计信息的维护任务,为结果中的每个值调用存储过程等)。对于许多大型多表查询也是如此,其中计划对于优化器来说过于巨大而无法处理。在这些情况下,最好先将一些中间结果转储到临时结构中。对于某些基于集合的游标等效项(例如运行总计)也是如此。我还写过另一种方式,人们几乎总是本能地认为使用 while 循环/游标,并且有 基于集合的聪明替代方案要好得多。
更新2013-07-25
只是想添加一些我写的关于游标的额外博客文章,如果您确实必须使用它们,则应该使用哪些选项,并使用基于集合的查询而不是循环生成集合:
最佳方法运行总计 - 针对 SQL Server 2012 进行了更新
有何影响不同的光标选项可以有吗?
生成一个没有循环的集合或序列:[第 1 部分] [第 2 部分] [第 3 部分]
There are several scenarios where cursors actually perform better than set-based equivalents. Running totals is the one that always comes to mind - look for Itzik's words on that (and ignore any that involve SQL Server 2012, which adds new windowing functions that give cursors a run for their money in this situation).
One of the big problems people have with cursors is that they perform slowly, they use temporary storage, etc. This is partially because the default syntax is a global cursor with all kinds of inefficient default options. The next time you're doing something with a cursor that doesn't need to do things like
UPDATE...WHERE CURRENT OF
(which I've been able to avoid my entire career), give it a fair shake by comparing these two syntax options:In fact the first version represents a bug in the undocumented stored procedure sp_MSforeachdb which makes it skip databases if the status of any database changes during execution. I subsequently wrote my own version of the stored procedure (see here) which both fixed the bug (simply by using the latter version of the syntax above) and added several parameters to control which databases would be chosen.
A lot of people think that a methodology is not a cursor because it doesn't say
DECLARE CURSOR
. I've seen people argue that a while loop is faster than a cursor (which I hope I've dispelled here) or that usingFOR XML PATH
to perform group concatenation is not performing a hidden cursor operation. Looking at the plan in a lot of cases will show the truth.In a lot of cases cursors are used where set-based is more appropriate. But there are plenty of valid use cases where a set-based equivalent is much more complicated to write, for the optimizer to generate a plan for, both, or not possible (e.g. maintenance tasks where you're looping through tables to update statistics, calling a stored procedure for each value in a result, etc.). The same is true for a lot of big multi-table queries where the plan gets too monstrous for the optimizer to handle. In these cases it can be better to dump some of the intermediate results into a temporary structure first. The same goes for some set-based equivalents to cursors (like running totals). I've also written about the other way, where people almost always think instinctively to use a while loop / cursor and there are clever set-based alternatives that are much better.
UPDATE 2013-07-25
Just wanted to add some additional blog posts I've written about cursors, which options you should be using if you do have to use them, and using set-based queries instead of loops to generate sets:
Best Approaches for Running Totals - Updated for SQL Server 2012
What impact can different cursor options have?
Generate a Set or Sequence Without Loops: [Part 1] [Part 2] [Part 3]
SQL Server 中游标的问题在于该引擎内部是基于集合的,这与 Oracle 等内部基于游标的其他 DBMS 不同。这意味着当您在 SQL Server 中创建游标时,需要创建临时存储,并且需要将基于集的结果集复制到临时游标存储。您马上就会明白为什么这会很昂贵,更不用说您可能在游标本身之上进行的任何逐行处理了。最重要的是,基于集的处理效率更高,并且通常使用 CTE 或临时表可以更好地完成基于游标的操作。
话虽如此,在某些情况下,光标可能是可以接受的,正如您所说的一次性操作。我能想到的最常见的用途是在维护计划中,您可能会迭代服务器上的所有数据库来执行各种维护任务。只要您限制使用并且不围绕 RBAR(逐行)处理设计整个应用程序,就应该没问题。
The issue with cursors in SQL Server is that the engine is set-based internally, unlike other DBMS's like Oracle which are cursor-based internally. This means that when you create a cursor in SQL Server, temporary storage needs to be created and the set-based resultset needs to be copied over to the temporary cursor storage. You can see why this would be expensive right off the bat, not to mention any row-by-row processing that you might be doing on top of the cursor itself. The bottom line is that set-based processing is more efficient, and often times your cursor-based operation can be done better using a CTE or temp table.
That being said, there are cases where a cursor is probably acceptable, as you said for one-off operations. The most common use I can think of is in a maintenance plan where you may be iterating through all the databases on a server executing various maintenance tasks. As long as you limit your usage and don't design whole applications around RBAR (row-by-agonizing-row) processing, you should be fine.
一般来说,游标是一件坏事。然而,在某些情况下,使用游标更为实用,而且在某些情况下,使用游标甚至更快。一个很好的例子是光标穿过联系人表,根据某些条件发送电子邮件。 (不要提出从 DBMS 发送电子邮件是否是个好主意的问题 - 让我们假设它是为了解决手头的问题。)没有办法编写基于集合的方法。您可以使用一些技巧来提出基于集合的解决方案来生成动态 SQL,但真正的基于集合的解决方案并不存在。
然而,涉及前一行的计算可以使用自连接来完成。这通常仍然比光标更快。
在所有情况下,您都需要平衡开发更快的解决方案所涉及的工作。如果没有人关心,如果您的进程在 1 分钟或 1 小时内运行,请使用最快完成工作的方法。如果您要循环访问随时间增长的数据集(例如 [orders] 表),请尽可能远离光标。如果您不确定,请在几个显着不同的数据大小上进行性能测试,将游标库与基于集的解决方案进行比较。
In general cursors are a bad thing. However in some cases it is more practical to use a cursor and in some it is even faster to use one. A good example is a cursor through a contact table sending emails based on some criteria. (Not to open up the question if sending an email from your DBMS is a good idea - let's just assume it is for the problem at hand.) There is no way to write that set-based. You could use some trickery to come up with a set-based solution to generate dynamic SQL, but a real set-based solution does not exist.
However, a calculation involving the previous row can be done using a self join. That is usually still faster than a cursor.
In all cases you need to balance the effort involved in developing a faster solution. If nobody cares, if you process runs in 1 minute or in one hour, use what gets the job done quickest. If you are looping through a dataset that grows over time like an [orders] table, try to stay away from a cursor if possible. If you are not sure, do a performance test comparing a cursor base with a set-based solution on several significantly different data sizes.
我一直不喜欢游标,因为它们的性能很慢。然而,我发现我并没有完全理解不同类型的游标,并且在某些情况下,游标是一种可行的解决方案。
当您遇到的业务问题只能通过一次处理一行来解决时,游标就合适了。
因此,要提高游标的性能,请更改正在使用的游标类型。我不知道的是,如果您不指定声明哪种类型的游标,默认情况下您会获得动态乐观类型,这是性能最慢的类型,因为它在幕后做了很多工作。但是,通过将游标声明为不同类型(例如静态游标),它具有非常好的性能。
请参阅这些文章以获得更完整的解释:
关于游标的真相:第一部分
关于游标的真相:第二部分
关于游标的真相:第三部分
我认为反对游标的最大缺点是性能,但是,不以基于集合的方法布置任务可能排名第二。第三是任务的可读性和布局,因为它们通常没有很多有用的注释。
SQL Server 已针对运行基于集合的方法进行了优化。您编写查询以返回数据结果集,例如表上的联接,但 SQL Server 执行引擎决定使用哪个联接:合并联接、嵌套循环联接或哈希联接。 SQL Server 根据参与列、数据量、索引结构以及参与列中的值集来确定最佳的连接算法。因此,与过程游标方法相比,使用基于集合的方法通常是性能最佳的方法。
I had always disliked cursors because of their slow performance. However, I found I didn't fully understand the different types of cursors and that in certain instances, cursors are a viable solution.
When you have a business problem that can only be solved by processing one row at a time, then a cursor is appropriate.
So to improve performance with the cursor, change the type of cursor you are using. Something I didn't know was, if you don't specify which type of cursor you are declaring, you get the Dynamic Optimistic type by default, which is the one that is the slowest for performance because it's doing lots of work under the hood. However, by declaring your cursor as a different type, say a static cursor, it has very good performance.
See these articles for a fuller explanation:
The Truth About Cursors: Part I
The Truth About Cursors: Part II
The Truth About Cursors: Part III
I think the biggest con against cursors is performance, however, not laying out a task in a set based approach would probably rank second. Third would be readability and layout of the tasks as they usually don't have a lot of helpful comments.
SQL Server is optimized to run the set based approach. You write the query to return a result set of data, like a join on tables for example, but the SQL Server execution engine determines which join to use: Merge Join, Nested Loop Join, or Hash Join. SQL Server determines the best possible joining algorithm based upon the participating columns, data volume, indexing structure, and the set of values in the participating columns. So using a set based approach is generally the best approach in performance over the procedural cursor approach.
它们对于动态 SQL 旋转之类的事情是必要的,但您应该尽可能避免使用它们。
They are necessary for things like dynamic SQL pivoting, but you should try and avoid using them whenever possible.