为什么在 SQL Server 中使用游标被认为是不好的做法?
我早在 SQL 7 时代就知道一些性能原因,但是 SQL Server 2005 中是否仍然存在同样的问题? 如果我想对存储过程中有一个结果集单独进行操作,那么游标仍然是一个糟糕的选择吗? 如果是这样,为什么?
I knew of some performance reasons back in the SQL 7 days, but do the same issues still exist in SQL Server 2005? If I have a resultset in a stored procedure that I want to act upon individually, are cursors still a bad choice? If so, why?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(11)
游标确实有一席之地,但我认为这主要是因为它们经常在单个 select 语句足以提供结果聚合和过滤时使用。
避免游标可以让 SQL Server 更全面地优化查询性能,这在大型系统中非常重要。
Cursors do have their place, however I think it's mainly because they are often used when a single select statement would suffice to provide aggregation and filtering of results.
Avoiding cursors allows SQL Server to more fully optimize the performance of the query, very important in larger systems.
我认为,基本问题是数据库是为基于集合的操作而设计和调整的——根据数据中的关系在一个快速步骤中选择、更新和删除大量数据。
另一方面,内存软件是为单独的操作而设计的,因此循环一组数据并可能对每个项目串行执行不同的操作是它最擅长的。
循环不是数据库或存储架构的设计目的,即使在 SQL Server 2005 中,如果将基本数据集拉入自定义程序并在内存中执行循环,您也无法获得与您所获得的性能相近的性能。 ,使用尽可能轻量的数据对象/结构。
The basic issue, I think, is that databases are designed and tuned for set-based operations -- selects, updates, and deletes of large amounts of data in a single quick step based on relations in the data.
In-memory software, on the other hand, is designed for individual operations, so looping over a set of data and potentially performing different operations on each item serially is what it is best at.
Looping is not what the database or storage architecture are designed for, and even in SQL Server 2005, you are not going to get performance anywhere close to you get if you pull the basic data set out into a custom program and do the looping in memory, using data objects/structures that are as lightweight as possible.
在非常非常少的情况下,使用游标是合理的。 几乎在任何情况下它都会优于基于集合的关系查询。 有时,程序员用循环来思考会更容易,但是使用集合逻辑(例如更新表中的大量行)将导致解决方案不仅减少了 SQL 代码行,但运行速度要快得多,通常快几个数量级。
即使是 Sql Server 2005 中的快进游标也无法与基于集合的查询竞争。 与基于集合的操作相比,性能下降的图表通常开始看起来像 n^2 操作,随着数据集变得非常大,它往往更加线性。
There are very, very few cases where the use of a cursor is justified. There are almost no cases where it will outperform a relational, set-based query. Sometimes it is easier for a programmer to think in terms of loops, but the use of set logic, for example to update a large number of rows in a table, will result in a solution that is not only many less lines of SQL code, but that runs much faster, often several orders of magnitude faster.
Even the fast forward cursor in Sql Server 2005 can't compete with set-based queries. The graph of performance degradation often starts to look like an n^2 operation compared to set-based, which tends to be more linear as the data set grows very large.
@丹尼尔P-> 您不需要使用光标来完成此操作。 您可以轻松地使用基于集合的理论来做到这一点。 例如:使用 Sql 2008
将简单地执行您上面所说的操作。 您可以对 Sql 2000 执行相同的操作,但查询的语法会有所不同。
不过,我的建议是尽可能避免使用游标。
迦耶姆
@ Daniel P -> you don't need to use a cursor to do it. You can easily use set based theory to do it. Eg: with Sql 2008
will simply do what you have said above. And you can do the same with Sql 2000 but the syntax of query would be different.
However, my advice is to avoid cursors as much as possible.
Gayam
有时,您需要执行的处理的性质需要游标,但出于性能原因,如果可能的话,使用基于集合的逻辑编写操作总是更好。
我不会称使用游标为“不好的做法”,但它们确实消耗了服务器上更多的资源(比等效的基于集的方法),而且通常它们是不必要的。 鉴于此,我的建议是在诉诸游标之前考虑其他选项。
有多种类型的游标(仅向前、静态、键集、动态)。 每一种都有不同的性能特征和相关的开销。 确保您的操作使用正确的光标类型。 仅转发是默认设置。
使用游标的一个理由是当您需要处理和更新单个行时,尤其是对于没有良好唯一键的数据集。 在这种情况下,您可以在声明游标时使用 FOR UPDATE 子句,并使用 UPDATE ... WHERE CURRENT OF 处理更新。
请注意,“服务器端”游标曾经很流行(来自 ODBC 和 OLE DB),但 ADO.NET 不支持它们,并且 AFAIK 永远不会支持它们。
Sometimes the nature of the processing you need to perform requires cursors, though for performance reasons it's always better to write the operation(s) using set-based logic if possible.
I wouldn't call it "bad practice" to use cursors, but they do consume more resources on the server (than an equivalent set-based approach) and more often than not they aren't necessary. Given that, my advice would be to consider other options before resorting to a cursor.
There are several types of cursors (forward-only, static, keyset, dynamic). Each one has different performance characteristics and associated overhead. Make sure you use the correct cursor type for your operation. Forward-only is the default.
One argument for using a cursor is when you need to process and update individual rows, especially for a dataset that doesn't have a good unique key. In that case you can use the FOR UPDATE clause when declaring the cursor and process updates with UPDATE ... WHERE CURRENT OF.
Note that "server-side" cursors used to be popular (from ODBC and OLE DB), but ADO.NET does not support them, and AFAIK never will.
SQL 是一种基于集合的语言——这正是它最擅长的地方。
我认为游标仍然是一个糟糕的选择,除非您对它们有足够的了解以证明它们在有限的情况下使用是合理的。
我不喜欢光标的另一个原因是清晰度。 光标块太丑了,很难以清晰有效的方式使用。
综上所述,在某些情况下光标确实是最好的——只是初学者通常不想使用它们。
SQL is a set based language--that's what it does best.
I think cursors are still a bad choice unless you understand enough about them to justify their use in limited circumstances.
Another reason I don't like cursors is clarity. The cursor block is so ugly that it's difficult to use in a clear and effective way.
All that having been said, there are some cases where a cursor really is best--they just aren't usually the cases that beginners want to use them for.
因为游标会占用内存并创建锁。
您真正所做的是尝试将基于集合的技术强制引入非基于集合的功能。 而且,平心而论,我应该指出游标确实有用途,但人们不赞成它们,因为许多不习惯使用基于集合的解决方案的人使用游标而不是找出集合基于解决方案。
但是,当您打开游标时,您基本上是将这些行加载到内存中并锁定它们,从而创建潜在的块。 然后,当您循环游标时,您将对其他表进行更改,并且仍然保持游标的所有内存和锁打开。
所有这些都有可能给其他用户带来性能问题。
因此,作为一般规则,游标是不受欢迎的。 特别是如果这是解决问题时达成的第一个解决方案。
Because cursors take up memory and create locks.
What you are really doing is attempting to force set-based technology into non-set based functionality. And, in all fairness, I should point out that cursors do have a use, but they are frowned upon because many folks who are not used to using set-based solutions use cursors instead of figuring out the set-based solution.
But, when you open a cursor, you are basically loading those rows into memory and locking them, creating potential blocks. Then, as you cycle through the cursor, you are making changes to other tables and still keeping all of the memory and locks of the cursor open.
All of which has the potential to cause performance issues for other users.
So, as a general rule, cursors are frowned upon. Especially if that's the first solution arrived at in solving a problem.
上面关于 SQL 是基于集合的环境的评论都是正确的。 然而,有时逐行操作很有用。 考虑元数据和动态 SQL 的组合。
举一个非常简单的例子,假设我在一个表中有 100 多条记录,这些记录定义了我想要复制/截断/任何内容的表的名称。 哪个最好? 对 SQL 进行硬编码以执行我需要的操作? 或者迭代此结果集并使用动态 SQL (sp_executesql) 来执行操作?
使用基于集合的 SQL 无法实现上述目标。
那么,使用游标还是 while 循环(伪游标)?
只要您使用正确的选项,SQL 游标就可以:
INSENSITIVE 将为您的结果集创建一个临时副本(使您不必自己为伪游标执行此操作)。
READ_ONLY 将确保基础结果集上不持有任何锁。 基础结果集的更改将反映在后续的提取中(与从伪光标获取 TOP 1 相同)。
FAST_FORWARD 将创建一个优化的只进、只读游标。
在将所有光标判定为邪恶之前,请先阅读可用选项。
The above comments about SQL being a set-based environment are all true. However there are times when row-by-row operations are useful. Consider a combination of metadata and dynamic-sql.
As a very simple example, say I have 100+ records in a table that define the names of tables that I want to copy/truncate/whatever. Which is best? Hardcoding the SQL to do what I need to? Or iterate through this resultset and use dynamic-SQL (sp_executesql) to perform the operations?
There is no way to achieve the above objective using set-based SQL.
So, to use cursors or a while loop (pseudo-cursors)?
SQL Cursors are fine as long as you use the correct options:
INSENSITIVE will make a temporary copy of your result set (saving you from having to do this yourself for your pseudo-cursor).
READ_ONLY will make sure no locks are held on the underlying result set. Changes in the underlying result set will be reflected in subsequent fetches (same as if getting TOP 1 from your pseudo-cursor).
FAST_FORWARD will create an optimised forward-only, read-only cursor.
Read about the available options before ruling all cursors as evil.
有一种关于游标的解决方法,我每次需要时都会使用它。
我创建一个表变量,其中包含标识列。
将我需要使用的所有数据插入其中。
然后使用计数器变量创建一个 while 块,并使用 select 语句从表变量中选择我想要的数据,其中标识列与计数器匹配。
这样我就不会锁定任何东西,并且使用更少的内存并且它是安全的,我不会因为内存损坏或类似的事情而丢失任何东西。
并且块代码很容易查看和处理。
这是一个简单的例子:
There is a work around about cursors that I use every time I need one.
I create a table variable with an identity column in it.
insert all the data i need to work with in it.
Then make a while block with a counter variable and select the data I want from the table variable with a select statement where the identity column matches the counter.
This way i dont lock anything and use alot less memory and its safe, i will not lose anything with a memory corruption or something like that.
And the block code is easy to see and handle.
This is a simple example:
我认为游标的名声不好,因为 SQL 新手发现它们后会想“嘿,for 循环!我知道如何使用它们!” 然后他们继续将它们用于所有事情。
如果你按照它们的设计目的来使用它们,我对此无可挑剔。
I think cursors get a bad name because SQL newbies discover them and think "Hey a for loop! I know how to use those!" and then they continue to use them for everything.
If you use them for what they're designed for, I can't find fault with that.
游标通常不是疾病,而是它的症状:不使用基于集合的方法(如其他答案中提到的)。
不理解这个问题,并且简单地相信避免“邪恶”光标就能解决它,可能会让事情变得更糟。
例如,用其他迭代代码替换游标迭代,例如将数据移动到临时表或表变量,以如下方式循环行:
或
这种方法,如另一个答案的代码所示,会使事情变得更糟并且没有解决原来的问题。 这是一种反模式,称为货物邪教编程:不知道为什么某些东西不好,因此实施更糟糕的东西来避免它! 我最近将此类代码(使用 #temptable 并且在身份/PK 上没有索引)更改回游标,并且更新略多于 10000 行只花费了 1 秒而不是近 3 分钟。 仍然缺乏基于集合的方法(两害相权取其轻),但那一刻我能做到的最好的了。
这种缺乏理解的另一个症状可能是我有时所说的“单一对象疾病”:通过数据访问层或对象关系映射器处理单个对象的数据库应用程序。 通常的代码如下:
而不是
第一个通常会用大量的 SELECT 淹没数据库,每个 SELECT 一次往返,特别是当对象树/图发挥作用并且臭名昭著的 SELECT N+1 问题出现时。
这是不理解关系数据库和基于集合的方法的应用程序方面,就像使用过程数据库代码(如 T-SQL 或 PL/SQL)时游标的方式一样!
Cursors are usually not the disease, but a symptom of it: not using the set-based approach (as mentioned in the other answers).
Not understanding this problem, and simply believing that avoiding the "evil" cursor will solve it, can make things worse.
For example, replacing cursor iteration by other iterative code, such as moving data to temporary tables or table variables, to loop over the rows in a way like:
or
Such an approach, as shown in the code of another answer, makes things much worse and doesn't fix the original problem. It's an anti-pattern called cargo cult programming: not knowing WHY something is bad and thus implementing something worse to avoid it! I recently changed such code (using a #temptable and no index on identity/PK) back to a cursor, and updating slightly more than 10000 rows took only 1 second instead of almost 3 minutes. Still lacking set-based approach (being the lesser evil), but the best I could do that moment.
Another symptom of this lack of understanding can be what I sometimes call "one object disease": database applications which handle single objects through data access layers or object-relational mappers. Typically code like:
instead of
The first will usually flood the database with tons of SELECTs, one round trip for each, especially when object trees/graphs come into play and the infamous SELECT N+1 problem strikes.
This is the application side of not understanding relational databases and set based approach, just the same way cursors are when using procedural database code, like T-SQL or PL/SQL!