当前位置：文江博客话题详情

使用数据库游标有什么好处？

发布于 2024-09-25 23:35:36 字数 303 浏览 5 评论 0原文

这是基于我面临的面试问题。

非常短的定义可以是

它可用于操作行由查询返回。

除了使用光标之外（MSDN 上此处列出了要点），我心中有一个问题，如果我们可以使用查询或存储过程执行所有操作（如果我没记错的话，就像我们可以使用 Transact-SQL for ms-sql 一样），我们是否应该使用任何具体点光标？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

演出会有结束 2024-10-02 23:35:36

与大型结果集相比，使用光标就像使用视频流而不是一次性下载视频，然后在下载后观看它。
如果你下载，你必须有几GB的空间和耐心等待下载完成。现在，无论您的机器或网络有多快，每个人都以相同的速度观看电影。

通常，任何查询都会发送到服务器并执行，然后通过网络将结果集发送给您，这是一次突发的活动。
游标将使您可以逐行访问数据，并且仅在您请求时（实际上可以查看它）流式传输每一行。

游标可以节省您的时间 - 因为您不需要等待完整记录集的处理和下载
它会节省您在服务器和客户端上的内存，因为它们不必专门占用大量内存内存到结果
集负载平衡网络和服务器 - 在“突发”模式下工作通常效率更高，但它可能会完全阻塞您的服务器和网络。对于多用户环境来说，这样的延迟是很少需要的。流式处理为其他操作留下了空间。
允许对查询表（在某些条件下）进行不直接影响游标的操作。因此，当您将光标放在一行上时，其他进程可以读取、更新甚至删除其他行。这对于非常繁忙的表、许多并发读取和写入尤其有帮助。

然而，这给我们带来了一些警告：

一致性：使用游标，您（通常）不会对数据的一致快照进行操作，而是对一行进行操作。因此，您的并发性/一致性/隔离性保证从整个数据库（ACID）下降到只有一行。您通常可以告知 DBMS 您想要什么级别的并发性，但如果您太挑剔（锁定您所在的整个表），您将浪费服务器端节省的许多资源。
单独传输每一行可能效率非常低，因为每个数据包都有协商开销，您可以通过为每个数据包发送大的（可能是压缩的）数据块来避免协商开销。（没有数据库服务器或客户端库愚蠢到单独传输每一行，两端都有缓存和分块，但仍然相关。）
游标更难正确执行。考虑一个具有大结果集的查询，促使您使用游标，该游标使用带有聚合函数的 GROUP BY 子句。（此类查询在数据仓库中很常见）。 GROUP BY 可以完全破坏您的服务器，因为它必须立即生成并存储整个结果集，甚至可能持有其他表上的锁。

经验法则：

如果您处理快速创建的小型结果集，请不要使用游标。
游标擅长处理临时、复杂（参考）、具有大结果集和低一致性要求的顺序查询。

“顺序性质”意味着查询中的大量 GROUP BY 子句中没有聚合函数。服务器可以懒惰地决定计算 10 行供游标从缓存中使用，并同时执行其他操作。

华泰

Using cursors compared to big resultsets is like using video streaming instead of downloading an video in one swoop, and watching it when it has downloaded.
If you download, you have to have a few gigs of space and the patience to wait until the download finished. Now, no matter how fast your machine or network may be, everyone watches a movie at the same speed.

Normally any query gets sent to the server, executed, and the resultset sent over the network to you, in one burst of activity.
The cursor will give you access to the data row by row and stream every row only when you request it (can actually view it).

A cursor can save you time - because you don't need to wait for the processing and download of your complete recordset
It will save you memory, both on the server and on the client because they don't have to dedicate a big chunk of memory to resultsets
Load-balance both your network and your server - Working in "burst" mode is usually more efficient, but it can completely block your server and your network. Such delays are seldom desirable for multiuser environments. Streaming leaves room for other operations.
Allows operations on queried tables (under certain conditions) that do not affect your cursor directly. So while you are holding a cursor on a row, other processes are able to read, update and even delete other rows. This helps especially with very busy tables, many concurrent reads and writes.

Which brings us to some caveats, however:

Consistency: Using a cursor, you do (usually) not operate on a consistent snapshot of the data, but on a row. So your concurrency/consistency/isolation guarantees drop from the whole database (ACID) to only one row. You can usually inform your DBMS what level of concurrency you want, but if you are too nitpicky (locking the complete table you are in), you will throw away many of the resource savings on the server side.
Transmitting every row by itself can be very inefficient, since every packet has negotiation overhead that you might avoid by sending big, maybe compressed, chunks of data per packet. ( No DB server or client library is stupid enough to transmit every row individually, there's caching and chunking on both ends, still, it is relevant.)
Cursors are harder to do right. Consider a query with a big resultset, motivating you to use a cursor, that uses a GROUP BY clause with aggregate functions. (Such queries are common in data warehouses). The GROUP BY can completely trash your server, because it has to generate and store the whole resultset at once, maybe even holding locks on other tables.

Rule of thumb:

If you work on small, quickly created resultsets, don't use cursors.
Cursors excell on ad hoc, complex (referentially), queries of sequential nature with big resultsets and low consistency requirements.

"Sequential nature" means there are no aggregate functions in heavy GROUP BY clauses in your query. The server can lazily decide to compute 10 rows for your cursor to consume from a cache and do other stuff meanwhile.

HTH

回复收藏 0 原文

从﹋此江山别 2024-10-02 23:35:36

游标是一种允许您迭代集合中的记录的工具。它具有顺序和当前记录的概念。

一般来说，SQL 使用多重集进行操作：这些是一组可能重复的记录，没有给定的顺序，作为一个整体。

比如说，这个查询：

SELECT  *
FROM    a
JOIN    b
ON      b.a = a.id

，对多重集 a 和 b 进行操作。

此查询中没有对记录的顺序、记录的存储方式、访问顺序等做出任何假设。

这允许抽象出实现细节，并让系统尝试选择最佳的算法来运行此查询询问。

但是，在转换完所有数据后，最终您需要以有序的方式逐条访问记录。

您并不关心电话簿的条目如何准确地存储在硬盘驱动器上，但打印机确实要求它们按字母顺序输入；并且格式标签应单独应用于每个记录。

这正是光标发挥作用的地方。每次在客户端处理结果集时，您都在使用游标。您不会从服务器获得兆字节的未排序数据：您只会获得一个小变量：一个结果集描述符，然后编写如下内容：

while (!rs.EOF) {
   process(rs);
   rs.moveNext();
}

这就是为您实现所有这些的游标。

这当然涉及数据库与客户端的交互。

至于数据库本身：在数据库内部，您很少需要游标，因为正如我上面所说，几乎所有数据转换都可以使用集合操作更有效地实现。

但是，也有例外：

SQL Server 中的分析操作实现得很差。例如，使用游标可以比使用基于集合的操作
按块处理数据更有效地计算累积和。在某些情况下，基于集合的操作应顺序应用于集合的部分，并且每个块的结果应独立提交。虽然仍然可以使用基于集合的操作来完成此操作，但游标通常是更优选的方法。
本身不支持递归的系统中的递归。

您可能还会发现这篇文章值得一读：

岛屿不合适的光标

A cursor is a tool that allows you to iterate the records in a set. It has concepts of order and current record.

Generally, SQL operates with multisets: these are sets of possibly repeating records in no given order, taken as a whole.

Say, this query:

SELECT  *
FROM    a
JOIN    b
ON      b.a = a.id

, operates on multisets a and b.

Nothing in this query makes any assumptions about the order of the records, how they are stored, in which order they should be accessed, etc.

This allows to abstract away implementation details and let the system try to choose the best possible algorithm to run this query.

However, after you have transformed all your data, ultimately you will need to access the records in an ordered way and one by one.

You don't care about how exactly the entries of a phonebook are stored on a hard drive, but a printer does require them to be feed in alphabetical order; and the formatting tags should be applied to each record individually.

That's exactly where the cursors come into play. Each time you are processing a resultset on the client side, you are using a cursor. You don't get megabytes of unsorted data from the server: you just get a tiny variable: a resultset descriptor, and just write something like this:

while (!rs.EOF) {
   process(rs);
   rs.moveNext();
}

That's cursor that implements all this for you.

This of course concerns database-client interaction.

As for the database itself: inside the database, you rarely need the cursors, since, as I have told above, almost all data transformations can be implemented using set operations more efficiently.

However, there are exceptions:

Analytic operations in SQL Server are implemented very poorly. A cumulative sum, for instance, could be calculated much more efficiently with a cursor than using the set-based operations
Processing data in chunks. There are cases when a set based operation should be sequentially applied to a portion of a set and the results of each chunk should be committed independently. While it's still possible to do it using set-based operations, a cursor is often a more preferred way to do this.
Recursion in the systems that do not support it natively.

You also may find this article worth reading:

The Island of Misfit Cursors

回复收藏 0 原文

孤星 2024-10-02 23:35:36

使用游标可以通过编程顺序读取一组数据，因此它的行为方式与传统文件访问类似，而不是 SQL 的基于集的行为特征。

在以下几种情况下可能会用到此功能：

需要模拟基于文件的记录访问行为时 - 例如，使用关系数据库作为一段代码的数据存储机制之前编写的代码是使用索引文件进行数据存储。
需要按顺序处理数据时 - 一个简单的示例可能是计算特定客户的运行总余额。（许多关系数据库，例如 Oracle 和 SQLServer，现在都具有 SQL 的分析扩展，这应该会大大减少对此的需求。）

不可避免的是，维基百科有更多：http://en.wikipedia.org/wiki/Database_cursor

回复收藏 0 原文