Python 的 DB-API 中的游标如何工作?

发布于 2024-07-12 04:52:33 字数 735 浏览 13 评论 0原文

我一直在 RDBMS(MySQL 和 PostgreSQL)中使用 python,我注意到我真的不明白如何使用游标。

通常,一个人让他的脚本通过客户端 DB-API(如 psycopg2 或 MySQLdb)连接到数据库:

connection = psycopg2.connect(host='otherhost', etc)

然后创建一个游标:

cursor = connection.cursor()

然后可以发出查询和命令:

cursor.execute("SELECT * FROM etc")

现在查询的结果在哪里,我想知道? 是在服务器上吗? 或者在我的客户端上一点,在我的服务器上一点? 然后,如果我们需要访问一些结果,我们会获取它们:

rows = cursor.fetchone() 

或者

rows = cursor.fetchmany()

现在假设我不检索所有行,并决定执行另一个查询,以前的结果会发生什么? 是他们的开销吗?

另外,我是否应该为每种形式的命令创建一个光标,并以某种方式不断地将其重复用于相同的命令? 我头 psycopg2 可以以某种方式优化多次执行但具有不同值的命令,如何以及是否值得?

谢谢

I have been using python with RDBMS' (MySQL and PostgreSQL), and I have noticed that I really do not understand how to use a cursor.

Usually, one have his script connect to the DB via a client DB-API (like psycopg2 or MySQLdb):

connection = psycopg2.connect(host='otherhost', etc)

And then one creates a cursor:

cursor = connection.cursor()

And then one can issue queries and commands:

cursor.execute("SELECT * FROM etc")

Now where is the result of the query, I wonder? is it on the server? or a little on my client and a little on my server? And then, if we need to access some results, we fetch 'em:

rows = cursor.fetchone() 

or

rows = cursor.fetchmany()

Now lets say, I do not retrieve all the rows, and decide to execute another query, what will happen to the previous results? Is their an overhead.

Also, should I create a cursor for every form of command and continuously reuse it for those same commands somehow; I head psycopg2 can somehow optimize commands that are executed many times but with different values, how and is it worth it?

Thx

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

蓝眸 2024-07-19 04:52:33

是的,我知道它已经有几个月了 :P

DB-API 的游标似乎是严格模仿 SQL 游标的。 就 AFA 资源(行)管理而言,DB-API 并未指定客户端是否必须检索所有行或 DECLARE 实际的 SQL 游标。 只要 fetchXXX 接口执行其预期的操作,DB-API 就会满意。

就 AFA psycopg2 游标而言(您可能很清楚),“未命名的 DB-API 游标”将获取整个结果集——AFAIK 由 libpq 缓冲在内存中。 “命名的 DB-API 游标”(psycopg2 的概念,可能不可移植),将按需请求行(fetchXXX 方法)。

正如“未知”所引用的,executemany 可用于优化同一命令的多次运行。 然而,它不能满足准备好的语句的需要; 当具有不同参数集的语句的重复执行不是直接连续时,executemany() 的执行效果与execute() 一样好。 DB-API 确实“为”驱动程序作者“提供”缓存已执行语句的能力,但其实现(语句的范围/生命周期是什么?)未定义,因此不可能跨 DB-API 实现设置期望。

如果您要将大量数据加载到 PostgreSQL 中,我强烈建议尝试找到一种使用 COPY 的方法。

ya, i know it's months old :P

DB-API's cursor appears to be closely modeled after SQL cursors. AFA resource(rows) management is concerned, DB-API does not specify whether the client must retrieve all the rows or DECLARE an actual SQL cursor. As long as the fetchXXX interfaces do what they're supposed to, DB-API is happy.

AFA psycopg2 cursors are concerned(as you may well know), "unnamed DB-API cursors" will fetch the entire result set--AFAIK buffered in memory by libpq. "named DB-API cursors"(a psycopg2 concept that may not be portable), will request the rows on demand(fetchXXX methods).

As cited by "unbeknown", executemany can be used to optimize multiple runs of the same command. However, it doesn't accommodate for the need of prepared statements; when repeat executions of a statement with different parameter sets is not directly sequential, executemany() will perform just as well as execute(). DB-API does "provide" driver authors with the ability to cache executed statements, but its implementation(what's the scope/lifetime of the statement?) is undefined, so it's impossible to set expectations across DB-API implementations.

If you are loading lots of data into PostgreSQL, I would strongly recommend trying to find a way to use COPY.

很糊涂小朋友 2024-07-19 04:52:33

假设您使用 PostgreSQL,游标可能只是使用数据库的本机游标 API 来实现。 您可能需要查看纯 Python PostgreSQL DB-API 模块 pg8000 的源代码,以了解它如何处理光标。 您可能还想查看游标的 PostgreSQL 文档

Assuming you're using PostgreSQL, the cursors probably are just implemented using the database's native cursor API. You may want to look at the source code for pg8000, a pure Python PostgreSQL DB-API module, to see how it handles cursors. You might also like to look at the PostgreSQL documentation for cursors.

一杆小烟枪 2024-07-19 04:52:33

当您查看 mysqldb 文档 时,您可以看到它们实现了游标的不同策略。 所以一般的答案是:这取决于。

编辑:这是 mysqldb API 文档。 有一些关于每种游标类型的行为方式的信息。 标准游标将结果集存储在客户端中。 因此,我假设如果您不检索所有结果行,则会产生开销,因为即使您未获取的行也必须传输到客户端(可能通过网络)。 我的猜测是它与 postgresql 没有什么不同。

当您想要优化使用多个值重复调用的 SQL 语句时,您应该查看cursor.executemany()。 它准备一个 SQL 语句,这样就不需要每次调用它时都对其进行解析:

cur.executemany('INSERT INTO mytable (col1, col2) VALUES (%s, %s)',
                [('val1', 1), ('val2', 2)])

When you look here at the mysqldb documentation you can see that they implemented different strategies for cursors. So the general answer is: it depends.

Edit: Here is the mysqldb API documentation. There is some info how each cursor type is behaving. The standard cursor is storing the result set in the client. So I assume there is a overhead if you don't retrieve all result rows, because even the rows you don't fetch have to be transfered to the client (potentially over the network). My guess is that it is not that different from postgresql.

When you want to optimize SQL statements that you call repeatedly with many values, you should look at cursor.executemany(). It prepares a SQL statement so that it doesn't need to be parsed every time you call it:

cur.executemany('INSERT INTO mytable (col1, col2) VALUES (%s, %s)',
                [('val1', 1), ('val2', 2)])
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文