psycopg2: insert multiple rows with one query
I need to insert multiple rows with one query (the number of rows is not constant), so I need to execute a query like this:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
The only way I know is
args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)
but I am looking for some simpler way.
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be:
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands of rows or more. It will also be more memory efficient than building a giant query string: an iterator only ever holds one input record in memory at a time, whereas by building the query string you will at some point run out of memory in your Python process or in Postgres.
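A minimal sketch of the idea (this IteratorFile is a simplified stand-in for the gist's class; the cursor and table t(a, b) come from the question):

import io

class IteratorFile(io.TextIOBase):
    """Wrap an iterator of strings so psycopg2 can read it like a file."""
    def __init__(self, it):
        self._it = it
        self._buf = ""

    def read(self, size=-1):
        # Pull lines from the iterator until `size` characters are buffered
        # (or the iterator is exhausted); an empty return signals EOF.
        try:
            while size < 0 or len(self._buf) < size:
                self._buf += next(self._it) + "\n"
        except StopIteration:
            pass
        if size < 0:
            data, self._buf = self._buf, ""
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        return data

args = [(1, 2), (3, 4), (5, 6)]
# copy_from expects one tab-separated line per record.
f = IteratorFile("{}\t{}".format(a, b) for a, b in args)
cursor.copy_from(f, 't', columns=('a', 'b'))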
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
It doesn't save much code, but it definitely looks better.
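The snippet is presumably along these lines, with illustrative table and column names:

namedict = ({"first_name": "Joshua", "last_name": "Drake"},
            {"first_name": "Steven", "last_name": "Foo"},
            {"first_name": "David", "last_name": "Bar"})

cur = conn.cursor()
# One parameterized INSERT, executed once per dict in namedict.
cur.executemany("""INSERT INTO bar (first_name, last_name)
                   VALUES (%(first_name)s, %(last_name)s)""", namedict)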
Security vulnerabilities
As of 2022-11-16, the answers by @Clodoaldo Neto (for Psycopg 2.6), @Joseph Sheedy, @J.J, @Bart Jonk, @kevo Njoki, @TKoutny and @Nihal Sharma contain SQL injection vulnerabilities and should not be used.
The fastest proposal so far (copy_from) should not be used either, because it is difficult to escape the data correctly. This is easily apparent when trying to insert characters like ', ", \, \n, or \t.
The author of psycopg2 also recommends against copy_from:
The fastest method
The fastest method is cursor.copy_expert, which can insert data straight from CSV files. copy_expert is also the fastest method when generating the CSV file on-the-fly. For reference, see the following CSVFile class, which takes care to limit memory usage. This class can then be used like:
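A minimal sketch of the idea, with csv.writer doing the escaping that makes this approach safe (the table t(a, b) and open cursor are assumed from the question):

import csv
import io

class CSVFile(io.TextIOBase):
    """Render rows as CSV on demand, holding only one row at a time."""
    def __init__(self, rows):
        self._rows = iter(rows)
        self._buf = ""
        self._out = io.StringIO()
        # csv.writer takes care of quoting ', ", newlines, etc.
        self._writer = csv.writer(self._out)

    def read(self, size=-1):
        while size < 0 or len(self._buf) < size:
            try:
                row = next(self._rows)
            except StopIteration:
                break
            self._writer.writerow(row)
            self._buf += self._out.getvalue()
            self._out.seek(0)
            self._out.truncate()
        if size < 0:
            data, self._buf = self._buf, ""
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        return data

args = [(1, 2), (3, 4), (5, 6)]
cursor.copy_expert("COPY t (a, b) FROM STDIN WITH (FORMAT CSV)", CSVFile(args))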
If all your data fits into memory, you can also generate the entire CSV data directly without the CSVFile class, but if you do not know how much data you are going to insert in the future, you probably should not do that.
Benchmark results
The methods benchmarked were:
- cursor.execute
- cursor.executemany
- psycopg2.extras.execute_batch
- execute_batch with page_size=1000
- execute_batch with prepared statement
- psycopg2.extras.execute_values
- cursor.execute with string-concatenated values
- copy_expert generating the entire CSV file at once
- copy_expert with CSVFile
All of these techniques are called "Extended Inserts" in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psycopg2's executemany() and all the other methods listed in this thread (which I tried before coming to this answer).
Here's some code which doesn't use cur.mogrify and is nice and simple to get your head around:
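Since this answer is among those flagged as injection-prone in the security note above, here is a parameterized sketch of the same extended-insert idea instead, assuming the question's table t(a, b):

rows = [(1, 2), (3, 4), (5, 6)]
# One "(%s, %s)" group per row, joined into a single multirow VALUES clause;
# the values themselves travel as parameters, not as SQL text.
placeholders = ','.join(['(%s, %s)'] * len(rows))
flat_args = [v for row in rows for v in row]
cur.execute("INSERT INTO t (a, b) VALUES " + placeholders, flat_args)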
But it should be noted that if you can use copy_from(), you should use copy_from ;)
10 years later, an answer for Psycopg 3 and PostgreSQL 14 or newer is: use pipeline mode. Use ordinary INSERTs with execute()/executemany() in Psycopg 3's pipeline mode implementation to be fast and safe against SQL injection. Starting with Psycopg 3.1, executemany() makes use of pipeline mode internally.
An example starting pipeline mode and using executemany() follows. Since Psycopg 3.1, the conn.pipeline() block is only necessary for execute(); it is no longer needed for executemany() if you call it only once.
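A minimal sketch; the connection string is illustrative:

import psycopg

with psycopg.connect("dbname=test") as conn:
    # The pipeline() block batches round trips; since Psycopg 3.1 a single
    # executemany() call uses pipeline mode on its own anyway.
    with conn.pipeline():
        with conn.cursor() as cur:
            cur.executemany("INSERT INTO t (a, b) VALUES (%s, %s)",
                            [(1, 2), (3, 4), (5, 6)])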
I've been using ant32's answer above for several years. However I've found that it throws an error in Python 3 because mogrify returns a byte string.
Converting explicitly to byte strings is a simple solution for making the code Python 3 compatible:
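A sketch of the byte-string fix, assuming the question's table and an open cursor:

args = [(1, 2), (3, 4), (5, 6)]
# mogrify() returns bytes under Python 3, so join with a bytes separator
# and pass a bytes query; cursor.execute() accepts both.
args_bytes = b','.join(cur.mogrify("(%s,%s)", x) for x in args)
cur.execute(b"INSERT INTO t (a, b) VALUES " + args_bytes)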
executemany accepts an array of tuples:
https://www.postgresqltutorial.com/postgresql-python/insert/
The cursor.copy_from solution as provided by @joseph.sheedy (https://stackoverflow.com/users/958118/joseph-sheedy) above (https://stackoverflow.com/a/30721460/11100064) is indeed lightning fast.
However, the example he gives is not generically usable for a record with any number of fields, and it took me a while to figure out how to use it correctly.
The IteratorFile needs to be instantiated with tab-separated fields like this (r is a list of dicts where each dict is a record):
To generalise for an arbitrary number of fields, we first create a line string with the correct number of tabs and field placeholders: "{}\t{}\t{}....\t{}", and then use .format() to fill in the field values: *list(r.values())) for r in records. The complete function is in the gist here.
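A sketch of that generalisation (IteratorFile is the gist's class, sketched earlier on this page; dict key order supplies the column order):

records = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]

# Build "{}\t{}\t...\t{}" with one placeholder per field, then render
# each record into a tab-separated line.
line_template = '\t'.join(['{}'] * len(records[0]))
f = IteratorFile(line_template.format(*r.values()) for r in records)
cursor.copy_from(f, 't', columns=tuple(records[0].keys()))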
execute_batch has been added to psycopg2 since this question was posted.
It is faster than execute_values.
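For reference, a minimal execute_batch sketch, assuming the question's table and an open cursor:

from psycopg2.extras import execute_batch

args = [(1, 2), (3, 4), (5, 6)]
# Statements are sent to the server in batches of page_size, cutting
# the number of round trips compared with executemany().
execute_batch(cur, "INSERT INTO t (a, b) VALUES (%s, %s)", args,
              page_size=100)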
Another nice and efficient approach is to pass the rows for insertion as a single argument, which is an array of JSON objects.
E.g., you pass an argument like:
It is an array which may contain any number of objects inside.
Then your SQL looks like:
Note: your Postgres must be new enough to support JSON.
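A sketch of the idea with json_populate_recordset, assuming the question's table t(a, b):

import json

rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]

# The whole row set travels as one JSON parameter; Postgres unpacks it
# server-side into records matching t's row type (null::t).
cursor.execute(
    """
    INSERT INTO t (a, b)
    SELECT a, b FROM json_populate_recordset(null::t, %s)
    """,
    (json.dumps(rows),),
)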
If you're using SQLAlchemy, you don't need to mess with hand-crafting the string, because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:
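A minimal Core sketch; the engine URL and table definition are illustrative:

from sqlalchemy import Column, Integer, MetaData, Table, create_engine

engine = create_engine("postgresql+psycopg2://user:pass@localhost/test")
metadata = MetaData()
t = Table('t', metadata, Column('a', Integer), Column('b', Integer))

# Passing a list to .values() renders one multi-row VALUES clause.
stmt = t.insert().values([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
with engine.begin() as conn:
    conn.execute(stmt)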
psycopg2 2.9.3
or
Psycopg 3 provides a simple way to use Postgres COPY, which is highly efficient.
Additional information is available in the official documentation.
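A minimal sketch of Psycopg 3's COPY support; the connection string is illustrative:

import psycopg

records = [(1, 2), (3, 4), (5, 6)]

with psycopg.connect("dbname=test") as conn:
    with conn.cursor() as cur:
        # copy() streams rows via COPY ... FROM STDIN; write_row()
        # adapts and escapes each Python tuple.
        with cur.copy("COPY t (a, b) FROM STDIN") as copy:
            for record in records:
                copy.write_row(record)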
If you want to insert multiple rows within one insert statement (assuming you are not using an ORM), the easiest way so far for me would be to use a list of dictionaries. Here is an example:
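A sketch assuming psycopg2.extras.execute_values does the work, with each dict filling a named-placeholder template:

from psycopg2.extras import execute_values

rows = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}]

# The template is rendered once per dict; %s in the query receives the
# joined multirow VALUES list.
execute_values(cur, "INSERT INTO t (a, b) VALUES %s", rows,
               template="(%(a)s, %(b)s)")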
As you can see, only one query will be executed:
From @ant32
The solution I am using can insert about 8000 records in 1 millisecond:
Finally, in SQLAlchemy 1.2, this new implementation was added to use psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True, like:
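A minimal sketch; the URL is illustrative and use_batch_mode is the SQLAlchemy 1.2 flag named in the linked changelog:

from sqlalchemy import create_engine

# Routes DBAPI executemany() calls through psycopg2.extras.execute_batch().
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/test",
    use_batch_mode=True)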
http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109
Then someone who has to use SQLAlchemy won't bother trying different combinations of sqla, psycopg2 and direct SQL together.
Using aiopg - the snippet below works perfectly fine:
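A hedged sketch pairing aiopg with the multirow VALUES technique used elsewhere in this thread; the DSN and table are illustrative:

import asyncio
import aiopg

async def insert_many(rows):
    async with aiopg.create_pool("dbname=test user=postgres") as pool:
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                # One multirow INSERT with a "(%s, %s)" group per row.
                placeholders = ','.join(['(%s, %s)'] * len(rows))
                flat = [v for row in rows for v in row]
                await cur.execute(
                    "INSERT INTO t (a, b) VALUES " + placeholders, flat)

asyncio.run(insert_many([(1, 2), (3, 4), (5, 6)]))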
I built a program that inserts multiple lines to a server that was located in another city.
I found out that using this method was about 10 times faster than executemany. In my case tup is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
and 2 minutes when using this method:
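Sketches of the two approaches, with tup shortened to a small sample (the author's tup held about 2000 rows):

tup = [(1, 2), (3, 4), (5, 6)]

# Fast (~10 s for 2000 rows): one multirow INSERT built with mogrify.
# Under Python 3, mogrify() returns bytes, hence the .decode('utf-8').
args_str = ','.join(cur.mogrify("(%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO t (a, b) VALUES " + args_str)

# Slow (~2 min for 2000 rows): executemany() runs one INSERT per row.
cur.executemany("INSERT INTO t (a, b) VALUES (%s, %s)", tup)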
New execute_values method in Psycopg 2.7:
The pythonic way of doing it in Psycopg 2.6:
Explanation: If the data to be inserted is given as a list of tuples like in
then it is already in the exact required format, as the values syntax of the insert clause expects a list of records, as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg adapts a Python tuple to a Postgresql record. The only necessary work is to provide a records list template to be filled by psycopg
and place it in the insert query.
Printing the insert_query outputs:
Now to the usual Psycopg arguments substitution:
Or just testing what will be sent to the server:
Output:
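A sketch of the two approaches this answer describes, assuming args = [(1, 2), (3, 4), (5, 6)] and the question's table:

from psycopg2.extras import execute_values

args = [(1, 2), (3, 4), (5, 6)]

# Psycopg 2.7+: execute_values expands %s into the multirow VALUES list.
execute_values(cursor, "insert into t (a, b) values %s", args)

# Psycopg 2.6: provide a records-list template, one %s per record;
# psycopg adapts each Python tuple to a Postgres record.
records_list_template = ','.join(['%s'] * len(args))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, args)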
Update with psycopg2 2.7:
The classic executemany() is about 60 times slower than @ant32's implementation (called "folded"), as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values():
Previous answer:
To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2 executemany(). Indeed, executemany() just runs many individual INSERT statements.
@ant32's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects str instances.
So in Python 3 you may need to modify @ant32's code, by adding .decode('utf-8'):
Or by using bytes (with b'' or b"") only:
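Minimal sketches of both variants, assuming the question's table and an open cursor:

args = [(1, 2), (3, 4), (5, 6)]

# Variant 1: decode each mogrify() result so ','.join() receives str.
args_str = ','.join(cur.mogrify("(%s,%s)", x).decode('utf-8') for x in args)
cur.execute("INSERT INTO t (a, b) VALUES " + args_str)

# Variant 2: stay in bytes end to end.
args_bytes = b','.join(cur.mogrify("(%s,%s)", x) for x in args)
cur.execute(b"INSERT INTO t (a, b) VALUES " + args_bytes)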