Psycopg2, PostgreSQL, Python: fastest way to bulk-insert
I'm looking for the most efficient way to bulk-insert some millions of tuples into a database. I'm using Python, PostgreSQL and psycopg2.

I have created a long list of tuples that should be inserted into the database, sometimes with modifiers like geometric `Simplify`.

The naive way to do it would be string-formatting a list of `INSERT` statements, but there are three other methods I've read about:

- Using `pyformat` binding style for parametric insertion,
- Using `executemany` on the list of tuples, and
- Writing the results to a file and using `COPY`.

It seems that the first way is the most efficient, but I would appreciate your insights and code snippets telling me how to do it right.
Comments (9)
Yeah, I would vote for COPY, provided you can write a file to the server's hard drive (not the drive the app is running on), as COPY will only read off the server.
There is a new psycopg2 manual containing examples for all the options.

The `COPY` option is the most efficient, then `executemany`, then `execute` with `pyformat`.
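As a sketch of the COPY route: psycopg2's `cursor.copy_from()` streams from any file-like object, so an in-memory buffer works and you don't even need filesystem access on the server. The table and column names below are placeholders, and the serializer assumes values contain no tabs or newlines (for messy data, escape properly or use `csv` with `copy_expert`).

```python
import io

def rows_to_copy_buffer(rows):
    """Serialize tuples into the tab-separated text format COPY expects.

    None becomes \\N (COPY's NULL marker). Assumes values contain no
    tabs or newlines.
    """
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(r"\N" if v is None else str(v) for v in row) + "\n")
    buf.seek(0)
    return buf

# Against a live database (placeholder DSN and table):
# import psycopg2
# conn = psycopg2.connect("dbname=test")
# with conn, conn.cursor() as cur:
#     buf = rows_to_copy_buffer([(1, "foo"), (2, None)])
#     cur.copy_from(buf, "my_table", columns=("id", "name"))
```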
In my experience `executemany` is not any faster than running many inserts yourself; the fastest way is to format a single `INSERT` with many values yourself. Maybe in the future `executemany` will improve, but for now it is quite slow. I subclass `list` and overload the append method, so when the list reaches a certain size I format the `INSERT` and run it.

The newest way of inserting many items is using the `execute_values` helper (https://www.psycopg.org/docs/extras.html#fast-execution-helpers).
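A minimal sketch of the `execute_values` approach (the table, columns, and DSN are placeholders). `execute_values` expands a single `VALUES %s` template with pages of rows, so it needs plain tuples in a fixed column order; the small helper below does that projection for dict records.

```python
def dicts_to_rows(records, columns):
    """Project dict records onto a fixed column order, as tuples."""
    return [tuple(rec[col] for col in columns) for rec in records]

# Against a live database (placeholder DSN and table):
# import psycopg2
# import psycopg2.extras
# conn = psycopg2.connect("dbname=test")
# rows = dicts_to_rows(data, ("id", "name"))
# with conn, conn.cursor() as cur:
#     psycopg2.extras.execute_values(
#         cur,
#         "INSERT INTO my_table (id, name) VALUES %s",
#         rows,
#         page_size=1000,  # rows folded into each statement
#     )
```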
You could use a new upsert library (you may have to `pip install decorator` first), where `selector` is a `dict` object like `{'name': 'Chris Smith'}` and `setter` is a `dict` like `{'age': 28, 'state': 'WI'}`. It's almost as fast as writing custom INSERT [/UPDATE] code and running it directly with `psycopg2`... and it won't blow up if the row already exists.
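The upsert library's own API isn't shown in this answer, but on PostgreSQL 9.5+ the same selector/setter behavior is available natively with `INSERT ... ON CONFLICT`. A hedged sketch (the `people` table and its unique index on `name` are assumptions; identifiers are interpolated directly, so they must be trusted strings):

```python
def build_upsert_sql(table, selector_cols, setter_cols):
    """Build an INSERT ... ON CONFLICT DO UPDATE statement (PostgreSQL 9.5+).

    selector_cols must be covered by a unique index or constraint.
    Only values go through %s placeholders; identifiers must be trusted.
    """
    cols = list(selector_cols) + list(setter_cols)
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in setter_cols)
    return (
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(selector_cols)}) DO UPDATE SET {updates}"
    )

# sql = build_upsert_sql("people", ["name"], ["age", "state"])
# cur.execute(sql, ("Chris Smith", 28, "WI"))
```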
Anyone using SQLAlchemy could try the 1.2 version, which added support for bulk insert: it uses psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True. See:

http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109

Then anyone who has to use SQLAlchemy won't bother trying different combinations of sqla, psycopg2 and direct SQL together.
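For reference, the engine setup looks like the following. The DSN is a placeholder, and note that later SQLAlchemy releases replaced this flag with the `executemany_mode` parameter.

```python
from sqlalchemy import create_engine

# use_batch_mode=True (SQLAlchemy 1.2) routes executemany() calls
# through psycopg2.extras.execute_batch().
engine = create_engine(
    "postgresql+psycopg2://user:password@localhost/mydb",  # placeholder DSN
    use_batch_mode=True,
)
```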
After some testing, unnest often seems to be an extremely fast option, as I learned from @Clodoaldo Neto's answer to a similar question.

However, it can be tricky with extremely large datasets.
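A sketch of the unnest technique (placeholder table and columns): psycopg2 adapts Python lists to PostgreSQL arrays, so you pass one list per column and let `unnest()` turn them back into rows server-side.

```python
def columns_for_unnest(rows):
    """Transpose row tuples into one list per column, as unnest() expects."""
    return [list(col) for col in zip(*rows)]

# cols = columns_for_unnest([(1, "foo"), (2, "bar")])  # [[1, 2], ["foo", "bar"]]
# cur.execute(
#     "INSERT INTO my_table (id, name) SELECT unnest(%s), unnest(%s)",
#     cols,  # each list is adapted to a PostgreSQL array
# )
```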
The first and the second would be used together, not separately. The third would be the most efficient server-wise though, since the server would do all the hard work.
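Combining them looks like this (placeholder table and columns): the SQL uses psycopg2's pyformat binding style (`%(name)s`), and `executemany()` runs it once per parameter dict.

```python
def tuples_to_params(rows, columns):
    """Turn row tuples into the dicts that pyformat placeholders bind to."""
    return [dict(zip(columns, row)) for row in rows]

# params = tuples_to_params([(1, "foo"), (2, "bar")], ("id", "name"))
# cur.executemany(
#     "INSERT INTO my_table (id, name) VALUES (%(id)s, %(name)s)",
#     params,
# )
```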
A very related question: Bulk insert with SQLAlchemy ORM
All roads lead to Rome, but some of them cross mountains and require ferries; if you want to get there quickly, just take the motorway.

In this case the motorway is to use the execute_batch() feature of psycopg2. The documentation says it best:

> The current implementation of executemany() is (using an extremely charitable understatement) not particularly performing. These functions can be used to speed up the repeated execution of a statement against a set of parameters. By reducing the number of server roundtrips the performance can be orders of magnitude better than using executemany().

In my own test execute_batch() is approximately twice as fast as executemany(), and gives the option to configure page_size for further tweaking (if you want to squeeze the last 2-3% of performance out of the driver).

The same feature can easily be enabled if you are using SQLAlchemy by setting use_batch_mode=True as a parameter when you instantiate the engine with create_engine().
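A sketch of the execute_batch call itself (placeholder table and DSN). Unlike execute_values, it keeps the per-row statement and instead groups page_size parameter sets into each server round-trip; the helper below only illustrates that grouping, it is not part of the psycopg2 API.

```python
def pages(param_sets, page_size=100):
    """Illustrate execute_batch's grouping: page_size parameter sets
    are sent per server round-trip."""
    return [param_sets[i:i + page_size]
            for i in range(0, len(param_sets), page_size)]

# Against a live database (placeholder DSN and table):
# import psycopg2
# import psycopg2.extras
# conn = psycopg2.connect("dbname=test")
# with conn, conn.cursor() as cur:
#     psycopg2.extras.execute_batch(
#         cur,
#         "INSERT INTO my_table (id, name) VALUES (%s, %s)",
#         rows,           # list of (id, name) tuples
#         page_size=100,  # the default
#     )
```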