psycopg2: Does PostgreSQL store a copy of the table on disk if it runs out of memory?

Posted on 2024-10-27 21:50:02

I am running the following query on 489 million rows (102 GB) on a computer with 2 GB of memory:

select * from table order by x, y, z, h, j, l;

I am using psycopg2 with a server cursor ("cursor_unique_name") and fetch 30000 rows at a time.
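A minimal sketch of the chunked fetch described above, assuming a DB-API 2.0 connection. With psycopg2, a server-side cursor is obtained by passing a name, e.g. `conn.cursor(name="cursor_unique_name")`, and each `fetchmany(n)` call on such a named cursor requests up to n rows from the server in one round trip. The helper `iter_chunks` is hypothetical, and the standard-library `sqlite3` module is used only as a stand-in database so the sketch runs without a PostgreSQL server.

```python
import sqlite3
from typing import Iterator, List, Sequence


def iter_chunks(cursor, size: int = 30000) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows until the cursor is exhausted.

    Works with any DB-API 2.0 cursor; with psycopg2, pass a named
    (server-side) cursor so the result stays on the server between calls.
    """
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows


if __name__ == "__main__":
    # sqlite3 is only a stand-in; with PostgreSQL this would be roughly:
    #   conn = psycopg2.connect(...)
    #   cur = conn.cursor(name="cursor_unique_name")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?)",
                     [(i % 3, i) for i in range(10)])
    cur = conn.cursor()
    cur.execute("SELECT * FROM tablename ORDER BY x, y")
    for chunk in iter_chunks(cur, size=4):
        print(len(chunk), chunk[0])
```

Note that the sort itself still happens on the server before the first chunk can be returned; chunking only limits how many rows cross the wire per call.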

Obviously the result of the query cannot stay in memory, but my question is whether the following set of queries would be just as fast:

select * into temp_table from table order by x, y, z, h, j, l;
select * from temp_table;

This means that I would use a temp_table to store the ordered result and fetch data from that table instead.
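One caveat with the temp_table idea: a plain `select * from temp_table` without ORDER BY is not guaranteed to return rows in the order they were inserted. (Also, in PostgreSQL `SELECT ... INTO temp_table` creates an ordinary table unless `TEMP` is specified.) A sketch of a variant that sorts once, stores an explicit row number, and then reads fixed ranges of that number, assuming the database supports window functions (`row_number() OVER (...)`). The helper names, the reduced `SORT_COLS` list and the `placeholder` argument (`"%s"` for psycopg2, `"?"` for sqlite3) are hypothetical; sqlite3 stands in for PostgreSQL in the test.

```python
import sqlite3
from typing import Iterator, List, Sequence

SORT_COLS = ("x", "y", "z")  # hypothetical subset of x, y, z, h, j, l


def _check(*names: str) -> None:
    # Table and column names are interpolated into SQL, so allow plain names only.
    for name in names:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")


def materialize_sorted(cur, src: str, dst: str, cols=SORT_COLS) -> None:
    """Copy src into a temporary table dst, numbering rows in sort order."""
    _check(src, dst, *cols)
    order = ", ".join(cols)
    cur.execute(
        f"CREATE TEMP TABLE {dst} AS "
        f"SELECT row_number() OVER (ORDER BY {order}) AS rn, * FROM {src}"
    )
    # An index on rn lets each range read below avoid scanning the whole copy.
    cur.execute(f"CREATE INDEX {dst}_rn_idx ON {dst} (rn)")


def iter_ranges(cur, dst: str, size: int = 30000,
                placeholder: str = "%s") -> Iterator[List[Sequence]]:
    """Read dst back in rn order, `size` rows per query."""
    _check(dst)
    sql = (f"SELECT * FROM {dst} "
           f"WHERE rn > {placeholder} AND rn <= {placeholder} ORDER BY rn")
    start = 0
    while True:
        cur.execute(sql, (start, start + size))
        rows = cur.fetchall()  # fetch before yielding so the cursor can be reused
        if not rows:
            break
        yield rows
        start += size
```

This does not avoid the full sort; it pays for it once up front, after which every chunk is a short index range lookup on `rn`.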

The reason for asking this question is that the query takes only 36 minutes to complete if run manually using psql, but it took more than 8 hours (and never finished) to fetch the first 30000 rows when the query was executed using psycopg2.

零度° 2024-11-03 21:50:02

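If you want to fetch this table sorted and in chunks, then you need to create an index. Without such an index, every fetch would need to sort the whole table. Your cursor probably sorted this table once for every row fetched; waiting for the Sun to become a red giant would probably end sooner...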
create index tablename_order_idx on tablename (x, y, z, h, j, l);
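If your table data is relatively stable, then you should cluster it by this index. This way the table data can be fetched without too much seeking on disk.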
cluster tablename using tablename_order_idx;
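If you want to get the data in chunks, you shouldn't use a cursor, as it always works one row at a time. You should use limit and offset:
select * from tablename order by x, y, z, h, j, l limit 30000 offset 44*30000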
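The LIMIT/OFFSET loop suggested above can be sketched as follows, together with a keyset ("seek") variant for comparison. The PostgreSQL documentation notes that rows skipped by OFFSET still have to be computed inside the server, so on a large table each later page costs more than the previous one; the keyset variant instead continues after the last sort key already seen, which a composite index on the sort columns can serve directly. Both functions, the reduced `KEY` tuple and the `placeholder` argument are assumptions for illustration. The keyset form assumes the sort key is unique; otherwise rows sharing the boundary key could be skipped. sqlite3, which also supports row-value comparisons such as `(x, y) > (?, ?)`, stands in for PostgreSQL in the test.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple

KEY = ("x", "y", "z")  # hypothetical stand-in for x, y, z, h, j, l


def _check(*names: str) -> None:
    # Identifiers are interpolated into SQL, so accept plain names only.
    for name in names:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")


def pages_offset(cur, table: str, size: int, key=KEY) -> Iterator[List[Sequence]]:
    """LIMIT/OFFSET paging: page n makes the server skip n * size rows."""
    _check(table, *key)
    order = ", ".join(key)
    offset = 0
    while True:
        cur.execute(f"SELECT * FROM {table} ORDER BY {order} "
                    f"LIMIT {int(size)} OFFSET {int(offset)}")
        rows = cur.fetchall()
        if not rows:
            break
        yield rows
        offset += size


def pages_keyset(cur, table: str, size: int, key=KEY,
                 placeholder: str = "%s") -> Iterator[List[Sequence]]:
    """Keyset paging: resume after the last key seen (key must be unique)."""
    _check(table, *key)
    cols = ", ".join(key)
    marks = ", ".join([placeholder] * len(key))
    k = len(key)
    last = None
    while True:
        # Key columns are selected first so the last key is easy to read back.
        sql = f"SELECT {cols}, * FROM {table}"
        params: Tuple = ()
        if last is not None:
            sql += f" WHERE ({cols}) > ({marks})"
            params = tuple(last)
        sql += f" ORDER BY {cols} LIMIT {int(size)}"
        cur.execute(sql, params)
        rows = cur.fetchall()
        if not rows:
            break
        last = rows[-1][:k]
        yield [row[k:] for row in rows]  # drop the duplicated key columns
```

Both produce the same rows in the same order; the difference is that the keyset query's cost per page stays roughly constant instead of growing with the offset.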