PostgreSQL temporary tables
I need to perform a query 2.5 million times. This query generates some rows for which I need `AVG(column)`. I then use this average to filter out of the table all values below it, and finally `INSERT` the filtered results into a table.

The only way to do this with reasonable efficiency seems to be creating a `TEMPORARY TABLE` for each query in each Python thread talking to the postmaster. I am just hoping these `TEMPORARY TABLE`s will not be persisted to the hard drive at all and will remain in memory (RAM), unless, of course, they exceed the available working memory.

I would like to know whether a `TEMPORARY TABLE` will incur disk writes, which would interfere with the `INSERT`s, i.e. slow the whole process down.
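For reference, the per-query workflow described in the question might look like the minimal sketch below. It assumes a psycopg2-style DB-API cursor and runs each iteration in its own transaction so the temporary table can use `ON COMMIT DROP`. All names (`batch_rows`, `results`, the column `val`, and both helper functions) are hypothetical, and `results` is assumed to have the same column layout as the query output.

```python
def per_query_statements(source_query: str) -> list[str]:
    """SQL for one iteration of the loop.

    Hypothetical names: batch_rows (temp table), results (target table), val (column).
    source_query must be trusted SQL text; it is interpolated verbatim.
    """
    return [
        # Stage this query's rows; ON COMMIT DROP removes the table when the transaction commits.
        f"CREATE TEMPORARY TABLE batch_rows ON COMMIT DROP AS {source_query}",
        # Copy only the rows whose val is not below the batch average.
        "INSERT INTO results SELECT * FROM batch_rows "
        "WHERE val >= (SELECT AVG(val) FROM batch_rows)",
    ]


def run_one(cur, source_query: str) -> None:
    """Send one iteration through a DB-API cursor (e.g. a psycopg2 cursor).

    The connection must NOT be in autocommit mode: otherwise the CREATE commits on
    its own and ON COMMIT DROP removes batch_rows before the INSERT can read it.
    The caller commits afterwards (e.g. conn.commit()).
    """
    for stmt in per_query_statements(source_query):
        cur.execute(stmt)
```

With psycopg2, for example, a thread would call `run_one(conn.cursor(), "SELECT ...")` followed by `conn.commit()`. Each thread should use its own connection, since temporary tables are private to the session that created them.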
Please note that, in Postgres, the default behaviour for temporary tables is that they are not automatically dropped when a transaction commits, and their data is preserved across commits. See `ON COMMIT`.

Temporary tables are, however, dropped at the end of a database session.
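The three `ON COMMIT` behaviours can be illustrated with a small helper that builds the DDL. This is only a sketch: `create_temp_table_sql` and its arguments are hypothetical, and it merely assembles statement text for you to pass to your driver. `PRESERVE ROWS` is the default, which is why a temporary table created without an `ON COMMIT` clause survives `COMMIT` and lives until the session ends; `DELETE ROWS` empties it at each commit, and `DROP` removes it.

```python
ON_COMMIT_MODES = ("PRESERVE ROWS", "DELETE ROWS", "DROP")


def create_temp_table_sql(name: str, columns: str, on_commit: str = "PRESERVE ROWS") -> str:
    """Build CREATE TEMPORARY TABLE with an explicit ON COMMIT clause.

    Hypothetical helper: name and columns are interpolated verbatim,
    so they must be trusted identifiers/definitions.
    """
    mode = " ".join(on_commit.upper().split())  # normalise case and spacing
    if mode not in ON_COMMIT_MODES:
        raise ValueError(f"unsupported ON COMMIT mode: {on_commit!r}")
    return f"CREATE TEMPORARY TABLE {name} ({columns}) ON COMMIT {mode}"


if __name__ == "__main__":
    # Same behaviour as omitting the clause: the table survives COMMIT.
    print(create_temp_table_sql("batch_rows", "val numeric"))
    # Dropped automatically when the transaction commits.
    print(create_temp_table_sql("batch_rows", "val numeric", on_commit="drop"))
```

Run as a script, this prints `CREATE TEMPORARY TABLE batch_rows (val numeric) ON COMMIT PRESERVE ROWS` and then `CREATE TEMPORARY TABLE batch_rows (val numeric) ON COMMIT DROP`.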
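There are multiple considerations you have to take into account:

- If you want to explicitly `DROP` a temporary table at the end of a transaction, create it with the `CREATE TEMPORARY TABLE ... ON COMMIT DROP` syntax.
- If you use connection pooling, a database session may outlive a single client's use of the connection. To avoid clashes on `CREATE`, you should drop your temporary tables, either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the `ON COMMIT DROP` creation syntax), or on an as-needed basis (by preceding any `CREATE TEMPORARY TABLE` statement with a corresponding `DROP TABLE IF EXISTS`, which has the advantage of also working outside transactions, e.g. if the connection is used in auto-commit mode).
- How much of a temporary table is kept in memory before it overflows to disk is governed by the `temp_buffers` option in `postgresql.conf`.
- Creating and dropping temporary tables leaves dead rows in the system catalogs, which need vacuuming. With the default settings (`autovacuum`), Postgres will automatically vacuum every 3 minutes or so for you.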
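For the pooled-connection case, a minimal sketch of the "drop before create" approach might look like this, assuming you send the statements yourself through a DB-API cursor. The helper names, the `256MB` value, and the `pg_temp` schema prefix are illustrative choices, not part of the answer. Qualifying the `DROP` with `pg_temp` means it can only ever hit the session's own temporary table, never a permanent table of the same name. The `SET temp_buffers` step is kept separate because that setting can only be changed before the session first uses a temporary table, so it belongs in whatever runs when the pool opens a new connection.

```python
def on_new_connection_statements(temp_buffers: str = "256MB") -> list[str]:
    """Run once per new pooled connection, before any temp table is touched.

    Hypothetical helper; the 256MB default is only an example value.
    temp_buffers can only be changed before the session first uses a temporary table.
    """
    return [f"SET temp_buffers = '{temp_buffers}'"]


def recreate_temp_table_statements(name: str, columns: str) -> list[str]:
    """(Re)create a session temp table on a possibly reused connection.

    Hypothetical helper; name and columns are interpolated verbatim and must be trusted.
    Works in autocommit mode too, unlike relying on ON COMMIT DROP.
    """
    return [
        # pg_temp.<name> resolves only to this session's temp schema, so a
        # permanent table with the same name can never be dropped by accident.
        f"DROP TABLE IF EXISTS pg_temp.{name}",
        f"CREATE TEMPORARY TABLE {name} ({columns})",
    ]
```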
Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an `ANALYZE` on the temp table in question after you're done inserting into it. By default, the cost-based optimizer will assume that a newly created temp table has ~1000 rows, and this may result in poor performance should the temp table actually contain millions of rows.
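The index-then-`ANALYZE` step can be sketched the same way. The helper below is hypothetical and only produces statement text; `CREATE INDEX ON table (column)` lets PostgreSQL choose the index name. The explicit `ANALYZE` matters because autovacuum cannot access temporary tables, so statistics on a temp table are only collected if the owning session asks for them.

```python
def finish_loading_statements(table: str, index_columns: list[str]) -> list[str]:
    """Statements to run once a temp table is fully populated (hypothetical helper).

    Builds one single-column index per entry in index_columns, then ANALYZE so the
    planner sees real row counts instead of its default guess for a new table.
    """
    statements = [f"CREATE INDEX ON {table} ({col})" for col in index_columns]
    # Autovacuum never analyzes temporary tables, so this has to be explicit.
    statements.append(f"ANALYZE {table}")
    return statements
```

For example, `finish_loading_statements("batch_rows", ["val"])` returns `["CREATE INDEX ON batch_rows (val)", "ANALYZE batch_rows"]`. Building the indices after the bulk insert, rather than before, avoids maintaining them row by row during loading.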
Temporary tables provide only one guarantee: they are dropped at the end of the session. For a small table, you'll probably have most of your data in memory. For a large table, I guarantee that data will be flushed to disk periodically, as the database engine needs more working space for other requests.
EDIT:
If you absolutely need RAM-only temporary tables, you can create a tablespace for your database on a RAM disk (/dev/shm works). This reduces the amount of disk I/O, but beware that it is currently not possible to avoid physical disk writes entirely: the database engine flushes the table list (the system catalog entries) to stable storage when you create the temporary table.
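A sketch of the RAM-disk setup from the edit, expressed as the SQL you would send. The tablespace name `ram_temp`, the directory `/dev/shm/pg_ram_temp`, the role `app_user`, and both helper names are assumptions for illustration. `CREATE TABLESPACE` needs superuser rights, cannot run inside a transaction block, and expects the directory to already exist, be empty, and be owned by the PostgreSQL operating-system user. Setting `temp_tablespaces` per session then directs newly created temporary tables (and temporary sort files) to that tablespace without naming it in every `CREATE TEMPORARY TABLE`.

```python
def ram_tablespace_setup(tablespace: str, location: str, role: str) -> list[str]:
    """One-time, superuser-only setup (hypothetical helper; arguments must be trusted).

    Send these in autocommit mode: CREATE TABLESPACE cannot run in a transaction block.
    """
    return [
        # location must already exist and be owned by the postgres OS user.
        f"CREATE TABLESPACE {tablespace} LOCATION '{location}'",
        # Without CREATE privilege the role cannot use it in temp_tablespaces.
        f"GRANT CREATE ON TABLESPACE {tablespace} TO {role}",
    ]


def ram_tablespace_session(tablespace: str) -> list[str]:
    """Per session, before creating temp tables: place temp objects on the RAM disk."""
    return [f"SET temp_tablespaces = '{tablespace}'"]
```

For example, `ram_tablespace_setup("ram_temp", "/dev/shm/pg_ram_temp", "app_user")` yields the `CREATE TABLESPACE` and `GRANT` statements, and each worker session then sends `SET temp_tablespaces = 'ram_temp'`. As noted above, the catalog entries for each new temporary table are still written to stable storage.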