PostgreSQL 临时表

发布于 2024-07-13 00:25:50 字数 282 浏览 11 评论 0原文

我需要执行 250 万次查询。 此查询生成一些我需要 AVG(column) 的行,然后使用此 AVG 从表中过滤出所有低于平均值的值。 然后我需要将这些过滤结果插入到表中。

以合理的效率完成此类操作的唯一方法似乎是为每个查询邮局管理员 python 线程创建一个临时表。 我只是希望这些临时表不会被持久化到硬盘(根本),而是保留在内存(RAM)中,当然,除非它们超出了工作内存。

我想知道临时表是否会引起磁盘写入(这会干扰插入,即整个过程变慢)

I need to perform a query 2.5 million times. This query generates some rows which I need to AVG(column) and then use this AVG to filter the table from all values below average. I then need to INSERT these filtered results into a table.

The only way to do such a thing with reasonable efficiency, seems to be by creating a TEMPORARY TABLE for each query-postmaster python-thread. I am just hoping these TEMPORARY TABLEs will not be persisted to hard drive (at all) and will remain in memory (RAM), unless they are out of working memory, of course.

I would like to know if a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTS, i.e. slow to whole process down)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

傲鸠 2024-07-20 00:25:50

请注意,在 Postgres 中,临时表的默认行为是不会自动删除它们,并且数据在提交时保留。 请参阅ON COMMIT

但是,临时表在数据库会话结束时被删除< /a>:

临时表会在会话结束时自动删除,或者
可选地在当前事务结束时。

您必须考虑多种注意事项:

  • 如果您确实想在事务结束时显式DROP临时表,请使用CREATE TEMPORARY TABLE ... ON COMMIT 创建它DROP 语法。
  • 在存在连接池的情况下,一个数据库会话可能跨越多个客户端会话; 为了避免在 CREATE 中发生冲突,您应该在将连接返回到池之前删除临时表(例如,通过在事务内执行所有操作并使用 ON COMMIT DROP > 创建语法),根据需要(通过在任何 CREATE TEMPORARY TABLE 语句之前加上相应的DROP TABLE IF EXISTS,其中具有在事务外工作的优点,例如,如果在自动提交模式下使用连接。)
  • 在使用临时表时,在溢出到磁盘之前,内存中有多少内容可以容纳? 请参阅 temp_buffers< postgresql.conf 中的 /code>选项
  • 在经常使用临时表时我还应该担心什么吗? 建议在删除临时表后进行真空处理,以清除目录中的任何死元组。 使用默认设置 (auto_vacuum) 时,Postgres 会每 3 分钟左右自动为您清理一次。

另外,与您的问题无关(但可能与您的项目相关):请记住,如果您必须在填充临时表后对它运行查询,那么最好创建适当的索引,并在插入完毕后对相关临时表发出ANALYZE 默认情况下,基于成本的优化器将假设新创建的临时表有大约 1000 行,如果临时表实际上包含数百万行,这可能会导致性能不佳。

Please note that, in Postgres, the default behaviour for temporary tables is that they are not automatically dropped, and data is persisted on commit. See ON COMMIT.

Temporary table are, however, dropped at the end of a database session:

Temporary tables are automatically dropped at the end of a session, or
optionally at the end of the current transaction.

There are multiple considerations you have to take into account:

  • If you do want to explicitly DROP a temporary table at the end of a transaction, create it with the CREATE TEMPORARY TABLE ... ON COMMIT DROP syntax.
  • In the presence of connection pooling, a database session may span multiple client sessions; to avoid clashes in CREATE, you should drop your temporary tables -- either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the ON COMMIT DROP creation syntax), or on an as-needed basis (by preceding any CREATE TEMPORARY TABLE statement with a corresponding DROP TABLE IF EXISTS, which has the advantage of also working outside transactions e.g. if the connection is used in auto-commit mode.)
  • While the temporary table is in use, how much of it will fit in memory before overflowing on to disk? See the temp_buffers option in postgresql.conf
  • Anything else I should worry about when working often with temp tables? A vacuum is recommended after you have DROPped temporary tables, to clean up any dead tuples from the catalog. Postgres will automatically vacuum every 3 minutes or so for you when using the default settings (auto_vacuum).

Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an ANALYZE on the temp table in question after you're done inserting into it. By default, the cost based optimizer will assume that a newly created the temp table has ~1000 rows and this may result in poor performance should the temp table actually contain millions of rows.

梦中楼上月下 2024-07-20 00:25:50

临时表仅提供一种保证 - 它们在会话结束时被删除。 对于小表,您可能会将大部分数据存储在后备存储中。 对于大型表,我保证数据将定期刷新到磁盘,因为数据库引擎需要更多工作空间来处理其他请求。

编辑:
如果您绝对需要仅 RAM 临时表,您可以在 RAM 磁盘上为数据库创建一个表空间(/dev/shm 有效)。 这减少了磁盘 IO 的数量,但要注意,目前在没有物理磁盘写入的情况下无法做到这一点; 当您创建临时表时,数据库引擎会将表列表刷新到稳定存储。

Temporary tables provide only one guarantee - they are dropped at the end of the session. For a small table you'll probably have most of your data in the backing store. For a large table I guarantee that data will be flushed to disk periodically as the database engine needs more working space for other requests.

EDIT:
If you're absolutely in need of RAM-only temporary tables you can create a table space for your database on a RAM disk (/dev/shm works). This reduces the amount of disk IO, but beware that it is currently not possible to do this without a physical disk write; the DB engine will flush the table list to stable storage when you create the temporary table.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文