使用 CURRENT_TIMESTAMP 的时间戳分区表的查询效率

发布于 2024-10-20 23:22:21 字数 1007 浏览 2 评论 0原文

考虑到 PostgreSQL 9.0.3 下的以下表分区:

CREATE TABLE records (
  ts TIMESTAMP,
  ...
);

CREATE TABLE records_2010 (
  CHECK (ts >= '2010-01-01 00:00:00' AND ts < '2011-01-01 00:00:00')
) INHERITS (records);

CREATE TABLE records_2011 (
  CHECK (ts >= '2011-01-01 00:00:00' AND ts < '2012-01-01 00:00:00')
) INHERITS (records);

我希望以下 SELECT 查询具有相同的 EXPLAINed 计划,仅查阅“records”和“records_2011”,但它们不同:

BEGIN;
-- Assume CURRENT_TIMESTAMP is 9 a.m. on 5 March 2011
SELECT * FROM records WHERE ts >= '2011-03-05 09:00:00'; -- scans 2 tables
SELECT * FROM records WHERE ts >= CURRENT_TIMESTAMP;     -- scans all 3 tables
COMMIT;

考虑到 CURRENT_TIMESTAMP 在其持续时间内返回一个常量值包含事务,为什么使用 CURRENT_TIMESTAMP 的查询不利用 Postgres 的分区并只扫描两个表?

更新:

目前这是不可能的,但它是被认为是一个需要改进的领域。 PostgreSQL 9.1 可能解决查询执行器中的此行为

Given the following table partitioning under PostgreSQL 9.0.3:

CREATE TABLE records (
  ts TIMESTAMP,
  ...
);

CREATE TABLE records_2010 (
  CHECK (ts >= '2010-01-01 00:00:00' AND ts < '2011-01-01 00:00:00')
) INHERITS (records);

CREATE TABLE records_2011 (
  CHECK (ts >= '2011-01-01 00:00:00' AND ts < '2012-01-01 00:00:00')
) INHERITS (records);

I expected the following SELECT queries to have the same EXPLAINed plan, consulting only "records" and "records_2011", but they differ:

BEGIN;
-- Assume CURRENT_TIMESTAMP is 9 a.m. on 5 March 2011
SELECT * FROM records WHERE ts >= '2011-03-05 09:00:00'; -- scans 2 tables
SELECT * FROM records WHERE ts >= CURRENT_TIMESTAMP;     -- scans all 3 tables
COMMIT;

Given that CURRENT_TIMESTAMP returns a constant value for the duration of its enclosing transactions, why doesn't the query with CURRENT_TIMESTAMP take advantage of Postgres' partitioning and only scan two tables?

UPDATE:

This isn't possible right now, but it is recognized as an area to improve. PostgreSQL 9.1 may address this behavior in the query executor.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

风苍溪 2024-10-27 23:22:21

如果您第一次运行查询,PostgreSQL 会确定查询计划。这是一个昂贵的操作,并且结果会被缓存。因此查询也必须适用于未来的执行。

无论您何时运行第一个查询,都永远不需要 records_2010

但第二个查询使用变量CURRENT_TIMESTAMP。优化器不知道时间只会增加,并生成一个适用于任何 ts 值的计划。这意味着它必须查看所有三个表。

If you run a query for the first time, PostgreSQL determines a query plan. This is an expensive operation and the result is cached. So the query has to work for future executions as well.

Your first query will never need records_2010, regardless of when you run it.

But the second query uses a variable CURRENT_TIMESTAMP. The optimizer does not know that time can only increase, and generates a plan that will work for any value of ts. That means it has to look in all three tables.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文