使用 CURRENT_TIMESTAMP 的时间戳分区表的查询效率
考虑到 PostgreSQL 9.0.3 下的以下表分区:
CREATE TABLE records (
ts TIMESTAMP,
...
);
CREATE TABLE records_2010 (
CHECK (ts >= '2010-01-01 00:00:00' AND ts < '2011-01-01 00:00:00')
) INHERITS (records);
CREATE TABLE records_2011 (
CHECK (ts >= '2011-01-01 00:00:00' AND ts < '2012-01-01 00:00:00')
) INHERITS (records);
我希望以下 SELECT 查询具有相同的 EXPLAINed 计划,仅查阅“records”和“records_2011”,但它们不同:
BEGIN;
-- Assume CURRENT_TIMESTAMP is 9 a.m. on 5 March 2011
SELECT * FROM records WHERE ts >= '2011-03-05 09:00:00'; -- scans 2 tables
SELECT * FROM records WHERE ts >= CURRENT_TIMESTAMP; -- scans all 3 tables
COMMIT;
考虑到 CURRENT_TIMESTAMP 在其持续时间内返回一个常量值包含事务,为什么使用 CURRENT_TIMESTAMP 的查询不利用 Postgres 的分区并只扫描两个表?
更新:
目前这是不可能的,但它是被认为是一个需要改进的领域。 PostgreSQL 9.1 可能解决查询执行器中的此行为。
Given the following table partitioning under PostgreSQL 9.0.3:
CREATE TABLE records (
ts TIMESTAMP,
...
);
CREATE TABLE records_2010 (
CHECK (ts >= '2010-01-01 00:00:00' AND ts < '2011-01-01 00:00:00')
) INHERITS (records);
CREATE TABLE records_2011 (
CHECK (ts >= '2011-01-01 00:00:00' AND ts < '2012-01-01 00:00:00')
) INHERITS (records);
I expected the following SELECT queries to have the same EXPLAINed plan, consulting only "records" and "records_2011", but they differ:
BEGIN;
-- Assume CURRENT_TIMESTAMP is 9 a.m. on 5 March 2011
SELECT * FROM records WHERE ts >= '2011-03-05 09:00:00'; -- scans 2 tables
SELECT * FROM records WHERE ts >= CURRENT_TIMESTAMP; -- scans all 3 tables
COMMIT;
Given that CURRENT_TIMESTAMP returns a constant value for the duration of its enclosing transactions, why doesn't the query with CURRENT_TIMESTAMP take advantage of Postgres' partitioning and only scan two tables?
UPDATE:
This isn't possible right now, but it is recognized as an area to improve. PostgreSQL 9.1 may address this behavior in the query executor.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果您第一次运行查询,PostgreSQL 会确定查询计划。这是一个昂贵的操作,并且结果会被缓存。因此查询也必须适用于未来的执行。
无论您何时运行第一个查询,都永远不需要
records_2010
。但第二个查询使用变量
CURRENT_TIMESTAMP
。优化器不知道时间只会增加,并生成一个适用于任何ts
值的计划。这意味着它必须查看所有三个表。If you run a query for the first time, PostgreSQL determines a query plan. This is an expensive operation and the result is cached. So the query has to work for future executions as well.
Your first query will never need
records_2010
, regardless of when you run it.But the second query uses a variable
CURRENT_TIMESTAMP
. The optimizer does not know that time can only increase, and generates a plan that will work for any value ofts
. That means it has to look in all three tables.