使用 CURRENT_TIMESTAMP 的时间戳分区表的查询效率
考虑到 PostgreSQL 9.0.3 下的以下表分区:
CREATE TABLE records (
ts TIMESTAMP,
...
);
CREATE TABLE records_2010 (
CHECK (ts >= '2010-01-01 00:00:00' AND ts < '2011-01-01 00:00:00')
) INHERITS (records);
CREATE TABLE records_2011 (
CHECK (ts >= '2011-01-01 00:00:00' AND ts < '2012-01-01 00:00:00')
) INHERITS (records);
我希望以下 SELECT 查询具有相同的 EXPLAINed 计划,仅查阅“records”和“records_2011”,但它们不同:
BEGIN;
-- Assume CURRENT_TIMESTAMP is 9 a.m. on 5 March 2011
SELECT * FROM records WHERE ts >= '2011-03-05 09:00:00'; -- scans 2 tables
SELECT * FROM records WHERE ts >= CURRENT_TIMESTAMP; -- scans all 3 tables
COMMIT;
考虑到 CURRENT_TIMESTAMP 在其持续时间内返回一个常量值包含事务,为什么使用 CURRENT_TIMESTAMP 的查询不利用 Postgres 的分区并只扫描两个表?
更新:
目前这是不可能的,但它是被认为是一个需要改进的领域。 PostgreSQL 9.1 可能解决查询执行器中的此行为。
Given the following table partitioning under PostgreSQL 9.0.3:
CREATE TABLE records (
ts TIMESTAMP,
...
);
CREATE TABLE records_2010 (
CHECK (ts >= '2010-01-01 00:00:00' AND ts < '2011-01-01 00:00:00')
) INHERITS (records);
CREATE TABLE records_2011 (
CHECK (ts >= '2011-01-01 00:00:00' AND ts < '2012-01-01 00:00:00')
) INHERITS (records);
I expected the following SELECT queries to have the same EXPLAINed plan, consulting only "records" and "records_2011", but they differ:
BEGIN;
-- Assume CURRENT_TIMESTAMP is 9 a.m. on 5 March 2011
SELECT * FROM records WHERE ts >= '2011-03-05 09:00:00'; -- scans 2 tables
SELECT * FROM records WHERE ts >= CURRENT_TIMESTAMP; -- scans all 3 tables
COMMIT;
Given that CURRENT_TIMESTAMP returns a constant value for the duration of its enclosing transactions, why doesn't the query with CURRENT_TIMESTAMP take advantage of Postgres' partitioning and only scan two tables?
UPDATE:
This isn't possible right now, but it is recognized as an area to improve. PostgreSQL 9.1 may address this behavior in the query executor.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果您第一次运行查询,PostgreSQL 会确定查询计划。这是一个昂贵的操作,并且结果会被缓存。因此查询也必须适用于未来的执行。
无论您何时运行第一个查询,都永远不需要
records_2010
。但第二个查询使用变量
CURRENT_TIMESTAMP
。优化器不知道时间只会增加,并生成一个适用于任何ts
值的计划。这意味着它必须查看所有三个表。If you run a query for the first time, PostgreSQL determines a query plan. This is an expensive operation and the result is cached. So the query has to work for future executions as well.
Your first query will never need
records_2010
, regardless of when you run it.But the second query uses a variable
CURRENT_TIMESTAMP
. The optimizer does not know that time can only increase, and generates a plan that will work for any value ofts
. That means it has to look in all three tables.