多分区Postgres表的高效查询

发布于 2024-08-21 03:03:05 字数 1092 浏览 3 评论 0原文

我刚刚重组了我的数据库以在 Postgres 8.2 中使用 分区 。现在我遇到了查询性能的问题:

SELECT *
FROM my_table
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100;

表中有 4500 万行。在分区之前,这将使用反向索引扫描,并在达到限制时立即停止。

分区后(在 time_stamp 范围内),Postgres 对主表和相关分区进行完整索引扫描并合并结果,对它们进行排序,然后应用限制。这需要太长的时间。

我可以用以下方法修复它:

SELECT * FROM (
  SELECT *
  FROM my_table_part_a
  WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
  ORDER BY id DESC
  LIMIT 100) t
UNION ALL
SELECT * FROM (
  SELECT *
  FROM my_table_part_b
  WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
  ORDER BY id DESC
  LIMIT 100) t
UNION ALL
  ... and so on ...
ORDER BY id DESC
LIMIT 100

这运行得很快。时间戳超出范围的分区甚至不包含在查询计划中。

我的问题是:在 Postgres 8.2 中是否可以使用一些提示或语法来防止查询规划器扫描整个表,但仍然使用仅引用主表的简单语法?

基本上,我可以避免在当前定义的每个分区上动态构建大型 UNION 查询的痛苦吗?

编辑:我启用了constraint_exclusion(感谢@Vinko Vrsalovic)

I've just restructured my database to use partitioning in Postgres 8.2. Now I have a problem with query performance:

SELECT *
FROM my_table
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100;

There are 45 million rows in the table. Prior to partitioning, this would use a reverse index scan and stop as soon as it hit the limit.

After partitioning (on time_stamp ranges), Postgres does a full index scan of the master table and the relevant partition and merges the results, sorts them, then applies the limit. This takes way too long.

I can fix it with:

SELECT * FROM (
  SELECT *
  FROM my_table_part_a
  WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
  ORDER BY id DESC
  LIMIT 100) t
UNION ALL
SELECT * FROM (
  SELECT *
  FROM my_table_part_b
  WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
  ORDER BY id DESC
  LIMIT 100) t
UNION ALL
  ... and so on ...
ORDER BY id DESC
LIMIT 100

This runs quickly. The partitions where the times-stamps are out-of-range aren't even included in the query plan.

My question is: Is there some hint or syntax I can use in Postgres 8.2 to prevent the query-planner from scanning the full table but still using simple syntax that only refers to the master table?

Basically, can I avoid the pain of dynamically building the big UNION query over each partition that happens to be currently defined?

EDIT: I have constraint_exclusion enabled (thanks @Vinko Vrsalovic)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

原谅我要高飞 2024-08-28 03:03:05

您是否尝试过约束排除(您链接到的文档中的第 5.9.4 节)

约束排除是一个查询
改进的优化技术
分区表的性能
以所描述的方式定义
多于。举个例子:

 SET constraint_exclusion = on; 
 SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01'; 

没有
约束排除,上面的查询
会扫描每个分区
测量表。有约束
启用排除后,规划器将
检查每个的约束条件
并尝试证明
不需要扫描分区,因为
它不能包含任何行会议
查询的 WHERE 子句。当
planner可以证明这一点,它排除了
查询计划中的分区。

您可以使用 EXPLAIN 命令来
显示计划之间的差异
与constraint_exclusion打开和a
计划关闭它。

Have you tried Constraint Exclusion (section 5.9.4 in the document you've linked to)

Constraint exclusion is a query
optimization technique that improves
performance for partitioned tables
defined in the fashion described
above. As an example:

 SET constraint_exclusion = on; 
 SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01'; 

Without
constraint exclusion, the above query
would scan each of the partitions of
the measurement table. With constraint
exclusion enabled, the planner will
examine the constraints of each
partition and try to prove that the
partition need not be scanned because
it could not contain any rows meeting
the query's WHERE clause. When the
planner can prove this, it excludes
the partition from the query plan.

You can use the EXPLAIN command to
show the difference between a plan
with constraint_exclusion on and a
plan with it off.

踏月而来 2024-08-28 03:03:05

我遇到了类似的问题,可以通过在 WHERE 中转换条件来解决。
EG:(假设time_stamp列是timestamptz类型)

WHERE time_stamp >= '2010-02-10'::timestamptz and time_stamp < '2010-02-11'::timestamptz

另外,确保表上的CHECK条件以相同的方式定义......
例如:
检查(时间戳<'2010-02-10'::timestamptz)

I had a similar problem that I was able fix by casting conditions in WHERE.
EG: (assuming the time_stamp column is timestamptz type)

WHERE time_stamp >= '2010-02-10'::timestamptz and time_stamp < '2010-02-11'::timestamptz

Also, make sure the CHECK condition on the table is defined the same way...
EG:
CHECK (time_stamp < '2010-02-10'::timestamptz)

只为守护你 2024-08-28 03:03:05

我遇到了同样的问题,在我的情况下,它归结为两个原因:

  1. 我索引了 timestamp with time zone 类型的列,并通过该列的 timestamp Without time 类型进行了分区约束

  2. 修复所有子表的约束后,需要ANALYZE

编辑:另一点知识 - 重要的是要记住约束排除(允许 PG 根据您的分区标准跳过扫描某些表)不起作用,引用:non-immutable诸如 CURRENT_TIMESTAMP 之类的函数

我收到了 CURRENT_DATE 的请求,这是我的问题的一部分。

I had the same problem and it boiled down to two reasons in my case:

  1. I had indexed column of type timestamp WITH time zone and partition constraint by this column with type timestamp WITHOUT time zone.

  2. After fixing constraints ANALYZE of all child tables was needed.

Edit: another bit of knowledge - it's important to remember that constraint exclusion (which allows PG to skip scanning some tables based on your partitioning criteria) doesn't work with, quote: non-immutable function such as CURRENT_TIMESTAMP

I had requests with CURRENT_DATE and it was part of my problem.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文