Postgres 按性能分区顺序
我正在使用分区的 postgres 表,遵循 文档 使用规则,使用基于日期范围的分区方案(我的日期列是纪元整数)
问题是,选择具有分片列最大值的行的简单查询没有使用索引:
首先,一些设置强制 postgres做我想做的事: 设置约束排除=开; 设置enable_seqscan = 关闭;
对单个分区的查询有效:
explain (SELECT * FROM urls_0 ORDER BY date_created ASC LIMIT 1);
Limit (cost=0.00..0.05 rows=1 width=38)
-> Index Scan using urls_date_created_idx_0 on urls_0 (cost=0.00..436.68 rows=8099 width=38)
但是,对整个表的相同查询是 seq 扫描:
explain (SELECT * FROM urls ORDER BY date_created ASC LIMIT 1);
Limit (cost=50000000274.88..50000000274.89 rows=1 width=51)
-> Sort (cost=50000000274.88..50000000302.03 rows=10859 width=51)
Sort Key: public.urls.date_created
-> Result (cost=10000000000.00..50000000220.59 rows=10859 width=51)
-> Append (cost=10000000000.00..50000000220.59 rows=10859 width=51)
-> Seq Scan on urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15133 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15132 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15131 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_0 urls (cost=10000000000.00..10000000152.99 rows=8099 width=38)
最后,通过 date_created 进行的查找确实可以在约束排除和索引扫描的情况下正常工作:
explain (SELECT * FROM urls where date_created = 1212)
Result (cost=10000000000.00..10000000052.75 rows=23 width=45)
-> Append (cost=10000000000.00..10000000052.75 rows=23 width=45)
-> Seq Scan on urls (cost=10000000000.00..10000000018.62 rows=3 width=88)
Filter: (date_created = 1212)
-> Index Scan using urls_date_created_idx_0 on urls_0 urls (cost=0.00..34.12 rows=20 width=38)
Index Cond: (date_created = 1212)
有谁知道如何使用分区以便这种类型的查询会使用索引扫描吗?
I'm using a partitioned postgres table following the documentation using rules, using a partitioning scheme based on date ranges (my date column is an epoch integer)
The problem is that a simple query to select the row with the maximum value of the sharded column is not using indices:
First, some settings to coerce postgres to do what I want:
SET constraint_exclusion = on;
SET enable_seqscan = off;
The query on a single partition works:
explain (SELECT * FROM urls_0 ORDER BY date_created ASC LIMIT 1);
Limit (cost=0.00..0.05 rows=1 width=38)
-> Index Scan using urls_date_created_idx_0 on urls_0 (cost=0.00..436.68 rows=8099 width=38)
However, the same query on the entire table is seq scanning:
explain (SELECT * FROM urls ORDER BY date_created ASC LIMIT 1);
Limit (cost=50000000274.88..50000000274.89 rows=1 width=51)
-> Sort (cost=50000000274.88..50000000302.03 rows=10859 width=51)
Sort Key: public.urls.date_created
-> Result (cost=10000000000.00..50000000220.59 rows=10859 width=51)
-> Append (cost=10000000000.00..50000000220.59 rows=10859 width=51)
-> Seq Scan on urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15133 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15132 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15131 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_0 urls (cost=10000000000.00..10000000152.99 rows=8099 width=38)
Finally, a lookup by date_created does work correctly with contraint exclusions and index scans:
explain (SELECT * FROM urls where date_created = 1212)
Result (cost=10000000000.00..10000000052.75 rows=23 width=45)
-> Append (cost=10000000000.00..10000000052.75 rows=23 width=45)
-> Seq Scan on urls (cost=10000000000.00..10000000018.62 rows=3 width=88)
Filter: (date_created = 1212)
-> Index Scan using urls_date_created_idx_0 on urls_0 urls (cost=0.00..34.12 rows=20 width=38)
Index Cond: (date_created = 1212)
Does anyone know how to use partitioning so that this type of query will use an index scan?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
Postgresql 9.1 知道如何开箱即用地对此进行优化。
在 9.0 或更早版本中,您需要手动分解查询,方法是将每个子查询分别与其自己的 order by/limit 语句联合起来。
Postgresql 9.1 knows how to optimize this out of the box.
In 9.0 or earlier, you need to decompose the query manually, by unioning each of the subqueries individually with their own order by/limit statement.