在子查询中使用 unnest() 时,Postgresql 分区修剪不起作用
在子查询中使用 unnest() 时,Postgresql (13.4) 无法提出使用执行时分区修剪的查询计划。
给定这些表:
CREATE TABLE users (
user_id uuid,
channel_id uuid,
CONSTRAINT user_pk PRIMARY KEY(user_id, channel_id)
)
PARTITION BY hash(user_id);
CREATE TABLE users_0 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE users_1 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE TABLE channels (
channel_id uuid,
user_ids uuid[],
CONSTRAINT channel_pk PRIMARY KEY(channel_id)
) PARTITION BY hash(channel_id);
CREATE TABLE channels_0 partition of channels FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE channels_1 partition of channels FOR VALUES WITH (MODULUS 2, REMAINDER 1);
插入一些数据:
INSERT INTO users(user_id, channel_id) VALUES('0861180b-c972-42fe-9fb3-3b55e652f893', '45205876-7270-4e06-ab8d-b5f669298422');
INSERT INTO channels(channel_id, user_ids) VALUES('45205876-7270-4e06-ab8d-b5f669298422', '{0861180b-c972-42fe-9fb3-3b55e652f893}');
INSERT INTO users
SELECT
gen_random_uuid() as user_id,
gen_random_uuid() as channel_id
FROM generate_series(1, 100);
INSERT INTO channels
SELECT
(SELECT max(channel_id::text) FROM (SELECT channel_id FROM users ORDER BY random()*generate_series LIMIT 1) c)::uuid as channel_id,
(SELECT array_agg(DISTINCT user_id::text) FROM (SELECT user_id FROM users ORDER BY random()*generate_series
LIMIT 1) u)::uuid[] as user_ids
FROM (SELECT * FROM generate_series(1, 100)) g
ON conflict DO NOTHING;
以下查询:
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
SELECT unnest(user_ids) FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
返回扫描所有分区的查询计划。
Hash Semi Join (cost=8.45..37.28 rows=8 width=32) (actual time=0.208..0.387 rows=1 loops=1)
Hash Cond: (users.user_id = (unnest(channels.user_ids)))
-> Append (cost=0.00..28.71 rows=8 width=32) (actual time=0.037..0.134 rows=1 loops=1)
-> Seq Scan on users_0 users_1 (cost=0.00..27.00 rows=7 width=32) (actual time=0.021..0.041 rows=1 loops=1)
Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Rows Removed by Filter: 45
-> Seq Scan on users_1 users_2 (cost=0.00..1.68 rows=1 width=32) (actual time=0.018..0.027 rows=0 loops=1)
Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Rows Removed by Filter: 54
-> Hash (cost=8.33..8.33 rows=10 width=16) (actual time=0.131..0.172 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> ProjectSet (cost=0.15..8.23 rows=10 width=16) (actual time=0.060..0.114 rows=1 loops=1)
-> Index Scan using channels_0_pkey on channels_0 channels (cost=0.15..8.17 rows=1 width=32) (actual time=0.040..0.059 rows=1 loops=1)
Index Cond: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Planning Time: 0.363 ms
Execution Time: 0.515 ms
我希望 Postgresql 运行子查询并查看返回的 user_id 以找出该数据将位于哪些分区。但是,Postgresql 正在查找该数据的所有分区。我尝试在通道表中使用一行 pr user_id ,这非常有效。
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
SELECT user_id FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
然后,Postgresql 不会对无法容纳任何数据的分区运行任何步骤。
看来 unnest() 导致执行时间分区修剪不起作用。这是为什么?
解决方案: 我可以确认 jjanes 的解决方案。通过向表中添加 10 万行,查询可使用 unnest() 在执行时进行分区修剪。
Postgresql (13.4) is not able to come up with a query plan that uses execution-time partition pruning when using unnest() in a subquery.
Given these tables:
CREATE TABLE users (
user_id uuid,
channel_id uuid,
CONSTRAINT user_pk PRIMARY KEY(user_id, channel_id)
)
PARTITION BY hash(user_id);
CREATE TABLE users_0 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE users_1 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE TABLE channels (
channel_id uuid,
user_ids uuid[],
CONSTRAINT channel_pk PRIMARY KEY(channel_id)
) PARTITION BY hash(channel_id);
CREATE TABLE channels_0 partition of channels FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE channels_1 partition of channels FOR VALUES WITH (MODULUS 2, REMAINDER 1);
Insert some data:
INSERT INTO users(user_id, channel_id) VALUES('0861180b-c972-42fe-9fb3-3b55e652f893', '45205876-7270-4e06-ab8d-b5f669298422');
INSERT INTO channels(channel_id, user_ids) VALUES('45205876-7270-4e06-ab8d-b5f669298422', '{0861180b-c972-42fe-9fb3-3b55e652f893}');
INSERT INTO users
SELECT
gen_random_uuid() as user_id,
gen_random_uuid() as channel_id
FROM generate_series(1, 100);
INSERT INTO channels
SELECT
(SELECT max(channel_id::text) FROM (SELECT channel_id FROM users ORDER BY random()*generate_series LIMIT 1) c)::uuid as channel_id,
(SELECT array_agg(DISTINCT user_id::text) FROM (SELECT user_id FROM users ORDER BY random()*generate_series
LIMIT 1) u)::uuid[] as user_ids
FROM (SELECT * FROM generate_series(1, 100)) g
ON conflict DO NOTHING;
The following query:
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
SELECT unnest(user_ids) FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
returns a query plan scanning all partitions.
Hash Semi Join (cost=8.45..37.28 rows=8 width=32) (actual time=0.208..0.387 rows=1 loops=1)
Hash Cond: (users.user_id = (unnest(channels.user_ids)))
-> Append (cost=0.00..28.71 rows=8 width=32) (actual time=0.037..0.134 rows=1 loops=1)
-> Seq Scan on users_0 users_1 (cost=0.00..27.00 rows=7 width=32) (actual time=0.021..0.041 rows=1 loops=1)
Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Rows Removed by Filter: 45
-> Seq Scan on users_1 users_2 (cost=0.00..1.68 rows=1 width=32) (actual time=0.018..0.027 rows=0 loops=1)
Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Rows Removed by Filter: 54
-> Hash (cost=8.33..8.33 rows=10 width=16) (actual time=0.131..0.172 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> ProjectSet (cost=0.15..8.23 rows=10 width=16) (actual time=0.060..0.114 rows=1 loops=1)
-> Index Scan using channels_0_pkey on channels_0 channels (cost=0.15..8.17 rows=1 width=32) (actual time=0.040..0.059 rows=1 loops=1)
Index Cond: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Planning Time: 0.363 ms
Execution Time: 0.515 ms
I would expect Postgresql to run the subquery and look at the user_id's returned to figure out what partitions this data will be in. However, Postgresql is looking into all partitions for this data. I have tried using one row pr user_id in the channels table, this works perfect.
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
SELECT user_id FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
Postgresql then does not run any of the steps for partitions that cannot hold any data.
It seems unnest() is causing execution time partition pruning to not work. Why is that?
SOLUTION:
I can confirm jjanes's solution. By adding 100k rows to the tables the query, with unnest(), does partition pruning at execution-time.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您的示例仅显示了它对一个查询的用途,而不是它能够执行的所有操作。
你的桌子小得可笑。另外将 10,000 行放入 users 表中,这样索引实际上很重要,看看它有什么作用。
(从未执行)
是由于执行时修剪造成的。Your example only shows what it did use for one query, not everything it is capable of doing.
Your tables are laughably small. Put another 10,000 rows into the users table, so that the index is actually important, and see what it does.
The
(never executed)
is due to execution-time pruning.