Why are bad row estimates slow in Postgres?

Posted on 2025-01-11 02:12:31

What makes bad row estimates a pain point in SQL query performance? I’m interested to know the internal reasons why.

Often a bad row estimate will actually pick the correct plan, and the only difference between a good query and a bad query will be the estimated row counts.

Why is there frequently such a massive performance difference?

Is it because Postgres uses row estimates to allocate memory?

3 Answers

云巢 2025-01-18 02:12:31

The row count estimates influence further decisions by the optimizer, so a bad estimate can lead to a bad plan.

In my experience, the problem typically happens during the decision for the correct join strategy:

  • When row counts are underestimated, PostgreSQL may pick a nested loop join instead of a hash or a merge join, but the inner table ends up being scanned more often than PostgreSQL thought, leading to bad performance.

  • Conversely, if PostgreSQL overestimates the row count, it may pick a hash or merge join and scan both tables fully, which can be much slower than a couple of index scans on the inner table. (A quick way to spot either case is shown in the sketch below.)
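
A quick way to see whether this is happening is to compare the planner's estimate with reality using EXPLAIN (ANALYZE). A minimal sketch, assuming two hypothetical tables orders and customers joined on the customer id:

-- Hypothetical schema: orders(customer_id, created_at, ...), customers(id, ...).
-- For each plan node, compare the estimated "rows=" (inside the cost parentheses)
-- with the actual "rows=" (inside the "actual time=..." parentheses).
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '1 day';

A large gap on the outer side of a Nested Loop (say, estimated 10 rows but actually 100000) is the classic symptom of the first case: the inner index scan runs 100000 times instead of 10.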

樱花落人离去 2025-01-18 02:12:31

The PostgreSQL optimizer is a cost-based optimizer (CBO): the query is executed with the cheapest of the candidate execution plans, and the plan costs are calculated from the table's statistics.
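
For reference, the statistics the planner works from can be inspected directly. A small sketch, assuming a table named t1:

-- Table-level statistics: the planner's estimated row count and page count.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 't1';

-- Per-column statistics collected by ANALYZE: distinct-value estimate,
-- null fraction, most common values, and so on.
SELECT attname, n_distinct, null_frac, most_common_vals
FROM pg_stats
WHERE tablename = 't1';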

Why are bad row estimates slow in Postgres?

Because wrong statistics can make the planner choose a bad execution plan. Here is an example.

There are two tables: T1 has 20000000 rows and T2 has 1000000 rows.

CREATE TABLE T1 (
    ID INT NOT NULL PRIMARY KEY,
    val INT NOT NULL,
    col1 UUID NOT NULL,
    col2 UUID NOT NULL,
    col3 UUID NOT NULL,
    col4 UUID NOT NULL,
    col5 UUID NOT NULL,
    col6 UUID NOT NULL
);


INSERT INTO T1
SELECT i,
       RANDOM() * 1000000,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1,20000000) i;


CREATE TABLE T2 (
    ID INT NOT NULL PRIMARY KEY,
    val INT NOT NULL,
    col1 UUID NOT NULL,
    col2 UUID NOT NULL,
    col3 UUID NOT NULL,
    col4 UUID NOT NULL,
    col5 UUID NOT NULL,
    col6 UUID NOT NULL
);

INSERT INTO T2
SELECT i,
       RANDOM() * 1000000,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1,1000000) i;

When we join the two tables, we get an execution plan that uses a Merge Join:

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT t1.*
FROM T1 
INNER JOIN T2 ON t1.id = t2.id 
WHERE t1.id < 1000000;
"Gather  (cost=1016.37..30569.85 rows=53968 width=104) (actual time=0.278..837.297 rows=999999 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  Buffers: shared hit=38273 read=21841"
"  ->  Merge Join  (cost=16.37..24173.05 rows=22487 width=104) (actual time=11.993..662.770 rows=333333 loops=3)"
"        Merge Cond: (t2.id = t1.id)"
"        Buffers: shared hit=38273 read=21841"
"        ->  Parallel Index Only Scan using t2_pkey on t2  (cost=0.42..20147.09 rows=416667 width=4) (actual time=0.041..69.947 rows=333333 loops=3)"
"              Heap Fetches: 0"
"              Buffers: shared hit=6 read=2732"
"        ->  Index Scan using t1_pkey on t1  (cost=0.44..48427.24 rows=1079360 width=104) (actual time=0.041..329.874 rows=999819 loops=3)"
"              Index Cond: (id < 1000000)"
"              Buffers: shared hit=38267 read=19109"
"Planning:"
"  Buffers: shared hit=4 read=8"
"Planning Time: 0.228 ms"
"Execution Time: 906.760 ms"

But when I update a lot of rows as below, adding 100000000 to id for every row where id is smaller than 1000000,

update T1
set id = id + 100000000
where id < 1000000;

and then run the same query again, it still uses a Merge Join, even though a better option than a Merge Join now exists.

This happens if the table hasn't reached the autoanalyze trigger yet. (The relevant setting is autovacuum_analyze_scale_factor, whose default is 0.1: roughly 10% of the table's rows have to change before autovacuum re-analyzes the table automatically; here only 1000000 of 20000000 rows, i.e. 5%, were updated.) The planner is therefore still working from the old statistics:

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT t1.*
FROM T1 
INNER JOIN T2 ON t1.id = t2.id 
WHERE t1.id < 1000000;
"Gather  (cost=1016.37..30707.83 rows=53968 width=104) (actual time=51.403..55.517 rows=0 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  Buffers: shared hit=8215"
"  ->  Merge Join  (cost=16.37..24311.03 rows=22487 width=104) (actual time=6.736..6.738 rows=0 loops=3)"
"        Merge Cond: (t2.id = t1.id)"
"        Buffers: shared hit=8215"
"        ->  Parallel Index Only Scan using t2_pkey on t2  (cost=0.42..20147.09 rows=416667 width=4) (actual time=0.024..0.024 rows=1 loops=3)"
"              Heap Fetches: 0"
"              Buffers: shared hit=8"
"        ->  Index Scan using t1_pkey on t1  (cost=0.44..50848.71 rows=1133330 width=104) (actual time=6.710..6.710 rows=0 loops=3)"
"              Index Cond: (id < 1000000)"
"              Buffers: shared hit=8207"
"Planning:"
"  Buffers: shared hit=2745"
"Planning Time: 3.938 ms"
"Execution Time: 55.550 ms"

When we run a manual ANALYZE T1;, which updates the statistics for T1, and then run the query again, we get a Nested Loop, which is better than the Merge Join here:

"QUERY PLAN"
"Nested Loop  (cost=0.86..8.90 rows=1 width=104) (actual time=0.004..0.004 rows=0 loops=1)"
"  Buffers: shared hit=3"
"  ->  Index Scan using t1_pkey on t1  (cost=0.44..4.46 rows=1 width=104) (actual time=0.003..0.003 rows=0 loops=1)"
"        Index Cond: (id < 1000000)"
"        Buffers: shared hit=3"
"  ->  Index Only Scan using t2_pkey on t2  (cost=0.42..4.44 rows=1 width=4) (never executed)"
"        Index Cond: (id = t1.id)"
"        Heap Fetches: 0"
"Planning:"
"  Buffers: shared hit=20"
"Planning Time: 0.232 ms"
"Execution Time: 0.027 ms"

A small conclusion:

Accurate table statistics help the optimizer compute accurate costs and pick the right execution plan.

Here is a query that shows when a table was last vacuumed and analyzed (last_vacuum / last_analyze and their autovacuum counterparts):

SELECT
  schemaname, relname,
  last_vacuum, last_autovacuum,
  vacuum_count, autovacuum_count,
  last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'tablename';
丢了幸福的猪 2025-01-18 02:12:31

The row count estimates are used to calculate the cost of different plans.
When those estimates are way off, so are the resulting plan costs, which means the planner can end up choosing the wrong plan. For instance, it may scan a whole table because it thinks it needs a significant part of it, while in reality only a few rows are needed, and those rows could have been retrieved much faster using an index.
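
A minimal sketch of that effect, using a hypothetical events table with an index on status: the same predicate gets a sequential scan or an index scan depending purely on how many matching rows the planner expects.

-- Hypothetical table; the index on status gives the planner a choice.
CREATE TABLE events (id bigserial PRIMARY KEY, status text NOT NULL, payload text);
CREATE INDEX ON events (status);

-- If the statistics say most rows have status = 'error', a sequential scan is
-- costed as cheaper; if they say only a handful match, the index scan wins.
-- EXPLAIN shows the row estimate that drove the decision.
EXPLAIN SELECT * FROM events WHERE status = 'error';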
