postgresql/postgis sum（）汇总查询缓慢的性能

发布于 2025-02-05 16:11:09 字数 3283 浏览 2 评论 0原文

我正在尝试使用Postgis来计算一个国家内部多边形的总面积；对于跨越县边界的多边形，计算并使用相交区域。查询使用2个表，一个包含国家边界，另一个包含非重叠的多边形。

如果我运行以下查询来计算每个多边形/零件多边形与该国相交的每个多边形多边形的面积，则返回120万的国家会导致10秒钟以上。

select a.area from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

解释：

Nested Loop  (cost=0.41..88198.66 rows=1655 width=8)
"  ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=32)"
"        Filter: ((""Code"")::text = 'GB-ENG'::text)"
  ->  Index Scan using cpdata_gb_geom_idx on cpdata_gb cp  (cost=0.41..4825.32 rows=166 width=329)
"        Index Cond: (geom && ct.""Boundary"")"
"        Filter: st_intersects(geom, ct.""Boundary"")"

但是，当我添加汇总sum（）函数时，查询需要6.5小时！

select sum(a.area) from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

解释（分析，缓冲区）：

Finalize GroupAggregate  (cost=1000.00..52619030.70 rows=1 width=15) (actual time=23188484.154..23188490.791 rows=1 loops=1)
"  Group Key: ct.""Code"""
  Buffers: shared hit=2575358 read=70144
  ->  Gather  (cost=1000.00..52619030.68 rows=2 width=15) (actual time=23187705.263..23188490.608 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2575358 read=70144
        ->  Partial GroupAggregate  (cost=0.00..52618030.48 rows=1 width=15) (actual time=23188004.960..23188004.962 rows=1 loops=3)
"              Group Key: ct.""Code"""
              Buffers: shared hit=2575358 read=70144
              ->  Nested Loop  (cost=0.00..18017348.75 rows=686794 width=7976568) (actual time=299.721..25284.973 rows=403475 loops=3)
"                    Join Filter: st_intersects(cp.geom, ct.""Boundary"")"
                    Rows Removed by Join Filter: 147302
                    Buffers: shared hit=1662545 read=70144
                    ->  Parallel Seq Scan on cpdata_gb cp  (cost=0.00..84605.03 rows=687803 width=326) (actual time=2.933..2189.890 rows=550777 loops=3)
                          Buffers: shared hit=7583 read=70144
"                    ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=7976242) (actual time=0.007..0.007 rows=1 loops=1652330)"
"                          Filter: ((""Code"")::text = 'GB-ENG'::text)"
                          Rows Removed by Filter: 1
                          Buffers: shared hit=1652330
Planning:
  Buffers: shared hit=250
Planning Time: 7.362 ms
JIT:
  Functions: 36
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 2.525 ms, Inlining 149.719 ms, Optimization 294.058 ms, Emission 219.977 ms, Total 666.279 ms"
Execution Time: 23188588.520 ms

为什么这种汇总功能表现如此糟糕的想法？

原文

I'm trying to use PostGIS to calculate the total area of polygons that fall within a country; for polygons that span the county boundary the intersecting area is calculated and used.
The query uses 2 tables, one containing the country boundary and the other contains the non-overlapping polygons.

If I run the query below to calculate the area for each polygon/part polygon that intersects the country it returns the 1.2 million results in just over 10s.

select a.area from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

explain:

Nested Loop  (cost=0.41..88198.66 rows=1655 width=8)
"  ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=32)"
"        Filter: ((""Code"")::text = 'GB-ENG'::text)"
  ->  Index Scan using cpdata_gb_geom_idx on cpdata_gb cp  (cost=0.41..4825.32 rows=166 width=329)
"        Index Cond: (geom && ct.""Boundary"")"
"        Filter: st_intersects(geom, ct.""Boundary"")"

However when I add the aggregate sum() function to get the total area the query takes 6.5 hours!

select sum(a.area) from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

explain (analyze, buffers):

Finalize GroupAggregate  (cost=1000.00..52619030.70 rows=1 width=15) (actual time=23188484.154..23188490.791 rows=1 loops=1)
"  Group Key: ct.""Code"""
  Buffers: shared hit=2575358 read=70144
  ->  Gather  (cost=1000.00..52619030.68 rows=2 width=15) (actual time=23187705.263..23188490.608 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2575358 read=70144
        ->  Partial GroupAggregate  (cost=0.00..52618030.48 rows=1 width=15) (actual time=23188004.960..23188004.962 rows=1 loops=3)
"              Group Key: ct.""Code"""
              Buffers: shared hit=2575358 read=70144
              ->  Nested Loop  (cost=0.00..18017348.75 rows=686794 width=7976568) (actual time=299.721..25284.973 rows=403475 loops=3)
"                    Join Filter: st_intersects(cp.geom, ct.""Boundary"")"
                    Rows Removed by Join Filter: 147302
                    Buffers: shared hit=1662545 read=70144
                    ->  Parallel Seq Scan on cpdata_gb cp  (cost=0.00..84605.03 rows=687803 width=326) (actual time=2.933..2189.890 rows=550777 loops=3)
                          Buffers: shared hit=7583 read=70144
"                    ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=7976242) (actual time=0.007..0.007 rows=1 loops=1652330)"
"                          Filter: ((""Code"")::text = 'GB-ENG'::text)"
                          Rows Removed by Filter: 1
                          Buffers: shared hit=1652330
Planning:
  Buffers: shared hit=250
Planning Time: 7.362 ms
JIT:
  Functions: 36
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 2.525 ms, Inlining 149.719 ms, Optimization 294.058 ms, Emission 219.977 ms, Total 666.279 ms"
Execution Time: 23188588.520 ms

Any ideas why this aggregate function is performing so badly?

分享到QQ

分享到微博