postgresql/postgis sum()汇总查询缓慢的性能

发布于 2025-02-05 16:11:09 字数 3283 浏览 2 评论 0原文

我正在尝试使用Postgis来计算一个国家内部多边形的总面积;对于跨越县边界的多边形,计算并使用相交区域。 查询使用2个表,一个包含国家边界,另一个包含非重叠的多边形。

如果我运行以下查询来计算每个多边形/零件多边形与该国相交的每个多边形多边形的面积,则返回120万的国家会导致10秒钟以上。

select a.area from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

解释:

Nested Loop  (cost=0.41..88198.66 rows=1655 width=8)
"  ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=32)"
"        Filter: ((""Code"")::text = 'GB-ENG'::text)"
  ->  Index Scan using cpdata_gb_geom_idx on cpdata_gb cp  (cost=0.41..4825.32 rows=166 width=329)
"        Index Cond: (geom && ct.""Boundary"")"
"        Filter: st_intersects(geom, ct.""Boundary"")"

但是,当我添加汇总sum()函数时,查询需要6.5小时!

select sum(a.area) from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

解释(分析,缓冲区):

Finalize GroupAggregate  (cost=1000.00..52619030.70 rows=1 width=15) (actual time=23188484.154..23188490.791 rows=1 loops=1)
"  Group Key: ct.""Code"""
  Buffers: shared hit=2575358 read=70144
  ->  Gather  (cost=1000.00..52619030.68 rows=2 width=15) (actual time=23187705.263..23188490.608 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2575358 read=70144
        ->  Partial GroupAggregate  (cost=0.00..52618030.48 rows=1 width=15) (actual time=23188004.960..23188004.962 rows=1 loops=3)
"              Group Key: ct.""Code"""
              Buffers: shared hit=2575358 read=70144
              ->  Nested Loop  (cost=0.00..18017348.75 rows=686794 width=7976568) (actual time=299.721..25284.973 rows=403475 loops=3)
"                    Join Filter: st_intersects(cp.geom, ct.""Boundary"")"
                    Rows Removed by Join Filter: 147302
                    Buffers: shared hit=1662545 read=70144
                    ->  Parallel Seq Scan on cpdata_gb cp  (cost=0.00..84605.03 rows=687803 width=326) (actual time=2.933..2189.890 rows=550777 loops=3)
                          Buffers: shared hit=7583 read=70144
"                    ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=7976242) (actual time=0.007..0.007 rows=1 loops=1652330)"
"                          Filter: ((""Code"")::text = 'GB-ENG'::text)"
                          Rows Removed by Filter: 1
                          Buffers: shared hit=1652330
Planning:
  Buffers: shared hit=250
Planning Time: 7.362 ms
JIT:
  Functions: 36
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 2.525 ms, Inlining 149.719 ms, Optimization 294.058 ms, Emission 219.977 ms, Total 666.279 ms"
Execution Time: 23188588.520 ms

为什么这种汇总功能表现如此糟糕的想法?

I'm trying to use PostGIS to calculate the total area of polygons that fall within a country; for polygons that span the county boundary the intersecting area is calculated and used.
The query uses 2 tables, one containing the country boundary and the other contains the non-overlapping polygons.

If I run the query below to calculate the area for each polygon/part polygon that intersects the country it returns the 1.2 million results in just over 10s.

select a.area from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

explain:

Nested Loop  (cost=0.41..88198.66 rows=1655 width=8)
"  ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=32)"
"        Filter: ((""Code"")::text = 'GB-ENG'::text)"
  ->  Index Scan using cpdata_gb_geom_idx on cpdata_gb cp  (cost=0.41..4825.32 rows=166 width=329)
"        Index Cond: (geom && ct.""Boundary"")"
"        Filter: st_intersects(geom, ct.""Boundary"")"

However when I add the aggregate sum() function to get the total area the query takes 6.5 hours!

select sum(a.area) from (
    select ct."Code", CASE
          WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
          ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
          END as area
    from cpdata_gb as cp
    inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;

explain (analyze, buffers):

Finalize GroupAggregate  (cost=1000.00..52619030.70 rows=1 width=15) (actual time=23188484.154..23188490.791 rows=1 loops=1)
"  Group Key: ct.""Code"""
  Buffers: shared hit=2575358 read=70144
  ->  Gather  (cost=1000.00..52619030.68 rows=2 width=15) (actual time=23187705.263..23188490.608 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2575358 read=70144
        ->  Partial GroupAggregate  (cost=0.00..52618030.48 rows=1 width=15) (actual time=23188004.960..23188004.962 rows=1 loops=3)
"              Group Key: ct.""Code"""
              Buffers: shared hit=2575358 read=70144
              ->  Nested Loop  (cost=0.00..18017348.75 rows=686794 width=7976568) (actual time=299.721..25284.973 rows=403475 loops=3)
"                    Join Filter: st_intersects(cp.geom, ct.""Boundary"")"
                    Rows Removed by Join Filter: 147302
                    Buffers: shared hit=1662545 read=70144
                    ->  Parallel Seq Scan on cpdata_gb cp  (cost=0.00..84605.03 rows=687803 width=326) (actual time=2.933..2189.890 rows=550777 loops=3)
                          Buffers: shared hit=7583 read=70144
"                    ->  Seq Scan on ""Country"" ct  (cost=0.00..1.06 rows=1 width=7976242) (actual time=0.007..0.007 rows=1 loops=1652330)"
"                          Filter: ((""Code"")::text = 'GB-ENG'::text)"
                          Rows Removed by Filter: 1
                          Buffers: shared hit=1652330
Planning:
  Buffers: shared hit=250
Planning Time: 7.362 ms
JIT:
  Functions: 36
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 2.525 ms, Inlining 149.719 ms, Optimization 294.058 ms, Emission 219.977 ms, Total 666.279 ms"
Execution Time: 23188588.520 ms

Any ideas why this aggregate function is performing so badly?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文