postgresql/postgis sum()汇总查询缓慢的性能
我正在尝试使用Postgis来计算一个国家内部多边形的总面积;对于跨越县边界的多边形,计算并使用相交区域。 查询使用2个表,一个包含国家边界,另一个包含非重叠的多边形。
如果我运行以下查询来计算每个多边形/零件多边形与该国相交的每个多边形多边形的面积,则返回120万的国家会导致10秒钟以上。
select a.area from (
select ct."Code", CASE
WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
END as area
from cpdata_gb as cp
inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;
解释:
Nested Loop (cost=0.41..88198.66 rows=1655 width=8)
" -> Seq Scan on ""Country"" ct (cost=0.00..1.06 rows=1 width=32)"
" Filter: ((""Code"")::text = 'GB-ENG'::text)"
-> Index Scan using cpdata_gb_geom_idx on cpdata_gb cp (cost=0.41..4825.32 rows=166 width=329)
" Index Cond: (geom && ct.""Boundary"")"
" Filter: st_intersects(geom, ct.""Boundary"")"
但是,当我添加汇总sum()
函数时,查询需要6.5小时!
select sum(a.area) from (
select ct."Code", CASE
WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
END as area
from cpdata_gb as cp
inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;
解释(分析,缓冲区):
Finalize GroupAggregate (cost=1000.00..52619030.70 rows=1 width=15) (actual time=23188484.154..23188490.791 rows=1 loops=1)
" Group Key: ct.""Code"""
Buffers: shared hit=2575358 read=70144
-> Gather (cost=1000.00..52619030.68 rows=2 width=15) (actual time=23187705.263..23188490.608 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2575358 read=70144
-> Partial GroupAggregate (cost=0.00..52618030.48 rows=1 width=15) (actual time=23188004.960..23188004.962 rows=1 loops=3)
" Group Key: ct.""Code"""
Buffers: shared hit=2575358 read=70144
-> Nested Loop (cost=0.00..18017348.75 rows=686794 width=7976568) (actual time=299.721..25284.973 rows=403475 loops=3)
" Join Filter: st_intersects(cp.geom, ct.""Boundary"")"
Rows Removed by Join Filter: 147302
Buffers: shared hit=1662545 read=70144
-> Parallel Seq Scan on cpdata_gb cp (cost=0.00..84605.03 rows=687803 width=326) (actual time=2.933..2189.890 rows=550777 loops=3)
Buffers: shared hit=7583 read=70144
" -> Seq Scan on ""Country"" ct (cost=0.00..1.06 rows=1 width=7976242) (actual time=0.007..0.007 rows=1 loops=1652330)"
" Filter: ((""Code"")::text = 'GB-ENG'::text)"
Rows Removed by Filter: 1
Buffers: shared hit=1652330
Planning:
Buffers: shared hit=250
Planning Time: 7.362 ms
JIT:
Functions: 36
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 2.525 ms, Inlining 149.719 ms, Optimization 294.058 ms, Emission 219.977 ms, Total 666.279 ms"
Execution Time: 23188588.520 ms
为什么这种汇总功能表现如此糟糕的想法?
I'm trying to use PostGIS to calculate the total area of polygons that fall within a country; for polygons that span the county boundary the intersecting area is calculated and used.
The query uses 2 tables, one containing the country boundary and the other contains the non-overlapping polygons.
If I run the query below to calculate the area for each polygon/part polygon that intersects the country it returns the 1.2 million results in just over 10s.
select a.area from (
select ct."Code", CASE
WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
END as area
from cpdata_gb as cp
inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;
explain:
Nested Loop (cost=0.41..88198.66 rows=1655 width=8)
" -> Seq Scan on ""Country"" ct (cost=0.00..1.06 rows=1 width=32)"
" Filter: ((""Code"")::text = 'GB-ENG'::text)"
-> Index Scan using cpdata_gb_geom_idx on cpdata_gb cp (cost=0.41..4825.32 rows=166 width=329)
" Index Cond: (geom && ct.""Boundary"")"
" Filter: st_intersects(geom, ct.""Boundary"")"
However when I add the aggregate sum()
function to get the total area the query takes 6.5 hours!
select sum(a.area) from (
select ct."Code", CASE
WHEN ST_CoveredBy(cp.geom, ct."Boundary") THEN ST_Area(cp.geom)
ELSE ST_Area(ST_Multi(ST_Intersection(cp.geom, ct."Boundary")))
END as area
from cpdata_gb as cp
inner join "Country" as ct on ST_Intersects(cp.geom, ct."Boundary"
)
where ct."Code" = 'GB-ENG') as a;
explain (analyze, buffers):
Finalize GroupAggregate (cost=1000.00..52619030.70 rows=1 width=15) (actual time=23188484.154..23188490.791 rows=1 loops=1)
" Group Key: ct.""Code"""
Buffers: shared hit=2575358 read=70144
-> Gather (cost=1000.00..52619030.68 rows=2 width=15) (actual time=23187705.263..23188490.608 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2575358 read=70144
-> Partial GroupAggregate (cost=0.00..52618030.48 rows=1 width=15) (actual time=23188004.960..23188004.962 rows=1 loops=3)
" Group Key: ct.""Code"""
Buffers: shared hit=2575358 read=70144
-> Nested Loop (cost=0.00..18017348.75 rows=686794 width=7976568) (actual time=299.721..25284.973 rows=403475 loops=3)
" Join Filter: st_intersects(cp.geom, ct.""Boundary"")"
Rows Removed by Join Filter: 147302
Buffers: shared hit=1662545 read=70144
-> Parallel Seq Scan on cpdata_gb cp (cost=0.00..84605.03 rows=687803 width=326) (actual time=2.933..2189.890 rows=550777 loops=3)
Buffers: shared hit=7583 read=70144
" -> Seq Scan on ""Country"" ct (cost=0.00..1.06 rows=1 width=7976242) (actual time=0.007..0.007 rows=1 loops=1652330)"
" Filter: ((""Code"")::text = 'GB-ENG'::text)"
Rows Removed by Filter: 1
Buffers: shared hit=1652330
Planning:
Buffers: shared hit=250
Planning Time: 7.362 ms
JIT:
Functions: 36
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 2.525 ms, Inlining 149.719 ms, Optimization 294.058 ms, Emission 219.977 ms, Total 666.279 ms"
Execution Time: 23188588.520 ms
Any ideas why this aggregate function is performing so badly?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论