Postgres / Postgis查询优化
我整理了一个有效的查询,我只是想了解如何优化它。查询的想法是,在表A中给出了一个特定的行,它采用其几何形状,在表B中找到了与某些标准过滤的最接近的几何形状。
SELECT a.id,
closest_pt.dist,
closest_pt.name,
closest_pt.meters
FROM "hex-hex-uk" a
CROSS JOIN lateral
(
SELECT a.id,
b.name AS name,
a.geom <-> b.way AS dist,
st_distance(a.geom, b.way, FALSE) AS meters
FROM "osm-polygons-uk" b
WHERE (
b.landuse='industrial'
OR b.man_made='works')
AND st_area(b.way, FALSE)>15000
ORDER BY a.geom <-> b.way
LIMIT 1) AS closest_pt
WHERE a.id='abc'
目前,查询在30-90ms中执行,但我需要执行数百万此查找。我尝试摇摆 a.id ='abc'
带有a.id in('abc','def','ghi',...)
一次查找10000 ,但是需要10分钟+,这并没有真正加起来。
这是查询计划:
" -> Index Scan using ""hex-hex-uk_id_idx"" on ""hex-hex-uk"" a (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)"
" Index Cond: ((id)::text = '89195c849a3ffff'::text)"
" -> Limit (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)"
" -> Index Scan using ""idx_osm-polygons-uk_geom"" on ""osm-polygons-uk"" b (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)"
" Order By: (way <-> a.geom)"
" Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))"
" Rows Removed by Filter: 7"
"Planning Time: 0.142 ms"
"Execution Time: 33.311 ms"
尝试优化这样的查询的过程是什么?以身作则,我学到了最好的例子,因此我认为在此处发帖是有意义的,而不仅仅是阅读有关优化技术。
谢谢!
CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);
CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);
I've put together a query which works, I'm just wanting to learn how I can optimise it. The idea of the query is that given a particular row in table A, it take its geometry and in table B finds the closest matching geometry to it filtered by certain criteria.
SELECT a.id,
closest_pt.dist,
closest_pt.name,
closest_pt.meters
FROM "hex-hex-uk" a
CROSS JOIN lateral
(
SELECT a.id,
b.name AS name,
a.geom <-> b.way AS dist,
st_distance(a.geom, b.way, FALSE) AS meters
FROM "osm-polygons-uk" b
WHERE (
b.landuse='industrial'
OR b.man_made='works')
AND st_area(b.way, FALSE)>15000
ORDER BY a.geom <-> b.way
LIMIT 1) AS closest_pt
WHERE a.id='abc'
Currently the query executes in 30-90ms, but I need to perform millions of these lookups. I tried swoppinga.id='abc'
with a.id IN ('abc','def','ghi',...)
and looking up 10000 at a time, but it takes 10mins+ which doesn't really add up.
Here's the query plan as it stands:
" -> Index Scan using ""hex-hex-uk_id_idx"" on ""hex-hex-uk"" a (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)"
" Index Cond: ((id)::text = '89195c849a3ffff'::text)"
" -> Limit (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)"
" -> Index Scan using ""idx_osm-polygons-uk_geom"" on ""osm-polygons-uk"" b (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)"
" Order By: (way <-> a.geom)"
" Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))"
" Rows Removed by Filter: 7"
"Planning Time: 0.142 ms"
"Execution Time: 33.311 ms"
What would be the process for trying to optimise a query like this? I learn best by example hence I think it makes sense to post on here rather than just reading about optimisation techniques.
Thanks!
CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);
CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
上面的一些很棒的技巧。关于索引实体视图的评论使我仅使用过滤数据创建视图。它将行数从100万下降到〜20000,并在几秒钟内执行。
从那时起,我对原始查询进行了调整,并在几分钟内爆炸了2400000行。与最初的13个小时相比,它将需要进行巨大改进!
感谢您的提示,它为我提供了一些指南,以了解如何调试查询性能。
Some great tips above. The comment about the indexed materialized view led me to create a view with only the filtered data.. it cut the number of rows down from 1 million to ~20000 and executed in a couple of seconds.
From then I tweaked the original query and it ended up blasting through 2400000 rows in a couple of minutes. A huge improvement from the original 13 hours it was going to take to run!
Thanks for the tips, it gives me a bit of a guide as to how to go about debugging query performance.