Postgres / Postgis查询优化

发布于 2025-01-23 23:45:53 字数 2265 浏览 4 评论 0原文

我整理了一个有效的查询，我只是想了解如何优化它。查询的想法是，在表A中给出了一个特定的行，它采用其几何形状，在表B中找到了与某些标准过滤的最接近的几何形状。

SELECT     a.id,
           closest_pt.dist,
           closest_pt.name,
           closest_pt.meters
FROM       "hex-hex-uk" a
CROSS JOIN lateral
       (
                SELECT   a.id,
                         b.name                            AS name,
                         a.geom <-> b.way                  AS dist,
                         st_distance(a.geom, b.way, FALSE) AS meters
                FROM     "osm-polygons-uk" b
                WHERE    (
                                  b.landuse='industrial'
                         OR       b.man_made='works')
                AND      st_area(b.way, FALSE)>15000
                ORDER BY a.geom <-> b.way
                LIMIT    1) AS closest_pt
WHERE      a.id='abc'

目前，查询在30-90ms中执行，但我需要执行数百万此查找。我尝试摇摆 a.id ='abc'带有a.id in（'abc'，'def'，'ghi'，...）一次查找10000 ，但是需要10分钟+，这并没有真正加起来。

这是查询计划：

"  ->  Index Scan using ""hex-hex-uk_id_idx"" on ""hex-hex-uk"" a  (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)"
"        Index Cond: ((id)::text = '89195c849a3ffff'::text)"
"  ->  Limit  (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)"
"        ->  Index Scan using ""idx_osm-polygons-uk_geom"" on ""osm-polygons-uk"" b  (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)"
"              Order By: (way <-> a.geom)"
"              Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))"
"              Rows Removed by Filter: 7"
"Planning Time: 0.142 ms"
"Execution Time: 33.311 ms"

尝试优化这样的查询的过程是什么？以身作则，我学到了最好的例子，因此我认为在此处发帖是有意义的，而不仅仅是阅读有关优化技术。

谢谢！

CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);

CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);

原文

I've put together a query which works, I'm just wanting to learn how I can optimise it. The idea of the query is that given a particular row in table A, it take its geometry and in table B finds the closest matching geometry to it filtered by certain criteria.

SELECT     a.id,
           closest_pt.dist,
           closest_pt.name,
           closest_pt.meters
FROM       "hex-hex-uk" a
CROSS JOIN lateral
       (
                SELECT   a.id,
                         b.name                            AS name,
                         a.geom <-> b.way                  AS dist,
                         st_distance(a.geom, b.way, FALSE) AS meters
                FROM     "osm-polygons-uk" b
                WHERE    (
                                  b.landuse='industrial'
                         OR       b.man_made='works')
                AND      st_area(b.way, FALSE)>15000
                ORDER BY a.geom <-> b.way
                LIMIT    1) AS closest_pt
WHERE      a.id='abc'

Currently the query executes in 30-90ms, but I need to perform millions of these lookups. I tried swopping
a.id='abc' with a.id IN ('abc','def','ghi',...) and looking up 10000 at a time, but it takes 10mins+ which doesn't really add up.

Here's the query plan as it stands:

"  ->  Index Scan using ""hex-hex-uk_id_idx"" on ""hex-hex-uk"" a  (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)"
"        Index Cond: ((id)::text = '89195c849a3ffff'::text)"
"  ->  Limit  (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)"
"        ->  Index Scan using ""idx_osm-polygons-uk_geom"" on ""osm-polygons-uk"" b  (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)"
"              Order By: (way <-> a.geom)"
"              Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))"
"              Rows Removed by Filter: 7"
"Planning Time: 0.142 ms"
"Execution Time: 33.311 ms"

What would be the process for trying to optimise a query like this? I learn best by example hence I think it makes sense to post on here rather than just reading about optimisation techniques.

Thanks!

CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);

CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

秋意浓 2025-01-30 23:45:53

上面的一些很棒的技巧。关于索引实体视图的评论使我仅使用过滤数据创建视图。它将行数从100万下降到〜20000，并在几秒钟内执行。

从那时起，我对原始查询进行了调整，并在几分钟内爆炸了2400000行。与最初的13个小时相比，它将需要进行巨大改进！

SELECT a.id, closest_pt.name, ST_Distance(a.geom, closest_pt.way, false) as meters
            FROM "hex-hex-uk" a
            CROSS JOIN LATERAL
              (SELECT
                 id,
                 b.name as name,
                 a.geom <-> b.way as dist,
                 b.way as way
                 FROM "tmp_industrial" b
                 ORDER BY dist ASC
               LIMIT 1) AS closest_pt WHERE a.id IN ('abc','def','ghi',...);

感谢您的提示，它为我提供了一些指南，以了解如何调试查询性能。

Some great tips above. The comment about the indexed materialized view led me to create a view with only the filtered data.. it cut the number of rows down from 1 million to ~20000 and executed in a couple of seconds.

From then I tweaked the original query and it ended up blasting through 2400000 rows in a couple of minutes. A huge improvement from the original 13 hours it was going to take to run!

SELECT a.id, closest_pt.name, ST_Distance(a.geom, closest_pt.way, false) as meters
            FROM "hex-hex-uk" a
            CROSS JOIN LATERAL
              (SELECT
                 id,
                 b.name as name,
                 a.geom <-> b.way as dist,
                 b.way as way
                 FROM "tmp_industrial" b
                 ORDER BY dist ASC
               LIMIT 1) AS closest_pt WHERE a.id IN ('abc','def','ghi',...);

Thanks for the tips, it gives me a bit of a guide as to how to go about debugging query performance.

回复收藏 0 原文

~没有更多了~