Postgres / Postgis查询优化

发布于 2025-01-23 23:45:53 字数 2265 浏览 4 评论 0原文

我整理了一个有效的查询,我只是想了解如何优化它。查询的想法是,在表A中给出了一个特定的行,它采用其几何形状,在表B中找到了与某些标准过滤的最接近的几何形状。

SELECT     a.id,
           closest_pt.dist,
           closest_pt.name,
           closest_pt.meters
FROM       "hex-hex-uk" a
CROSS JOIN lateral
       (
                SELECT   a.id,
                         b.name                            AS name,
                         a.geom <-> b.way                  AS dist,
                         st_distance(a.geom, b.way, FALSE) AS meters
                FROM     "osm-polygons-uk" b
                WHERE    (
                                  b.landuse='industrial'
                         OR       b.man_made='works')
                AND      st_area(b.way, FALSE)>15000
                ORDER BY a.geom <-> b.way
                LIMIT    1) AS closest_pt
WHERE      a.id='abc'

目前,查询在30-90ms中执行,但我需要执行数百万此查找。我尝试摇摆 a.id ='abc'带有a.id in('abc','def','ghi',...)一次查找10000 ,但是需要10分钟+,这并没有真正加起来。

这是查询计划:

"  ->  Index Scan using ""hex-hex-uk_id_idx"" on ""hex-hex-uk"" a  (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)"
"        Index Cond: ((id)::text = '89195c849a3ffff'::text)"
"  ->  Limit  (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)"
"        ->  Index Scan using ""idx_osm-polygons-uk_geom"" on ""osm-polygons-uk"" b  (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)"
"              Order By: (way <-> a.geom)"
"              Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))"
"              Rows Removed by Filter: 7"
"Planning Time: 0.142 ms"
"Execution Time: 33.311 ms"

尝试优化这样的查询的过程是什么?以身作则,我学到了最好的例子,因此我认为在此处发帖是有意义的,而不仅仅是阅读有关优化技术。

谢谢!

CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);

CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);

I've put together a query which works, I'm just wanting to learn how I can optimise it. The idea of the query is that given a particular row in table A, it take its geometry and in table B finds the closest matching geometry to it filtered by certain criteria.

SELECT     a.id,
           closest_pt.dist,
           closest_pt.name,
           closest_pt.meters
FROM       "hex-hex-uk" a
CROSS JOIN lateral
       (
                SELECT   a.id,
                         b.name                            AS name,
                         a.geom <-> b.way                  AS dist,
                         st_distance(a.geom, b.way, FALSE) AS meters
                FROM     "osm-polygons-uk" b
                WHERE    (
                                  b.landuse='industrial'
                         OR       b.man_made='works')
                AND      st_area(b.way, FALSE)>15000
                ORDER BY a.geom <-> b.way
                LIMIT    1) AS closest_pt
WHERE      a.id='abc'

Currently the query executes in 30-90ms, but I need to perform millions of these lookups. I tried swopping
a.id='abc' with a.id IN ('abc','def','ghi',...) and looking up 10000 at a time, but it takes 10mins+ which doesn't really add up.

Here's the query plan as it stands:

"  ->  Index Scan using ""hex-hex-uk_id_idx"" on ""hex-hex-uk"" a  (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)"
"        Index Cond: ((id)::text = '89195c849a3ffff'::text)"
"  ->  Limit  (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)"
"        ->  Index Scan using ""idx_osm-polygons-uk_geom"" on ""osm-polygons-uk"" b  (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)"
"              Order By: (way <-> a.geom)"
"              Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))"
"              Rows Removed by Filter: 7"
"Planning Time: 0.142 ms"
"Execution Time: 33.311 ms"

What would be the process for trying to optimise a query like this? I learn best by example hence I think it makes sense to post on here rather than just reading about optimisation techniques.

Thanks!

CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);

CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

秋意浓 2025-01-30 23:45:53

上面的一些很棒的技巧。关于索引实体视图的评论使我仅使用过滤数据创建视图。它将行数从100万下降到〜20000,并在几秒钟内执行。

从那时起,我对原始查询进行了调整,并在几分钟内爆炸了2400000行。与最初的13个小时相比,它将需要进行巨大改进!

SELECT a.id, closest_pt.name, ST_Distance(a.geom, closest_pt.way, false) as meters
            FROM "hex-hex-uk" a
            CROSS JOIN LATERAL
              (SELECT
                 id,
                 b.name as name,
                 a.geom <-> b.way as dist,
                 b.way as way
                 FROM "tmp_industrial" b
                 ORDER BY dist ASC
               LIMIT 1) AS closest_pt WHERE a.id IN ('abc','def','ghi',...);

感谢您的提示,它为我提供了一些指南,以了解如何调试查询性能。

Some great tips above. The comment about the indexed materialized view led me to create a view with only the filtered data.. it cut the number of rows down from 1 million to ~20000 and executed in a couple of seconds.

From then I tweaked the original query and it ended up blasting through 2400000 rows in a couple of minutes. A huge improvement from the original 13 hours it was going to take to run!

SELECT a.id, closest_pt.name, ST_Distance(a.geom, closest_pt.way, false) as meters
            FROM "hex-hex-uk" a
            CROSS JOIN LATERAL
              (SELECT
                 id,
                 b.name as name,
                 a.geom <-> b.way as dist,
                 b.way as way
                 FROM "tmp_industrial" b
                 ORDER BY dist ASC
               LIMIT 1) AS closest_pt WHERE a.id IN ('abc','def','ghi',...);

Thanks for the tips, it gives me a bit of a guide as to how to go about debugging query performance.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文