查询在复制数据库后减慢5倍(在同一台计算机上!)

发布于 2025-02-13 03:17:06 字数 3440 浏览 0 评论 0 原文

在基于位置的应用程序中,有一个特定的查询必须快速运行:

SELECT count(*) FROM users
  WHERE earth_box(ll_to_earth(40.71427000, -74.00597000), 50000) @> ll_to_earth(latitude, longitude)

但是,当使用Postgres的工具复制数据库后:

pg_dump dummy_users > dummy_users.dump
createdb slow_db
psql slow_db < dummy_users.dump

查询需要2.5秒,而不是在slow_db上进行0.5秒!

策划者在slow_db中选择了不同的路线,例如 解释slow_db的分析:

"Aggregate  (cost=10825.18..10825.19 rows=1 width=8) (actual time=2164.396..2164.396 rows=1 loops=1)"
"  ->  Bitmap Heap Scan on users  (cost=205.45..10818.39 rows=2714 width=0) (actual time=26.188..2155.680 rows=122836 loops=1)"
"        Recheck Cond: ('(1281995.9045467733, -4697354.822067326, 4110397.4955141144),(1381995.648489849, -4597355.078124251, 4210397.23945719)'::cube @> (ll_to_earth(latitude, longitude))::cube)"
"        Rows Removed by Index Recheck: 364502"
"        Heap Blocks: exact=57514 lossy=33728"
"        ->  Bitmap Index Scan on distance_index  (cost=0.00..204.77 rows=2714 width=0) (actual time=20.068..20.068 rows=122836 loops=1)"
"              Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1281995.9045467733, -4697354.822067326, 4110397.4955141144),(1381995.648489849, -4597355.078124251, 4210397.23945719)'::cube)"
"Planning Time: 1.002 ms"
"Execution Time: 2164.807 ms"

解释对来源db的分析:

"Aggregate  (cost=8807.01..8807.02 rows=1 width=8) (actual time=239.524..239.525 rows=1 loops=1)"
"  ->  Index Scan using distance_index on users  (cost=0.41..8801.69 rows=2130 width=0) (actual time=0.156..233.760 rows=122836 loops=1)"
"        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1281995.9045467733, -4697354.822067326, 4110397.4955141144),(1381995.648489849, -4597355.078124251, 4210397.23945719)'::cube)"
"Planning Time: 3.928 ms"
"Execution Time: 239.546 ms"

对于两张表,在运行该查询之前和之后,我尝试运行维护工具(分析\ vaccum等)的位置上有一个索引:

CREATE INDEX 
   distance_index ON users USING gist (ll_to_earth(latitude, longitude))

我尝试运行维护工具(分析\ vaccum等) ,有或没有索引,无济于事!

这两个DB都在完全相同的机器上运行(因此,Postgres Server,Postgres Dist,配置)。 这两个DBS上的数据都是相同的(一个表),并且没有更改。 Postgres版本= 12.8。

PSQL的 \ l 这些数据库的输出:

                              List of databases
    Name     |  Owner   | Encoding | Collate | Ctype |   Access privileges   
-------------+----------+----------+---------+-------+----------------------- 
 dummy_users | yoni     | UTF8     | en_IL   | en_IL | 
 slow_db     | yoni     | UTF8     | en_IL   | en_IL | 

发生了什么?

(感谢Laurenz Albe) - 之后 设置enable_bitmapscan = off; 在慢速数据库上设置enable_seqscan = off; ,再次查询以下是 dixply> dixply> dixply> divell(分析,buffers)>输出:

"Aggregate  (cost=11018.63..11018.64 rows=1 width=8) (actual time=213.544..213.545 rows=1 loops=1)"
"  Buffers: shared hit=11667 read=110537"
"  ->  Index Scan using distance_index on users  (cost=0.41..11011.86 rows=2711 width=0) (actual time=0.262..207.164 rows=122836 loops=1)"
"        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1282077.0159892815, -4697331.573647572, 4110397.4955141144),(1382076.7599323571, -4597331.829704497, 4210397.23945719)'::cube)"
"        Buffers: shared hit=11667 read=110537"
"Planning Time: 0.940 ms"
"Execution Time: 213.591 ms"

In a location based app,there's a specific query which has to run fast:

SELECT count(*) FROM users
  WHERE earth_box(ll_to_earth(40.71427000, -74.00597000), 50000) @> ll_to_earth(latitude, longitude)

However, when after copying the database by using Postgres' tools:

pg_dump dummy_users > dummy_users.dump
createdb slow_db
psql slow_db < dummy_users.dump

the query takes 2.5 seconds instead of 0.5 seconds on slow_db!!

The planner chooses a different route in slow_db, eg
Explain analyze on slow_db:

"Aggregate  (cost=10825.18..10825.19 rows=1 width=8) (actual time=2164.396..2164.396 rows=1 loops=1)"
"  ->  Bitmap Heap Scan on users  (cost=205.45..10818.39 rows=2714 width=0) (actual time=26.188..2155.680 rows=122836 loops=1)"
"        Recheck Cond: ('(1281995.9045467733, -4697354.822067326, 4110397.4955141144),(1381995.648489849, -4597355.078124251, 4210397.23945719)'::cube @> (ll_to_earth(latitude, longitude))::cube)"
"        Rows Removed by Index Recheck: 364502"
"        Heap Blocks: exact=57514 lossy=33728"
"        ->  Bitmap Index Scan on distance_index  (cost=0.00..204.77 rows=2714 width=0) (actual time=20.068..20.068 rows=122836 loops=1)"
"              Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1281995.9045467733, -4697354.822067326, 4110397.4955141144),(1381995.648489849, -4597355.078124251, 4210397.23945719)'::cube)"
"Planning Time: 1.002 ms"
"Execution Time: 2164.807 ms"

explain analyze on the origin db:

"Aggregate  (cost=8807.01..8807.02 rows=1 width=8) (actual time=239.524..239.525 rows=1 loops=1)"
"  ->  Index Scan using distance_index on users  (cost=0.41..8801.69 rows=2130 width=0) (actual time=0.156..233.760 rows=122836 loops=1)"
"        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1281995.9045467733, -4697354.822067326, 4110397.4955141144),(1381995.648489849, -4597355.078124251, 4210397.23945719)'::cube)"
"Planning Time: 3.928 ms"
"Execution Time: 239.546 ms"

For both tables there's an index on the location which was created in the exact same way:

CREATE INDEX 
   distance_index ON users USING gist (ll_to_earth(latitude, longitude))

I've tried to run maintenance tools (analyze\vaccum etc) before and after running that query, with or without the index, doesn't help!

Both DBS run on the exact same machine (so same postgres server,postgres dist,configuration).
Data on both DBS is the same (one single table), and isn't changing.
The Postgres version = 12.8.

psql's \l output for those databases:

                              List of databases
    Name     |  Owner   | Encoding | Collate | Ctype |   Access privileges   
-------------+----------+----------+---------+-------+----------------------- 
 dummy_users | yoni     | UTF8     | en_IL   | en_IL | 
 slow_db     | yoni     | UTF8     | en_IL   | en_IL | 

What is going on?

(Thanks to Laurenz Albe) - after
SET enable_bitmapscan = off; and SET enable_seqscan = off; on the slow database, ran the query again here is the EXPLAIN (ANALYZE, BUFFERS) output:

"Aggregate  (cost=11018.63..11018.64 rows=1 width=8) (actual time=213.544..213.545 rows=1 loops=1)"
"  Buffers: shared hit=11667 read=110537"
"  ->  Index Scan using distance_index on users  (cost=0.41..11011.86 rows=2711 width=0) (actual time=0.262..207.164 rows=122836 loops=1)"
"        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1282077.0159892815, -4697331.573647572, 4110397.4955141144),(1382076.7599323571, -4597331.829704497, 4210397.23945719)'::cube)"
"        Buffers: shared hit=11667 read=110537"
"Planning Time: 0.940 ms"
"Execution Time: 213.591 ms"

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

漫雪独思 2025-02-20 03:17:06

手动真空/分析

还原新数据库后还原后,尚无列统计信息。通常, autovacuum 最终会启动,但是由于“数据[...]不更改” autovacuum 不会触发。

出于相同的原因(数据不更改),我建议在还原单个表后一次运行一次:

VACUUM (ANALYZE, FREEZE) users;

对于从未更改的表格,您不妨运行 freeze
完整不需要,因为刚修复的表中没有死元

。主要问题:

  1. 不良列统计
  2. 数据库配置不良(更严重的问题)

请参见:

在慢速db中,Postgres期望行= 2714 ,期望行= 2130 在快速中。差异可能似乎并不大,但可能足以将Postgres转向另一个查询计划(事实证明是次要的)。

看到Postgres实际上找到行= 122836 估计要么不好。慢速DB中的一个实际上是少> 。但是,位图扫描的速度比索引扫描要慢,即使排位赛的排比预期还要多。 (!),因此您的数据库配置很可能已经离去了。主要问题通常是默认 Random_page_cost 4 ,而完全缓存的仅读取表的现实设置更接近 1 。也许1.1允许一些额外费用。还有其他几个设置可以鼓励索引扫描。例如 。从这里开始:

eastera : 估计。列统计也是:统计。因此不准确,但会受到随机变化的影响。您可能会增加统计目标以提高列统计的有效性。

廉价随机读取有利于索引扫描并劝阻位图索引扫描。
更合格的行有利于位图索引扫描。不太喜欢索引扫描。请参阅:

Manual VACUUM / ANALYZE after restore

After restoring a new database, there are no column statistics yet. Normally, autovacuum will kick in eventually, but since "data [...] isn't changing", autovacuum wouldn't be triggered.

For the same reason (data isn't changing), I suggest to run this once after restoring your single table:

VACUUM (ANALYZE, FREEZE) users;

You might as well run FREEZE for a table that's never changed.
(FULL isn't necessary, since there are no dead tuples in a freshly restored table.)

Explanation for the plan change

With everything else being equal, I suspect at least two major problems:

  1. Bad column statistics
  2. Bad database configuration (the more severe problem)

See:

In the slow DB, Postgres expects rows=2714, while it expects rows=2130 in the fast one. The difference may not seem huge, but may well be enough to tip Postgres over to the other query plan (that turns out to be inferior).

Seeing that Postgres actually finds rows=122836, either estimate is bad. The one in the slow DB is actually less bad. But the bitmap scan turns out to be slower than the index scan, even with many more qualifying rows than expected. (!) So your database configuration is most probably way off. The main problem typically is the default random_page_cost of 4, while a realistic setting for fully cached read-only table is much closer to 1. Maybe 1.1 to allow for some additional cost. There are a couple other settings that encourage index scans. Like effective_cache_size. Start here:

Estimates are just that: estimates. And column statistics are also just that: statistics. So not exact but subject to random variation. You might increase the statistics target to increase the validity of column statistics.

Cheap random reads favor index scans and discourage bitmap index scans.
More qualifying rows favor a bitmap index scan. Less favor an index scan. See:

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文