如何改善GostGreSQL相交速度?

发布于 2025-01-20 07:57:11 字数 2409 浏览 0 评论 0原文

我在 postgres 中创建了下表:

CREATE TABLE mytable (id serial, c1 int, c2 int)

以下查询需要 0.9 毫秒才能完成:

select id from mytable where c1=555 and c2=444;

以下查询需要 6.5 毫秒才能完成:

select id from mytable where c1=555 intersect select id from mytable where c2=444;

第一个查询的查询计划是:

 Bitmap Heap Scan on mytable  (cost=46.90..54.84 rows=2 width=4) (actual time=0.623..0.626 rows=2 loops=1)
   Recheck Cond: ((c2 = 444) AND (c1 = 555))
   Heap Blocks: exact=2
   ->  BitmapAnd  (cost=46.90..46.90 rows=2 width=0) (actual time=0.609..0.610 rows=0 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..23.31 rows=1985 width=0) (actual time=0.267..0.267 rows=1978 loops=1)
               Index Cond: (c2 = 444)
         ->  Bitmap Index Scan on i1  (cost=0.00..23.33 rows=1987 width=0) (actual time=0.258..0.258 rows=1988 loops=1)
               Index Cond: (c1 = 555)

第二个计划是:

HashSetOp Intersect  (cost=23.81..10244.16 rows=1985 width=8) (actual time=6.784..6.804 rows=2 loops=1)
   ->  Append  (cost=23.81..10234.23 rows=3972 width=8) (actual time=0.543..6.093 rows=3966 loops=1)
         ->  Subquery Scan on "*SELECT* 2"  (cost=23.81..5106.08 rows=1985 width=8) (actual time=0.542..2.928 rows=1978 loops=1)
               ->  Bitmap Heap Scan on mytable  (cost=23.81..5086.23 rows=1985 width=4) (actual time=0.540..2.636 rows=1978 loops=1)
                     Recheck Cond: (c2 = 444)
                     Heap Blocks: exact=1810
                     ->  Bitmap Index Scan on i2  (cost=0.00..23.31 rows=1985 width=0) (actual time=0.331..0.331 rows=1978 loops=1)
                           Index Cond: (c2 = 444)
         ->  Subquery Scan on "*SELECT* 1"  (cost=23.83..5108.29 rows=1987 width=8) (actual time=0.537..2.790 rows=1988 loops=1)
               ->  Bitmap Heap Scan on mytable mytable_1  (cost=23.83..5088.42 rows=1987 width=4) (actual time=0.536..2.495 rows=1988 loops=1)
                     Recheck Cond: (c1 = 555)
                     Heap Blocks: exact=1812
                     ->  Bitmap Index Scan on i1  (cost=0.00..23.33 rows=1987 width=0) (actual time=0.340..0.340 rows=1988 loops=1)
                           Index Cond: (c1 = 555)

速度差异似乎是由于在第一次查询执行中使用了 BitmapAnd 。有没有办法让 postgres 对第二个查询也执行类似的执行?

I have created the following table in postgres:

CREATE TABLE mytable (id serial, c1 int, c2 int)

The following query takes 0.9 ms to complete:

select id from mytable where c1=555 and c2=444;

The following query takes 6.5 ms to complete:

select id from mytable where c1=555 intersect select id from mytable where c2=444;

The query plan for the first query is:

 Bitmap Heap Scan on mytable  (cost=46.90..54.84 rows=2 width=4) (actual time=0.623..0.626 rows=2 loops=1)
   Recheck Cond: ((c2 = 444) AND (c1 = 555))
   Heap Blocks: exact=2
   ->  BitmapAnd  (cost=46.90..46.90 rows=2 width=0) (actual time=0.609..0.610 rows=0 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..23.31 rows=1985 width=0) (actual time=0.267..0.267 rows=1978 loops=1)
               Index Cond: (c2 = 444)
         ->  Bitmap Index Scan on i1  (cost=0.00..23.33 rows=1987 width=0) (actual time=0.258..0.258 rows=1988 loops=1)
               Index Cond: (c1 = 555)

and for the second plan is:

HashSetOp Intersect  (cost=23.81..10244.16 rows=1985 width=8) (actual time=6.784..6.804 rows=2 loops=1)
   ->  Append  (cost=23.81..10234.23 rows=3972 width=8) (actual time=0.543..6.093 rows=3966 loops=1)
         ->  Subquery Scan on "*SELECT* 2"  (cost=23.81..5106.08 rows=1985 width=8) (actual time=0.542..2.928 rows=1978 loops=1)
               ->  Bitmap Heap Scan on mytable  (cost=23.81..5086.23 rows=1985 width=4) (actual time=0.540..2.636 rows=1978 loops=1)
                     Recheck Cond: (c2 = 444)
                     Heap Blocks: exact=1810
                     ->  Bitmap Index Scan on i2  (cost=0.00..23.31 rows=1985 width=0) (actual time=0.331..0.331 rows=1978 loops=1)
                           Index Cond: (c2 = 444)
         ->  Subquery Scan on "*SELECT* 1"  (cost=23.83..5108.29 rows=1987 width=8) (actual time=0.537..2.790 rows=1988 loops=1)
               ->  Bitmap Heap Scan on mytable mytable_1  (cost=23.83..5088.42 rows=1987 width=4) (actual time=0.536..2.495 rows=1988 loops=1)
                     Recheck Cond: (c1 = 555)
                     Heap Blocks: exact=1812
                     ->  Bitmap Index Scan on i1  (cost=0.00..23.33 rows=1987 width=0) (actual time=0.340..0.340 rows=1988 loops=1)
                           Index Cond: (c1 = 555)

The difference in speed seems to be due to BitmapAnd being used in the first query execution. Is there anyway to make postgres do a similar execution for the second query as well ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

要走干脆点 2025-01-27 07:57:11

进入共享缓冲区中已取消固定的页面非常慢。当然不是与从磁盘读取该页面相比,而是与您所做的任何其他事情相比。需要多个原子操作来确保并发进程在执行此操作时不会相互破坏,并且几乎肯定还会涉及 CPU 缓存未命中。您的 HashSetOp 需要比 BitmapAnd 执行更多操作(大约 1800 倍,根据您的查询计划编号,但这只是计算表页)。 BitmapAnd 与从本地内存中的索引读取的 ctids 相交,然后仅命中那些在相交中幸存的表,而 HashSetOp 需要先命中表才能进行相交,因为它是与 SQL 字段相交,而不是 ctids。

您可以通过使用仅索引扫描来改进错误查询。如果查询根本不需要访问表,因为它可以在索引中找到 SQL 字段,那么它需要访问的页面要少得多(索引页面的每个页面中塞满了许多相关的元组)

如果我在(c1, id)(c2, id),所以我让每个子查询都使用仅索引扫描,生成的 HashSetOp 计划比原来的快 3 到 4 倍原始HashSetOp没有这些索引的计划,同时仍然
比 BitmapAnd 慢 2 到 3 倍。这是在一个刚刚 VACUUMed 的表上,因此所有页面都标记为全部可见。

Stepping onto an unpinned page which is already in shared buffers is quite slow. Not compared to reading that page from disk of course, but compared to anything else you do. Several atomic operations are required to make sure concurrent processes don't corrupt each other while they do this, and it will almost surely also involve CPU cache misses. Your HashSetOp needs to do this a lot more than the BitmapAnd does (around 1800 times more, based on your query plan numbers--but that is only counting table pages). The BitmapAnd intersects ctids read from the index in its local memory and then only hits the table for those surviving the intersect, while your HashSetOp needs to hit the table before it can do the intersect because it is intersecting SQL fields, not ctids.

You can improve the bad query by getting it to use index-only scans. If the query doesn't need to hit the table at all because it can find the SQL field in the index, then it needs to visit far fewer pages (the index pages have many relevant tuples crammed into each page)

If I build indexes on (c1, id) and (c2, id), so I got each subquery to use index-only scans, the resulting HashSetOp plan is 3 to 4 times faster than the original HashSetOp plan without those indexes, while still being
2 to 3 times slower than the BitmapAnd. This is on a freshly VACUUMed table, so that all pages are marked all-visible.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文