如何正确创建postgres日期索引?

发布于 2024-09-15 19:14:43 字数 3151 浏览 3 评论 0原文

我正在使用 Django ORM 和 postgresql。

ORM 创建一个查询:

SELECT 
    (date_part('month', stat_date)) AS "stat_date", 
    "direct_keywordstat"."banner_id", 
    SUM("direct_keywordstat"."total") AS "total", 
    SUM("direct_keywordstat"."clicks") AS "clicks", 
    SUM("direct_keywordstat"."shows") AS "shows" 

FROM "direct_keywordstat" 

LEFT OUTER JOIN "direct_banner" ON ("direct_keywordstat"."banner_id" = "direct_banner"."banner_ptr_id") 
LEFT OUTER JOIN "platforms_banner" ON ("direct_banner"."banner_ptr_id" = "platforms_banner"."id") 

WHERE (
     "direct_keywordstat".stat_date BETWEEN E'2009-08-25' AND E'2010-08-25' AND
    "direct_keywordstat"."keyword_id" IN (

        SELECT U0."id" 
        FROM "direct_keyword" U0 
        INNER JOIN "direct_banner" U1 ON (U0."banner_id" = U1."banner_ptr_id") 
        INNER JOIN "platforms_banner" U2 ON (U1."banner_ptr_id" = U2."id") 
        INNER JOIN "platforms_campaign" U3 ON (U2."campaign_id" = U3."id")
        INNER JOIN "direct_campaign" U4 ON (U3."id" = U4."campaign_ptr_id")
        WHERE (
            U0."deleted" = E'False' AND 
            U0."low_ctr" = E'False' AND 
            U4."status_active" = E'True' AND 
            U0."banner_id" IN (

                SELECT U0."banner_ptr_id" 
                FROM "direct_banner" U0 
                INNER JOIN "platforms_banner" U1 
                ON (U0."banner_ptr_id" = U1."id") 
                WHERE (
                    U0."status_show" = E'True' AND 
                    U1."campaign_id" = E'174' )
            )
        )
    )
) 

GROUP BY 
    "direct_keywordstat"."banner_id", 
    (date_part('month', stat_date)), 
    "platforms_banner"."title", date_trunc('month', stat_date) 

ORDER BY "platforms_banner"."title" ASC, "stat_date" ASC

问题是,direct_keywordstat 包含 300 万条以上记录,因此查询在大约 15 秒内执行。

我尝试过创建类似的索引

CREATE INDEX direct_keywordstat_stat_date on direct_keywordstat using btree(stat_date);

,但 EXPLAIN ANALYZE 显示未使用索引。

表架构:

\d direct_keywordstat

                                   Table "public.direct_keywordstat"
   Column    |          Type          |                            Modifiers                            
-------------+------------------------+-----------------------------------------------------------------
 id          | integer                | not null default nextval('direct_keywordstat_id_seq'::regclass)
 keyword_id  | integer                | not null
 banner_id   | integer                | not null
 campaign_id | integer                | not null
 stat_date   | date                   | not null
 region_id   | integer                | not null
 place_type  | character varying(30)  | 
 place_name  | character varying(100) | 
 clicks      | integer                | not null default 0
 shows       | integer                | not null default 0
 total       | numeric(19,6)          | not null

如何创建有用的索引?

或者,也许有机会以其他方式优化这个查询?

问题是,如果 WHERE 看起来

"direct_keywordstat".clicks BETWEEN 10 AND 3000000

查询在 0.8 秒内执行。

I'm using Django ORM and postgresql.

ORM creates a query:

SELECT 
    (date_part('month', stat_date)) AS "stat_date", 
    "direct_keywordstat"."banner_id", 
    SUM("direct_keywordstat"."total") AS "total", 
    SUM("direct_keywordstat"."clicks") AS "clicks", 
    SUM("direct_keywordstat"."shows") AS "shows" 

FROM "direct_keywordstat" 

LEFT OUTER JOIN "direct_banner" ON ("direct_keywordstat"."banner_id" = "direct_banner"."banner_ptr_id") 
LEFT OUTER JOIN "platforms_banner" ON ("direct_banner"."banner_ptr_id" = "platforms_banner"."id") 

WHERE (
     "direct_keywordstat".stat_date BETWEEN E'2009-08-25' AND E'2010-08-25' AND
    "direct_keywordstat"."keyword_id" IN (

        SELECT U0."id" 
        FROM "direct_keyword" U0 
        INNER JOIN "direct_banner" U1 ON (U0."banner_id" = U1."banner_ptr_id") 
        INNER JOIN "platforms_banner" U2 ON (U1."banner_ptr_id" = U2."id") 
        INNER JOIN "platforms_campaign" U3 ON (U2."campaign_id" = U3."id")
        INNER JOIN "direct_campaign" U4 ON (U3."id" = U4."campaign_ptr_id")
        WHERE (
            U0."deleted" = E'False' AND 
            U0."low_ctr" = E'False' AND 
            U4."status_active" = E'True' AND 
            U0."banner_id" IN (

                SELECT U0."banner_ptr_id" 
                FROM "direct_banner" U0 
                INNER JOIN "platforms_banner" U1 
                ON (U0."banner_ptr_id" = U1."id") 
                WHERE (
                    U0."status_show" = E'True' AND 
                    U1."campaign_id" = E'174' )
            )
        )
    )
) 

GROUP BY 
    "direct_keywordstat"."banner_id", 
    (date_part('month', stat_date)), 
    "platforms_banner"."title", date_trunc('month', stat_date) 

ORDER BY "platforms_banner"."title" ASC, "stat_date" ASC

Problem is, direct_keywordstat contains 3mln+ records, so the query executes in ~15 seconds.

I've tried creating indexes like

CREATE INDEX direct_keywordstat_stat_date on direct_keywordstat using btree(stat_date);

But EXPLAIN ANALYZE show that index is not used.

Table schema:

\d direct_keywordstat

                                   Table "public.direct_keywordstat"
   Column    |          Type          |                            Modifiers                            
-------------+------------------------+-----------------------------------------------------------------
 id          | integer                | not null default nextval('direct_keywordstat_id_seq'::regclass)
 keyword_id  | integer                | not null
 banner_id   | integer                | not null
 campaign_id | integer                | not null
 stat_date   | date                   | not null
 region_id   | integer                | not null
 place_type  | character varying(30)  | 
 place_name  | character varying(100) | 
 clicks      | integer                | not null default 0
 shows       | integer                | not null default 0
 total       | numeric(19,6)          | not null

How can i create useful index?

Or, maybe, there's a chance to optimize this query other way?

Thing is, if WHERE looks like

"direct_keywordstat".clicks BETWEEN 10 AND 3000000

query executes in 0.8 seconds.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

冷默言语 2024-09-22 19:14:43

您在这些列上有索引吗:

direct_banner.banner_ptr_id
direct_keywordstat.banner_id
direct_keywordstat.stat_date

direct_keywordstat 中的两列都可以合并在一个索引中,只需检查

这也是一个问题:

排序方式:外部合并磁盘:
20600kB

检查您的 work_mem 设置,此查询至少需要 20MB。

Do you have indexes on these columns:

direct_banner.banner_ptr_id
direct_keywordstat.banner_id
direct_keywordstat.stat_date

Both columns in direct_keywordstat could be combined in a single index, just check

This is also a problem:

Sort Method: external merge Disk:
20600kB

Check your settings for work_mem, you need at least 20MB for this query.

耳钉梦 2024-09-22 19:14:43
                                                                                                                       QUERY PLAN                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=727967.61..847401.71 rows=2514402 width=67) (actual time=22010.522..23408.262 rows=5 loops=1)
   ->  Sort  (cost=727967.61..734253.62 rows=2514402 width=67) (actual time=21742.365..23134.748 rows=198978 loops=1)
         Sort Key: platforms_banner.title, (date_part('month'::text, (direct_keywordstat.stat_date)::timestamp without time zone)), direct_keywordstat.banner_id, (date_trunc('month'::text, (direct_keywordstat.stat_date)::timestamp with time zone))
         Sort Method:  external merge  Disk: 20600kB
         ->  Hash Join  (cost=1034.02..164165.25 rows=2514402 width=67) (actual time=5159.538..14942.441 rows=198978 loops=1)
               Hash Cond: (direct_keywordstat.keyword_id = u0.id)
               ->  Hash Left Join  (cost=365.78..117471.99 rows=2514402 width=71) (actual time=26.672..13101.294 rows=2523151 loops=1)
                     Hash Cond: (direct_keywordstat.banner_id = direct_banner.banner_ptr_id)
                     ->  Seq Scan on direct_keywordstat  (cost=0.00..76247.17 rows=2514402 width=25) (actual time=8.892..9386.010 rows=2523151 loops=1)
                           Filter: ((stat_date >= '2009-08-25'::date) AND (stat_date <= '2010-08-25'::date))
                     ->  Hash  (cost=324.86..324.86 rows=3274 width=50) (actual time=17.754..17.754 rows=2851 loops=1)
                           ->  Hash Left Join  (cost=209.15..324.86 rows=3274 width=50) (actual time=10.845..15.385 rows=2851 loops=1)
                                 Hash Cond: (direct_banner.banner_ptr_id = platforms_banner.id)
                                 ->  Seq Scan on direct_banner  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.004..1.196 rows=2851 loops=1)
                                 ->  Hash  (cost=173.51..173.51 rows=2851 width=50) (actual time=10.683..10.683 rows=2851 loops=1)
                                       ->  Seq Scan on platforms_banner  (cost=0.00..173.51 rows=2851 width=50) (actual time=0.004..3.576 rows=2851 loops=1)
               ->  Hash  (cost=641.44..641.44 rows=2144 width=4) (actual time=30.420..30.420 rows=106 loops=1)
                     ->  HashAggregate  (cost=620.00..641.44 rows=2144 width=4) (actual time=30.162..30.288 rows=106 loops=1)
                           ->  Hash Join  (cost=407.17..614.64 rows=2144 width=4) (actual time=16.152..30.031 rows=106 loops=1)
                                 Hash Cond: (u0.banner_id = u1.banner_ptr_id)
                                 ->  Nested Loop  (cost=76.80..238.50 rows=6488 width=16) (actual time=8.670..22.343 rows=106 loops=1)
                                       ->  HashAggregate  (cost=76.80..76.87 rows=7 width=8) (actual time=0.045..0.047 rows=1 loops=1)
                                             ->  Nested Loop  (cost=0.00..76.79 rows=7 width=8) (actual time=0.033..0.036 rows=1 loops=1)
                                                   ->  Index Scan using platforms_banner_campaign_id on platforms_banner u1  (cost=0.00..22.82 rows=7 width=4) (actual time=0.019..0.020 rows=1 loops=1)
                                                         Index Cond: (campaign_id = 174)
                                                   ->  Index Scan using direct_banner_pkey on direct_banner u0  (cost=0.00..7.70 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
                                                         Index Cond: (u0.banner_ptr_id = u1.id)
                                                         Filter: u0.status_show
                                       ->  Index Scan using direct_keyword_banner_id on direct_keyword u0  (cost=0.00..23.03 rows=5 width=8) (actual time=8.620..22.127 rows=106 loops=1)
                                             Index Cond: (u0.banner_id = u0.banner_ptr_id)
                                             Filter: ((NOT u0.deleted) AND (NOT u0.low_ctr))
                                 ->  Hash  (cost=316.84..316.84 rows=1082 width=8) (actual time=7.458..7.458 rows=403 loops=1)
                                       ->  Hash Join  (cost=227.00..316.84 rows=1082 width=8) (actual time=3.584..7.149 rows=403 loops=1)
                                             Hash Cond: (u1.banner_ptr_id = u2.id)
                                             ->  Seq Scan on direct_banner u1  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.002..1.570 rows=2851 loops=1)
                                             ->  Hash  (cost=213.48..213.48 rows=1082 width=4) (actual time=3.521..3.521 rows=403 loops=1)
                                                   ->  Hash Join  (cost=23.88..213.48 rows=1082 width=4) (actual time=0.715..3.268 rows=403 loops=1)
                                                         Hash Cond: (u2.campaign_id = u3.id)
                                                         ->  Seq Scan on platforms_banner u2  (cost=0.00..173.51 rows=2851 width=8) (actual time=0.001..1.272 rows=2851 loops=1)
                                                         ->  Hash  (cost=22.95..22.95 rows=74 width=8) (actual time=0.345..0.345 rows=37 loops=1)
                                                               ->  Hash Join  (cost=11.84..22.95 rows=74 width=8) (actual time=0.133..0.320 rows=37 loops=1)
                                                                 Hash Cond: (u3.id = u4.campaign_ptr_id)
                                                                 ->  Seq Scan on platforms_campaign u3  (cost=0.00..8.91 rows=391 width=4) (actual time=0.006..0.098 rows=196 loops=1)
                                                                 ->  Hash  (cost=10.91..10.91 rows=74 width=4) (actual time=0.117..0.117 rows=37 loops=1)
                                                                       ->  Seq Scan on direct_campaign u4  (cost=0.00..10.91 rows=74 width=4) (actual time=0.004..0.097 rows=37 loops=1)
                                                                             Filter: status_active

总运行时间:23436.715 毫秒
(47 行)

在这里

                                                                                                                       QUERY PLAN                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=727967.61..847401.71 rows=2514402 width=67) (actual time=22010.522..23408.262 rows=5 loops=1)
   ->  Sort  (cost=727967.61..734253.62 rows=2514402 width=67) (actual time=21742.365..23134.748 rows=198978 loops=1)
         Sort Key: platforms_banner.title, (date_part('month'::text, (direct_keywordstat.stat_date)::timestamp without time zone)), direct_keywordstat.banner_id, (date_trunc('month'::text, (direct_keywordstat.stat_date)::timestamp with time zone))
         Sort Method:  external merge  Disk: 20600kB
         ->  Hash Join  (cost=1034.02..164165.25 rows=2514402 width=67) (actual time=5159.538..14942.441 rows=198978 loops=1)
               Hash Cond: (direct_keywordstat.keyword_id = u0.id)
               ->  Hash Left Join  (cost=365.78..117471.99 rows=2514402 width=71) (actual time=26.672..13101.294 rows=2523151 loops=1)
                     Hash Cond: (direct_keywordstat.banner_id = direct_banner.banner_ptr_id)
                     ->  Seq Scan on direct_keywordstat  (cost=0.00..76247.17 rows=2514402 width=25) (actual time=8.892..9386.010 rows=2523151 loops=1)
                           Filter: ((stat_date >= '2009-08-25'::date) AND (stat_date <= '2010-08-25'::date))
                     ->  Hash  (cost=324.86..324.86 rows=3274 width=50) (actual time=17.754..17.754 rows=2851 loops=1)
                           ->  Hash Left Join  (cost=209.15..324.86 rows=3274 width=50) (actual time=10.845..15.385 rows=2851 loops=1)
                                 Hash Cond: (direct_banner.banner_ptr_id = platforms_banner.id)
                                 ->  Seq Scan on direct_banner  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.004..1.196 rows=2851 loops=1)
                                 ->  Hash  (cost=173.51..173.51 rows=2851 width=50) (actual time=10.683..10.683 rows=2851 loops=1)
                                       ->  Seq Scan on platforms_banner  (cost=0.00..173.51 rows=2851 width=50) (actual time=0.004..3.576 rows=2851 loops=1)
               ->  Hash  (cost=641.44..641.44 rows=2144 width=4) (actual time=30.420..30.420 rows=106 loops=1)
                     ->  HashAggregate  (cost=620.00..641.44 rows=2144 width=4) (actual time=30.162..30.288 rows=106 loops=1)
                           ->  Hash Join  (cost=407.17..614.64 rows=2144 width=4) (actual time=16.152..30.031 rows=106 loops=1)
                                 Hash Cond: (u0.banner_id = u1.banner_ptr_id)
                                 ->  Nested Loop  (cost=76.80..238.50 rows=6488 width=16) (actual time=8.670..22.343 rows=106 loops=1)
                                       ->  HashAggregate  (cost=76.80..76.87 rows=7 width=8) (actual time=0.045..0.047 rows=1 loops=1)
                                             ->  Nested Loop  (cost=0.00..76.79 rows=7 width=8) (actual time=0.033..0.036 rows=1 loops=1)
                                                   ->  Index Scan using platforms_banner_campaign_id on platforms_banner u1  (cost=0.00..22.82 rows=7 width=4) (actual time=0.019..0.020 rows=1 loops=1)
                                                         Index Cond: (campaign_id = 174)
                                                   ->  Index Scan using direct_banner_pkey on direct_banner u0  (cost=0.00..7.70 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
                                                         Index Cond: (u0.banner_ptr_id = u1.id)
                                                         Filter: u0.status_show
                                       ->  Index Scan using direct_keyword_banner_id on direct_keyword u0  (cost=0.00..23.03 rows=5 width=8) (actual time=8.620..22.127 rows=106 loops=1)
                                             Index Cond: (u0.banner_id = u0.banner_ptr_id)
                                             Filter: ((NOT u0.deleted) AND (NOT u0.low_ctr))
                                 ->  Hash  (cost=316.84..316.84 rows=1082 width=8) (actual time=7.458..7.458 rows=403 loops=1)
                                       ->  Hash Join  (cost=227.00..316.84 rows=1082 width=8) (actual time=3.584..7.149 rows=403 loops=1)
                                             Hash Cond: (u1.banner_ptr_id = u2.id)
                                             ->  Seq Scan on direct_banner u1  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.002..1.570 rows=2851 loops=1)
                                             ->  Hash  (cost=213.48..213.48 rows=1082 width=4) (actual time=3.521..3.521 rows=403 loops=1)
                                                   ->  Hash Join  (cost=23.88..213.48 rows=1082 width=4) (actual time=0.715..3.268 rows=403 loops=1)
                                                         Hash Cond: (u2.campaign_id = u3.id)
                                                         ->  Seq Scan on platforms_banner u2  (cost=0.00..173.51 rows=2851 width=8) (actual time=0.001..1.272 rows=2851 loops=1)
                                                         ->  Hash  (cost=22.95..22.95 rows=74 width=8) (actual time=0.345..0.345 rows=37 loops=1)
                                                               ->  Hash Join  (cost=11.84..22.95 rows=74 width=8) (actual time=0.133..0.320 rows=37 loops=1)
                                                                 Hash Cond: (u3.id = u4.campaign_ptr_id)
                                                                 ->  Seq Scan on platforms_campaign u3  (cost=0.00..8.91 rows=391 width=4) (actual time=0.006..0.098 rows=196 loops=1)
                                                                 ->  Hash  (cost=10.91..10.91 rows=74 width=4) (actual time=0.117..0.117 rows=37 loops=1)
                                                                       ->  Seq Scan on direct_campaign u4  (cost=0.00..10.91 rows=74 width=4) (actual time=0.004..0.097 rows=37 loops=1)
                                                                             Filter: status_active

Total runtime: 23436.715 ms
(47 rows)

Here it is

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文