优化 PostgreSQL 中的查询

发布于 2024-11-02 04:10:52 字数 404 浏览 1 评论 0原文

SELECT count(*) 
FROM contacts_lists 
     JOIN plain_contacts 
          ON contacts_lists.contact_id = plain_contacts.contact_id 
     JOIN contacts 
          ON contacts.id = plain_contacts.contact_id 
WHERE plain_contacts.has_email 
      AND NOT contacts.email_bad 
      AND NOT contacts.email_unsub 
      AND contacts_lists.list_id =67339

我怎样才能优化这个查询..你能解释一下吗...

SELECT count(*) 
FROM contacts_lists 
     JOIN plain_contacts 
          ON contacts_lists.contact_id = plain_contacts.contact_id 
     JOIN contacts 
          ON contacts.id = plain_contacts.contact_id 
WHERE plain_contacts.has_email 
      AND NOT contacts.email_bad 
      AND NOT contacts.email_unsub 
      AND contacts_lists.list_id =67339

how can i optimize this query.. could you please explain...

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

向日葵 2024-11-09 04:10:52

为了清晰起见,重新格式化您的查询计划:

QUERY PLAN Aggregate (cost=126377.96..126377.97 rows=1 width=0)
  -> Hash Join (cost=6014.51..126225.38 rows=61033 width=0)
     Hash Cond: (contacts_lists.contact_id = plain_contacts.contact_id)
    -> Hash Join (cost=3067.30..121828.63 rows=61033 width=8)
       Hash Cond: (contacts_lists.contact_id = contacts.id)
      -> Index Scan using index_contacts_lists_on_list_id_and_contact_id
         on contacts_lists (cost=0.00..116909.97 rows=61033 width=4)
         Index Cond: (list_id = 66996)
         -> Hash (cost=1721.41..1721.41 rows=84551 width=4)
         -> Seq Scan on contacts (cost=0.00..1721.41 rows=84551 width=4)
            Filter: ((NOT email_bad) AND (NOT email_unsub))
            -> Hash (cost=2474.97..2474.97 rows=37779 width=4)
            -> Seq Scan on plain_contacts (cost=0.00..2474.97 rows=37779 width=4)
               Filter: has_email

两个部分索引可能消除序列扫描,具体取决于您的数据分布:

-- if many contacts have bad emails or are unsubscribed:
CREATE INDEX contacts_valid_email_idx ON contacts (id)
WHERE (NOT email_bad AND NOT email_unsub);

-- if many contacts have no email:
CREATE INDEX plain_contacts_valid_email_idx ON plain_contacts (id)
WHERE (has_email);

您可能会丢失外键索引:

CREATE INDEX plain_contacts_contact_id_idx ON plain_contacts (contact_id);

最后但并非最不重要的一点是,如果您从未分析过数据,您需要运行:

VACUUM ANALYZE;

如果完成所有操作后仍然很慢,则除了合并 plain_contacts 和联系人表外,您无能为力:尽管有上述索引,但获取上述查询计划意味着大多数/全部您的订阅者订阅了该特定列表 - 在这种情况下,上述查询计划是您获得的最快的查询计划。

Reformatting your query plan for clarity:

QUERY PLAN Aggregate (cost=126377.96..126377.97 rows=1 width=0)
  -> Hash Join (cost=6014.51..126225.38 rows=61033 width=0)
     Hash Cond: (contacts_lists.contact_id = plain_contacts.contact_id)
    -> Hash Join (cost=3067.30..121828.63 rows=61033 width=8)
       Hash Cond: (contacts_lists.contact_id = contacts.id)
      -> Index Scan using index_contacts_lists_on_list_id_and_contact_id
         on contacts_lists (cost=0.00..116909.97 rows=61033 width=4)
         Index Cond: (list_id = 66996)
         -> Hash (cost=1721.41..1721.41 rows=84551 width=4)
         -> Seq Scan on contacts (cost=0.00..1721.41 rows=84551 width=4)
            Filter: ((NOT email_bad) AND (NOT email_unsub))
            -> Hash (cost=2474.97..2474.97 rows=37779 width=4)
            -> Seq Scan on plain_contacts (cost=0.00..2474.97 rows=37779 width=4)
               Filter: has_email

Two partial indexes might eliminate seq scans depending on your data distribution:

-- if many contacts have bad emails or are unsubscribed:
CREATE INDEX contacts_valid_email_idx ON contacts (id)
WHERE (NOT email_bad AND NOT email_unsub);

-- if many contacts have no email:
CREATE INDEX plain_contacts_valid_email_idx ON plain_contacts (id)
WHERE (has_email);

You might be missing an index on a foreign key:

CREATE INDEX plain_contacts_contact_id_idx ON plain_contacts (contact_id);

Last but not least if you've never analyzed your data, you need to run:

VACUUM ANALYZE;

If it's still slow once all that is done, there isn't much you can do short of merging your plain_contacts and your contacts tables: getting the above query plan in spite of the above indexes means most/all of your subscribers are subscribed to that particular list -- in which case the above query plan is the fastest you'll get.

层林尽染 2024-11-09 04:10:52

这已经是一个非常简单的查询,只要统计信息是最新的,数据库将以最有效的方式运行,

因此就查询本身而言,没有太多可做的。

在数据库管理方面,您可以添加索引 - 数据库中应该有针对所有连接条件以及 where 子句中最具选择性的部分的索引(list_id、contact_id 作为 plain_contacts 和 contacts_lists 中的 FK)。这是提高该查询性能(数量级)的最重要机会。正如 SpliFF 所指出的,您可能已经拥有这些索引,因此请检查一下。

另外,postgres 有很好的 explain 命令,你应该学习和使用。它将有助于优化查询。

This is already a very simple query that the database will run in the most efficient way providing that statistics are up to date

So in terms of the query itself there's not much to do.

In terms of database administration you can add indexes - there should be indexes in the database for all the join conditions and also for the most selective part of the where clause (list_id, contact_id as FK in plain_contacts and contacts_lists). This is the most significant opportunity to improve performance of this query (orders of magnitude). Still as SpliFF notes, you probably already have those indexes, so check.

Also, postgres has good explain command that you should learn and use. It will help with optimizing queries.

庆幸我还是我 2024-11-09 04:10:52

由于您只想包含在连接表中设置了一些标志的行,因此我会将这些语句移至连接子句中:

SELECT count(*) 
FROM contacts_lists 
     JOIN plain_contacts 
          ON contacts_lists.contact_id = plain_contacts.contact_id 
          AND NOT plain_contacts.has_email
     JOIN contacts 
          ON contacts.id = plain_contacts.contact_id 
          AND NOT contacts.email_unsub 
          AND NOT contacts.email_bad 
WHERE contacts_lists.list_id =67339

我不确定这是否会对性能产生很大影响,但值得一试。为了获得最佳性能,您可能还应该在连接表上建立索引,如下所示:

plain_contacts: contact_id, has_email
contacts: id, email_unsub, email_bad

Since you only want to inlude rows that has some flags set in the joined tables, I would move that statements into the join clause:

SELECT count(*) 
FROM contacts_lists 
     JOIN plain_contacts 
          ON contacts_lists.contact_id = plain_contacts.contact_id 
          AND NOT plain_contacts.has_email
     JOIN contacts 
          ON contacts.id = plain_contacts.contact_id 
          AND NOT contacts.email_unsub 
          AND NOT contacts.email_bad 
WHERE contacts_lists.list_id =67339

I'm not sure if this would make a great impact on performance, but worth a try. You should probably have indexes on the joined tables as well for optimal performance, like this:

plain_contacts: contact_id, has_email
contacts: id, email_unsub, email_bad
红衣飘飘貌似仙 2024-11-09 04:10:52

您最近对数据库运行过 ANALYZE 吗? EXPLAIN 计划中的行计数看起来有意义吗? (看起来您只运行了 EXPLAIN。EXPLAIN ANALYZE 给出了估计时间和实际时间。)

Have you run ANALYZE on the database recently? Do the row counts in the EXPLAIN plan look like they make sense? (Looks like you ran only EXPLAIN. EXPLAIN ANALYZE gives both estimated and actual timings.)

沉鱼一梦 2024-11-09 04:10:52

您可以使用 SELECT count(1) ... ,但除此之外我认为它看起来不错。如果您真的很困难,您总是可以使用视图缓存查询的某些部分,或者在 contact_id 和 list_id 上放置索引(我假设您已经在 id 上有一个索引)。

You can use SELECT count(1) ... but other than that I'd say it looks fine. You could always cache some parts of the query using views or put indexes on contact_id and list_id if you're really struggling (I assume you have one on id already).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文