Solution for speeding up a slow SELECT DISTINCT query in Postgres

Posted 2024-11-19 06:10:46

The query is basically:

SELECT DISTINCT "my_table"."foo" from "my_table" WHERE...

Pretend I'm 100% certain that the DISTINCT portion of the query is the reason it runs slowly. I've omitted the rest of the query to avoid confusion, since the slowness of the DISTINCT portion is my primary concern (DISTINCT is always a source of slowness).

The table in question has 2.5 million rows of data. The DISTINCT is needed for purposes not listed here (I don't want a modified query back, but rather general information about making DISTINCT queries run faster at the DBMS level, if possible).

How can I make DISTINCT run quicker (using Postgres 9, specifically) without altering the SQL (i.e., I can't change the incoming SQL, but I can optimize something at the database level)?
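Since the question hinges on the DISTINCT being the bottleneck, one way to confirm that before touching any settings is to look at the plan. A minimal sketch (Postgres 9.0+ EXPLAIN syntax; the real WHERE clause is omitted here because it isn't shown above):

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT "my_table"."foo" FROM "my_table";
-- If most of the time is spent in a Sort followed by a Unique node (or in a
-- GroupAggregate/HashAggregate), the DISTINCT step really is the culprit.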


Comments (3)

稀香 2024-11-26 06:10:46

Oftentimes, you can make such queries run faster by working around the DISTINCT with a GROUP BY instead:

select my_table.foo 
from my_table 
where [whatever where conditions you want]
group by foo;
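As a quick sanity check (a sketch only; the real WHERE clause is omitted because it isn't shown in the question), you can compare the plans of the two forms:

explain select distinct my_table.foo from my_table;
explain select my_table.foo from my_table group by foo;
-- on 8.4 and later both forms can use hash aggregation, so the plans often
-- match; the group by rewrite mainly helped on older planners where
-- distinct always forced a sort.
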
陌上青苔 2024-11-26 06:10:46

Your DISTINCT is causing it to sort the output rows in order to find duplicates. If you put an index on the column(s) selected by the query, the database may be able to read them out in index order and save the sort step. A lot will depend on the details of the query and the tables involved-- your saying you "know the problem is with the DISTINCT" really limits the scope of available answers.
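A minimal sketch of that suggestion, assuming foo is the only selected column (the index name is illustrative, and whether the planner actually uses the index depends on the WHERE clause and the data):

CREATE INDEX my_table_foo_idx ON my_table (foo);
ANALYZE my_table;  -- refresh statistics so the planner considers the new index
-- With the column indexed, Postgres may read the values in index order and
-- skip the explicit sort; on 9.2 and later an index-only scan is also possible.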

贵在坚持 2024-11-26 06:10:46

You can try increasing the work_mem setting. Depending on the size of your dataset, it can cause the query plan to switch to hash aggregates, which are usually faster.

But before setting it too high globally, first read up on it. You can easily blow up your server, because the max_connections setting acts as a multiplier to this number.

This means that if you were to set work_mem = 128MB and you set max_connections = 100 (the default), you should have more than 12.8GB of RAM. You're essentially telling the server that it can use that much for performing queries (not even considering any other memory use by Postgres or otherwise).
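A sketch of trying this safely: raise work_mem for a single session first and check whether the plan switches, before changing anything globally (the 128MB figure is just the example from above):

SET work_mem = '128MB';  -- affects only the current session
EXPLAIN SELECT DISTINCT my_table.foo FROM my_table;
-- Look for HashAggregate instead of Sort + Unique in the plan.
-- To change it for everyone, set work_mem in postgresql.conf and reload,
-- keeping the max_connections multiplier above in mind.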
