Why is Postgres hashing in this query?

Posted 2024-09-05 18:34:45


I have two tables: A and P. I want to get information out of all rows in A whose id is in a temporary table I created, tmp_ids. However, there is additional information about A in the P table, foo, and I want to get this info as well. I have the following query:

SELECT A.H_id AS hid,
       A.id AS aid,
       P.foo, A.pos, A.size
FROM tmp_ids, P, A
WHERE tmp_ids.id = A.H_id
  AND P.id = A.P_id

I noticed it running slowly, and when I asked Postgres to explain, I saw that it combines tmp_ids with the index I created on A.H_id using a nested loop. However, it hashes all of P before doing a hash join with the result of that first join. P is quite large and I think this is what's taking all the time. Why would it create a hash there? P.id is P's primary key, and A.P_id has an index of its own.

UPDATE: All the datatypes are INTEGER, except A.size which is a DOUBLE PRECISION and P.foo which is VARCHAR. I'm using PostgreSQL version 8.4.

Here is the explain: http://explain.depesz.com/s/WBo
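
For anyone reproducing this, a minimal sketch of the setup and how to capture the plan discussed below (the DDL for tmp_ids is an assumption, not from the original post; the explicit JOIN syntax is equivalent to the comma form above):

CREATE TEMP TABLE tmp_ids (id INTEGER);
-- ... populate tmp_ids ...

-- EXPLAIN ANALYZE executes the query and reports estimated vs. actual
-- row counts for each plan node, which is what the answers below examine.
EXPLAIN ANALYZE
SELECT A.H_id AS hid,
       A.id AS aid,
       P.foo, A.pos, A.size
FROM tmp_ids
JOIN A ON A.H_id = tmp_ids.id
JOIN P ON P.id = A.P_id;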


Comments (3)

饮惑 2024-09-12 18:34:45


The query planner estimated that it would be faster to sequentially read all the data and hash it than to perform an estimated 2100 index scans, with their associated, much more random disk access.
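
The tradeoff described here is governed by the planner's cost constants. A quick way to inspect them (standard PostgreSQL settings, shown as a sketch, not values taken from the question):

-- The planner weighs sequential reads (seq_page_cost, default 1.0) against
-- random reads (random_page_cost, default 4.0); ~2100 index probes at 4x
-- the per-page cost can easily lose to one sequential scan plus a hash.
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost', 'effective_cache_size');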

雪化雨蝶 2024-09-12 18:34:45


Without seeing an explain analyze, these kinds of problems usually result from statistics being off, or from random_page_cost or seq_page_cost needing an unusual setting.

It may run better with

set enable_hashjoin = false;
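
A cautious way to try this is to scope the setting to a single transaction, so it cannot leak into other queries (a sketch; SET LOCAL reverts automatically at commit or rollback):

BEGIN;
-- Discourage hash joins for this transaction only; the planner falls back
-- to nested-loop or merge joins where it can.
SET LOCAL enable_hashjoin = false;

EXPLAIN ANALYZE
SELECT A.H_id AS hid, A.id AS aid, P.foo, A.pos, A.size
FROM tmp_ids, P, A
WHERE tmp_ids.id = A.H_id
  AND P.id = A.P_id;

ROLLBACK;  -- nothing was modified; this just discards the setting
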
疏忽 2024-09-12 18:34:45


Your problem is that the optimizer doesn't have the right statistics to determine how many matches "A.H_id = tmp_ids.id" is going to create, which is a common problem with temporary tables: they don't have statistics the way a regular table does. It guesses that 21 rows will match coming out of the "Index Scan using idx_A_handid on A", but there are actually only 3. This is highlighted in the explain output, where the lowest-level up arrow has a 7 next to it, giving the multiplier for how wrong the estimate was.

That error carries forward to where it thinks it has 2100 rows to scan, at which point it might as well do a full sequential scan and hash the results, given that it's likely to touch most blocks in the table.

Had it known correctly that there were only 300 to probe, it might have done something different involving only a subset of the data. You can't expect to get good plans from joins against temporary tables because of their lack of statistics. This may be a case where it's appropriate to nudge the planner toward correct behavior by turning off enable_hashjoin before executing the query.
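
One standard remedy this answer implies but does not spell out (an editor's addition, not part of the original answer): temporary tables are never processed by autovacuum, but you can collect statistics for them manually in the same session before running the query:

-- Give the planner real statistics for the temp table, so it sees the
-- true row count instead of a default guess.
ANALYZE tmp_ids;

EXPLAIN ANALYZE
SELECT A.H_id AS hid, A.id AS aid, P.foo, A.pos, A.size
FROM tmp_ids, P, A
WHERE tmp_ids.id = A.H_id
  AND P.id = A.P_id;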
