Query plan for a query with a limit clause

Posted on 2025-01-12 21:18:51


I ran the EXPLAIN command in a PostgreSQL (version 11.12) database to see the query plan for the query select col1, col2 from some_table limit 10, and got the following:

some_db=> EXPLAIN select col1, col2 from some_table limit 10;
                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.00..0.32 rows=10 width=33)
   ->  Seq Scan on user_dim  (cost=0.00..263325.95 rows=8106495 width=33)
(2 rows)

As I understand it, the lower a step appears in a query plan, the earlier it is executed. But this plan appears to sequentially scan the entire table first and only then select the first ten rows. That surprised me, because I had expected the limit clause to prevent a full sequential scan.
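One way to picture that expectation is a pull-based pipeline, in which the parent node asks its child for one row at a time. The sketch below is only a toy analogy in Python, not PostgreSQL's executor code; the names seq_scan, limit, and the counter dictionary are hypothetical, and the table size is simply borrowed from the row estimate in the plan above.

```python
from itertools import islice


def seq_scan(table, counter):
    """Toy stand-in for a Seq Scan node: hands out rows one at a time."""
    for row in table:
        counter["rows_read"] += 1  # count every row the scan actually produces
        yield row


def limit(child, n):
    """Toy stand-in for a Limit node: pulls at most n rows from its child."""
    return islice(child, n)


# Hypothetical table, sized like the planner's row estimate (8106495 rows).
table = range(8_106_495)
counter = {"rows_read": 0}

result = list(limit(seq_scan(table, counter), 10))
print(len(result), counter["rows_read"])
```

In this model the scan sits below the limit and does start first, but it only produces a row when the limit asks for one, so it stops after ten rows even though it could have walked the whole table.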

I looked for an answer in the PostgreSQL documentation and found this:

"There are cases in which the actual and estimated values won't match up well, but nothing is really wrong. One such case occurs when plan node execution is stopped short by a LIMIT or similar effect. For example, in the LIMIT query we used before,

EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
         Index Cond: (unique2 > 9000)
         Filter: (unique1 < 100)
         Rows Removed by Filter: 287
 Planning time: 0.096 ms
 Execution time: 0.336 ms

the estimated cost and row count for the Index Scan node are shown as though it were run to completion. But in reality, the Limit node stopped requesting rows after it got two, so the actual row count is only 2 and the run time is less than the cost estimate would suggest. This is not an estimation error, only a discrepancy in the way the estimates and true values are displayed."

What I take from this is that it is only a matter of how the estimates are displayed, and that the full sequential scan shown in the plan will not actually be carried out (i.e., only the number of rows specified in the limit clause will be fetched). Is my understanding correct, or am I missing something here?
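The numbers in my own plan seem consistent with that reading. As a rough check, and assuming the planner scales the child's cost linearly by the fraction of rows the Limit node expects to fetch (a simplification, not a statement of the exact planner code), the Limit cost can be reproduced from the Seq Scan line. The function name limit_total_cost is hypothetical.

```python
def limit_total_cost(child_startup, child_total, child_rows, limit_rows):
    """Interpolate the child's cost by the fraction of rows actually needed.

    Assumption: cost grows linearly with the number of rows fetched.
    """
    fraction = min(limit_rows / child_rows, 1.0)
    return child_startup + (child_total - child_startup) * fraction


# Values copied from the Seq Scan line of the plan: cost=0.00..263325.95 rows=8106495
estimate = limit_total_cost(0.00, 263325.95, 8106495, 10)
print(f"{estimate:.2f}")
```

The result rounds to the 0.32 shown on the Limit line, which suggests the planner itself expects only about ten rows' worth of scanning, while the Seq Scan line keeps showing the cost of a scan run to completion.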

Thank you for reading this.
