带 limit 子句的查询的查询计划

发布于 2025-01-12 21:18:51 字数 1736 浏览 0 评论 0原文

我在 PostgreSQL(版本 11.12)数据库中使用 EXPLAIN 命令来查看查询 select col1, col2 from some_table limit 10 的查询计划,得到以下结果:-

some_db=> EXPLAIN select col1, col2 from some_table limit 10;
                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.00..0.32 rows=10 width=33)
   ->  Seq Scan on user_dim  (cost=0.00..263325.95 rows=8106495 width=33)
(2 rows)

根据我的理解,查询计划中的步骤越低,执行的越早。但我注意到这个查询计划首先顺序扫描整个表,然后选择前两行。我很惊讶地看到这一点,因为我预计限制子句不会让完整的顺序扫描发生。

我尝试在 PostgreSQL 文档 并发现:

“在某些情况下,实际值和估计值无法很好地匹配,但实际上并没有什么问题。当计划节点执行因 LIMIT 或类似效果而短暂停止时,就会发生这种情况。例如,在我们之前使用的 LIMIT 查询,

EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
         Index Cond: (unique2 > 9000)
         Filter: (unique1 < 100)
         Rows Removed by Filter: 287
 Planning time: 0.096 ms
 Execution time: 0.336 ms

索引扫描节点的估计成本和行数显示为好像它已运行完成,但实际上,Limit 节点在获得 2 行后停止请求行数,因此实际行数仅为 2。并且运行时间少于成本估计所建议的时间,这不是估计错误,只是估计值和真实值的显示方式存在差异。”

我从中了解到,这只是一个显示问题,并且该查询计划不会实际执行(即,只会获取 limit 子句中指定的行数)。我的理解正确还是我在这里遗漏了一些东西?

感谢您阅读本文。

I used the EXPLAIN command in a PostgreSQL (version 11.12) DB to see the query plan of the query select col1, col2 from some_table limit 10 and I got the following: -

some_db=> EXPLAIN select col1, col2 from some_table limit 10;
                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.00..0.32 rows=10 width=33)
   ->  Seq Scan on user_dim  (cost=0.00..263325.95 rows=8106495 width=33)
(2 rows)

As per my understanding, the lower the step in a query plan, the earlier it is executed. But I noticed that this query plan first sequentially scans the entire table and then selects the first two rows. I was surprised to see this as I had expected that the limit clause would not let the full sequential scan happen.

I tried finding an answer to this in PostgreSQL documentation and found this:

"There are cases in which the actual and estimated values won't match up well, but nothing is really wrong. One such case occurs when plan node execution is stopped short by a LIMIT or similar effect. For example, in the LIMIT query we used before,

EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
         Index Cond: (unique2 > 9000)
         Filter: (unique1 < 100)
         Rows Removed by Filter: 287
 Planning time: 0.096 ms
 Execution time: 0.336 ms

the estimated cost and row count for the Index Scan node are shown as though it were run to completion. But in reality, the Limit node stopped requesting rows after it got two, so the actual row count is only 2 and the run time is less than the cost estimate would suggest. This is not an estimation error, only a discrepancy in the way the estimates and true values are displayed."

What I understand from this is that this is just a display issue and this query plan won't be executed actually (i.e. only the number of rows specified in the limit clause would be fetched). Is my understanding correct or am I missing something here?

Thank you for reading this.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

音盲 2025-01-19 21:18:51

对于顺序扫描,成本为 0.00..263325.95。

0.00 是启动成本。这是输出阶段开始之前所花费的时间。

263325.95 是估计的总成本。这是基于计划节点运行完成的假设,即检索所有可用行。实际上,节点的父节点可能无法读取所有可用行。

对于此节点,rows=8106495。这是该计划节点输出的估计行数。同样,假设该节点已运行完成。

简单来说,顺序扫描将立即开始向父节点输出结果(启动成本为 0.00),如果不停止,将输出 8106495 行,总成本为 263325.95。

但当然之前会被停止,因为父节点的原因!

父节点(限制):cost=0.00..0.32 rows=10
该节点也将立即开始输出结果(启动成本为 0.00),如果不停止,将输出 10 行,总成本为 0.32。

来源

For the sequential scan, cost is 0.00..263325.95.

0.00 is the startup cost. This is the time expended before the output phase can begin.

263325.95 is the estimated total cost. This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node's parent node might stop short of reading all available rows.

For this node, rows=8106495. This is the estimated number of rows output by this plan node. Again, the node is assumed to be run to completion.

In plain english, the sequential scan will start outputing results to the parent node right away (startup cost is 0.00), and would output 8106495 rows for a total cost of 263325.95 if not stopped.

But of course it will be stopped before, because of the parent node!

Parent node (the limit): cost=0.00..0.32 rows=10
This node will also start outputing results immediately (startup cost is 0.00) and if not stopped will output 10 rows for a total cost of 0.32.

source

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文