带 limit 子句的查询的查询计划
我在 PostgreSQL(版本 11.12)数据库中使用 EXPLAIN
命令来查看查询 select col1, col2 from some_table limit 10
的查询计划,得到以下结果:-
some_db=> EXPLAIN select col1, col2 from some_table limit 10;
QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=0.00..0.32 rows=10 width=33)
-> Seq Scan on user_dim (cost=0.00..263325.95 rows=8106495 width=33)
(2 rows)
根据我的理解,查询计划中的步骤越低,执行的越早。但我注意到这个查询计划首先顺序扫描整个表,然后选择前两行。我很惊讶地看到这一点,因为我预计限制子句不会让完整的顺序扫描发生。
我尝试在 PostgreSQL 文档 并发现:
“在某些情况下,实际值和估计值无法很好地匹配,但实际上并没有什么问题。当计划节点执行因 LIMIT 或类似效果而短暂停止时,就会发生这种情况。例如,在我们之前使用的 LIMIT 查询,
EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
Index Cond: (unique2 > 9000)
Filter: (unique1 < 100)
Rows Removed by Filter: 287
Planning time: 0.096 ms
Execution time: 0.336 ms
索引扫描节点的估计成本和行数显示为好像它已运行完成,但实际上,Limit 节点在获得 2 行后停止请求行数,因此实际行数仅为 2。并且运行时间少于成本估计所建议的时间,这不是估计错误,只是估计值和真实值的显示方式存在差异。”
我从中了解到,这只是一个显示问题,并且该查询计划不会实际执行(即,只会获取 limit 子句中指定的行数)。我的理解正确还是我在这里遗漏了一些东西?
感谢您阅读本文。
I used the EXPLAIN
command in a PostgreSQL (version 11.12) DB to see the query plan of the query select col1, col2 from some_table limit 10
and I got the following: -
some_db=> EXPLAIN select col1, col2 from some_table limit 10;
QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=0.00..0.32 rows=10 width=33)
-> Seq Scan on user_dim (cost=0.00..263325.95 rows=8106495 width=33)
(2 rows)
As per my understanding, the lower the step in a query plan, the earlier it is executed. But I noticed that this query plan first sequentially scans the entire table and then selects the first two rows. I was surprised to see this as I had expected that the limit clause would not let the full sequential scan happen.
I tried finding an answer to this in PostgreSQL documentation and found this:
"There are cases in which the actual and estimated values won't match up well, but nothing is really wrong. One such case occurs when plan node execution is stopped short by a LIMIT or similar effect. For example, in the LIMIT query we used before,
EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
Index Cond: (unique2 > 9000)
Filter: (unique1 < 100)
Rows Removed by Filter: 287
Planning time: 0.096 ms
Execution time: 0.336 ms
the estimated cost and row count for the Index Scan node are shown as though it were run to completion. But in reality, the Limit node stopped requesting rows after it got two, so the actual row count is only 2 and the run time is less than the cost estimate would suggest. This is not an estimation error, only a discrepancy in the way the estimates and true values are displayed."
What I understand from this is that this is just a display issue and this query plan won't be executed actually (i.e. only the number of rows specified in the limit clause would be fetched). Is my understanding correct or am I missing something here?
Thank you for reading this.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
对于顺序扫描,成本为 0.00..263325.95。
0.00 是启动成本。这是输出阶段开始之前所花费的时间。
263325.95 是估计的总成本。这是基于计划节点运行完成的假设,即检索所有可用行。实际上,节点的父节点可能无法读取所有可用行。
对于此节点,rows=8106495。这是该计划节点输出的估计行数。同样,假设该节点已运行完成。
简单来说,顺序扫描将立即开始向父节点输出结果(启动成本为 0.00),如果不停止,将输出 8106495 行,总成本为 263325.95。
但当然之前会被停止,因为父节点的原因!
父节点(限制):cost=0.00..0.32 rows=10
该节点也将立即开始输出结果(启动成本为 0.00),如果不停止,将输出 10 行,总成本为 0.32。
来源
For the sequential scan, cost is 0.00..263325.95.
0.00 is the startup cost. This is the time expended before the output phase can begin.
263325.95 is the estimated total cost. This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node's parent node might stop short of reading all available rows.
For this node, rows=8106495. This is the estimated number of rows output by this plan node. Again, the node is assumed to be run to completion.
In plain english, the sequential scan will start outputing results to the parent node right away (startup cost is 0.00), and would output 8106495 rows for a total cost of 263325.95 if not stopped.
But of course it will be stopped before, because of the parent node!
Parent node (the limit): cost=0.00..0.32 rows=10
This node will also start outputing results immediately (startup cost is 0.00) and if not stopped will output 10 rows for a total cost of 0.32.
source