I have a Postgres table with almost 18 million rows, and I am trying to run this query:
select * from answer where submission_id = 5 and deleted_at is NULL;
There is a partial index on the column submission_id. This is the command used to create the index:
CREATE INDEX answer_submission_id ON answer USING btree (submission_id) WHERE (deleted_at IS NULL)
This is the EXPLAIN ANALYZE output of the above select query:
Gather (cost=1000.00..3130124.70 rows=834 width=377) (actual time=7607.568..7610.130 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2144966 read=3
I/O Timings: read=6.169
-> Parallel Seq Scan on answer (cost=0.00..3129041.30 rows=348 width=377) (actual time=6501.951..7604.623 rows=1 loops=3)
Filter: ((deleted_at IS NULL) AND (submission_id = 5))
Rows Removed by Filter: 62213625
Buffers: shared hit=2144966 read=3
I/O Timings: read=6.169
Planning Time: 0.117 ms
Execution Time: 7610.154 ms
Ideally it should pick the answer_submission_id index, but Postgres is going for a sequential scan instead.
Any help would be really appreciated.
The execution plan shows a deviation between the estimated row count and the number of rows actually read.
The PostgreSQL optimizer is a cost-based optimizer (CBO): a query is executed with the plan that has the smallest estimated cost, so wrong statistics can make it choose a bad execution plan. Here is a link that explains how bad row estimates cause slow queries:
Why are bad row estimates slow in Postgres?
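As a quick diagnostic (my own suggestion, not something the link requires), you can temporarily discourage sequential scans in your session to see whether the planner can use the index at all, and what cost it assigns to the index scan:

```sql
-- Diagnostic only: affects the current session, never set this globally.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT * FROM answer WHERE submission_id = 5 AND deleted_at IS NULL;

RESET enable_seqscan;
```

If the index scan now appears and is much faster, that points at cost estimates (statistics) rather than a problem with the index itself.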
First, I would use a query to check the last_analyze and last_vacuum timestamps, i.e. when statistics were last collected for the table. If your statistics are wrong, ANALYZE "tablename" collects fresh statistics from the table; how long the ANALYZE scan takes depends on the table size. Also, when we UPDATE and DELETE data, we create dead tuples, which may still exist in the heap or indexes even though we can no longer query them; VACUUM reclaims the storage occupied by dead tuples.
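A sketch of that check and of the maintenance commands (the table name answer is taken from the question):

```sql
-- When were statistics last collected, and when was the table last vacuumed?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'answer';

-- Refresh the planner statistics for this table.
ANALYZE answer;

-- Reclaim space held by dead tuples and refresh statistics in one pass.
VACUUM (ANALYZE) answer;
```

If last_analyze and last_autoanalyze are NULL or very old, stale statistics are a likely cause of the bad row estimate in your plan.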