如何提高 Postgresql 中 UPDATE 查询结果时间?
我有以下查询运行了 3 小时以上:
UPDATE eop_201007
set coord_x = gi.x_etrs89, coord_y = gi.x_etrs89,gr_type = 4
from eop_201007 as eop, geoindex201001 as gi
where eop.cp7=gi.cp7 AND eop.gr_type=0;
eop 表有 300k 以上的记录,gi 表有 100k 以上的记录。
cp7 字段在两个表中都建立了索引,这需要太多时间才能完成。
我做错了吗?我该如何改进这个?
I have the following query running for 3hours+:
UPDATE eop_201007
set coord_x = gi.x_etrs89, coord_y = gi.x_etrs89,gr_type = 4
from eop_201007 as eop, geoindex201001 as gi
where eop.cp7=gi.cp7 AND eop.gr_type=0;
eop table has 300k+ records, and gi table 100k+.
The cp7 field is indexed in both tables, and this is taking way too much time to complete.
Am I doing it wrong? How can I improve this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
检查此主题并使用 EXPLAIN看看发生了什么。更好的配置 WAL 也可能有所帮助,只需在更新过程中检查内存使用情况和写入速度即可。
编辑:并确保没有其他事务锁定您的表,您必须永远等待......
Check this topic and use EXPLAIN to see what is going on. A better configuration for WAL might help as well, just check memory usage and write speed during the update.
Edit: And make sure no other transactions lock your table, you have to wait forever...
您的 FROM 中不需要“eop_201007 as eop”。以下内容将起作用:
我认为额外的 eop 会导致交叉联接(基本上是两个巨大表的叉积),因为它不受原始 eop 表的限制,该表已经“自动”出现在 FROM 列表中
如果这不能解决问题,这里有一些其他的想法:
如果你还没有的话,你可能想先对其进行真空分析。确保在 postgresql.conf 中调整了所有内存设置。工作内存、共享缓冲区等可以产生巨大的差异。
如果这是一次性的事情,而不是每晚的工作,您应该关闭 fsync。另外,请确保(如果关闭 fsync)没有配置太多检查点段(24 个左右即可),否则会污染磁盘缓存。
正如@Frank Heikens 所说,你应该看看解释。另请检查 EXPLAIN ANALYZE(如果您的查询确实完成)。
You don't need the "eop_201007 as eop" in your FROM. The following will work:
I think the extra eop is causing a cross-join (basically the cross product of the two tables which is HUGE) because it isn't constrained against the original eop table which is "automatically" in the FROM list already
If that doesn't fix things, here's some other thoughts:
You probably want to do a vacuum analyze on it first if you haven't. Make sure you have all your memory settings tunes in postgresql.conf. Working memory, shared buffers, etc can make a huge difference.
If this is a one-time thing, and not a nightly job, you should turn fsync off. Also, make sure (if you turn fsync off) you don't have too many checkpoint segments configured (24 or so will do) otherwise you'll pollute your disk cache.
As @Frank Heikens said, you should look at explain. Also check EXPLAIN ANALYZE (if you query does ever finish).