Are long-running big batch transactions bad?
Hallo,
I am writing a database application that does a lot of inserts and updates with fake serialisable isolation level (snapshot isolation).
To not do tonnes of network roundtrips I'm batching inserts and updates in one transaction with PreparedStatements. They should fail very seldom because the inserts are pre-checked and nearly conflict-free with other transactions, so rollbacks don't occur often.
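Roughly what I mean by batching, as a simplified JDBC sketch (table name, columns and connection details are made up):

// Minimal sketch of batching many inserts inside one transaction.
// "items", its columns and the JDBC URL are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password")) {
            conn.setAutoCommit(false);              // everything goes into one transaction
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (id, payload) VALUES (?, ?)")) {
                for (int i = 0; i < 10_000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row " + i);
                    ps.addBatch();                  // queued locally, no roundtrip yet
                    if (i % 1000 == 0) {
                        ps.executeBatch();          // one roundtrip per 1000 rows
                    }
                }
                ps.executeBatch();
                conn.commit();                      // single commit for the whole batch
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}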
Having big transactions should be good for WAL, because it can flush big chunks and doesn't have to flush for mini transactions.
1.) I can only see positive effects of a big transaction. But I often read that they are bad. Why could they be bad in my use case?
2.) Is the checking for conflicts so expensive when the local snapshots are merged back into the real database? The database will have to compare the write sets of all possibly conflicting (parallel) transactions. Or does it have some high-speed shortcut? Or is that quite cheap anyway?
[EDIT] It might be interesting if someone could bring some clarity into how a snapshot isolation database checks whether transactions that overlap on the timeline have disjoint write sets. Because that's what the fake serializable isolation level is all about.
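For context, this is the client-visible side of the conflict I'm asking about: under snapshot isolation (REPEATABLE READ in PostgreSQL), a write to a row that another transaction changed after my snapshot was taken fails with SQLSTATE 40001 and has to be retried. Sketch only; the table name is made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class SnapshotConflictExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password")) {
            conn.setAutoCommit(false);
            // REPEATABLE READ is PostgreSQL's snapshot isolation level.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            try (Statement st = conn.createStatement()) {
                // If a parallel transaction committed an update to this row after
                // our snapshot was taken, this UPDATE fails with SQLSTATE 40001
                // ("could not serialize access due to concurrent update").
                st.executeUpdate("UPDATE items SET payload = 'x' WHERE id = 1");
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                if ("40001".equals(e.getSQLState())) {
                    System.out.println("write-write conflict, retry the transaction");
                } else {
                    throw e;
                }
            }
        }
    }
}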
2 Answers
The real issues here are twofold. The first possible problem is bloat: large transactions can result in a lot of dead tuples showing up at once. The other possible problem is long-running transactions: as long as a long-running transaction is running, the tables it touches can't be vacuumed, so they can collect lots of dead tuples as well.
I'd say just use check_postgresql.pl to check for bloat. As long as you don't see a lot of table bloat after your long transactions you're OK.
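If you don't want to set up the check script right away, a rough way to eyeball dead tuples after a big transaction is to query pg_stat_user_tables. A small JDBC sketch (connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DeadTupleCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
             Statement st = conn.createStatement();
             // n_dead_tup vs. n_live_tup is a quick per-table proxy for bloat.
             ResultSet rs = st.executeQuery(
                 "SELECT relname, n_live_tup, n_dead_tup " +
                 "FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.printf("%s: live=%d dead=%d%n",
                    rs.getString("relname"),
                    rs.getLong("n_live_tup"),
                    rs.getLong("n_dead_tup"));
            }
        }
    }
}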
1) The manual says that it is good: http://www.postgresql.org/docs/current/interactive/populate.html
I can also recommend: use COPY, remove indexes (but test first), increase maintenance_work_mem, increase checkpoint_segments, and run ANALYZE (or VACUUM ANALYZE) afterwards.
I would not recommend these unless you are sure: removing foreign key constraints, disabling WAL archival and streaming replication.
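A sketch of how part of that bulk-load advice could look from JDBC, assuming the PostgreSQL JDBC driver's CopyManager API; the table, CSV file and setting values are just examples:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BulkLoadExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            // Session-level tuning for the load (value is only an example).
            st.execute("SET maintenance_work_mem = '512MB'");

            // COPY is much faster than many individual INSERTs for bulk loading.
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            try (FileReader reader = new FileReader("/tmp/items.csv")) {
                copy.copyIn("COPY items (id, payload) FROM STDIN WITH (FORMAT csv)", reader);
            }

            // Rebuild planner statistics after the bulk load.
            st.execute("ANALYZE items");
        }
    }
}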
2) Data are always merged on commit, and there are no checks; the data are just written. Read again: http://www.postgresql.org/docs/current/interactive/transaction-iso.html
If your inserts/updates do not depend on other inserts/updates, you don't need a "wholly consistent view". You can use READ COMMITTED and the transaction will never fail.
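If you go that route, the isolation level can simply be set on the JDBC connection; a minimal sketch (connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class ReadCommittedExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password")) {
            // READ COMMITTED is PostgreSQL's default; each statement sees its own
            // snapshot, so plain inserts/updates don't raise serialization errors.
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            conn.setAutoCommit(false);
            // ... batched work as in the question ...
            conn.commit();
        }
    }
}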