ETL 作为事务
对于我迄今为止编写的所有 ETL,我从未将它们设为事务 - 即,如果表 4 失败,则回滚所有内容。
这方面的最佳实践是什么?
要“BeginTran + Commit”还是不要“BeginTran + Commit”
编辑:我有一个主包调用其他 4 个包 - 是否可以将它们全部汇总到一个事务中?
For all the ETLs I have written so far, I have never made them transactions - i.e. if table 4 fails, roll everything back.
What is the best practice in this regard?
To "BeginTran + Commit" or not to "BeginTran + Commit"
EDIT: I have one master package calling 4 other packages - is it possible to roll them all up into one transaction?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
以可管理的批量大小开始+提交。 您不希望每晚将 6 小时的导入打包到一个事务中。 保持批次大小最多可在 2-3 分钟内完成。 您肯定会遇到导致 ETL 失败的数据纯度问题,因此至少将影响减少到可管理的程度(即,不要触发将持续另外 6 小时才能完成的回滚) 。
begin+commit in manageable batch sizes. You don't want to wrap a 6 hours import into a single transaction every night. Keep your batches at a size that can finish in 2-3 minutes at most. That you will hit data purity issues that will fail an ETL is a given, so at least reduce the impact to something manageable (ie. don't trigger a rollback that will last another 6 hours to complete).
在 SSIS 中,我总是
Begin Trans + Commit
。 我想确保如果失败,我可以重新运行包而不会出现问题(或者必须找到实际插入的行)。它只是使恢复和清理变得更加容易。
In SSIS, I always
Begin Trans + Commit
. I want to make sure that I can re-run the package without issue (or having to find what rows actually got inserted) if it fails.It just makes recovery and cleanup so much easier.
您经常会在 ETL 中移动太多数据而无法使用 SQL 事务(请记住,日志必须存储要回滚的所有数据)。 我更喜欢设计能够非破坏性地重新运行的包。 理想情况下,它们应该被设置为如果它们在中途死亡,您可以启动它们,它们将继续大约在它们停止的地方。 有时这会导致性能损失,但我认为这是值得的。
从技术上讲,您可以将包汇总到单个事务中; 实际上,也许不是。
You are often moving too much data in ETL to use a SQL transaction (the log has to store ALL the data to roll back, remember). I prefer to design packages such that they can be re-run nondestructively. Ideally they should be set up so that if they die in mid-stream, you can just start them and they'll continue somewhere approximately where they left off. Sometimes there's a performance penalty for that, but I think it's worth it.
Technically you can roll packages up into a single transaction; practically, maybe not.