ETL 作为事务

发布于 2024-07-29 13:24:28 字数 183 浏览 5 评论 0原文

对于我迄今为止编写的所有 ETL,我从未将它们设为事务 - 即,如果表 4 失败,则回滚所有内容。

这方面的最佳实践是什么?

要“BeginTran + Commit”还是不要“BeginTran + Commit”

编辑:我有一个主包调用其他 4 个包 - 是否可以将它们全部汇总到一个事务中?

For all the ETLs I have written so far, I have never made them transactions - i.e. if table 4 fails, roll everything back.

What is the best practice in this regard?

To "BeginTran + Commit" or not to "BeginTran + Commit"

EDIT: I have one master package calling 4 other packages - is it possible to roll them all up into one transaction?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

病毒体 2024-08-05 13:24:28

以可管理的批量大小开始+提交。 您不希望每晚将 6 小时的导入打包到一个事务中。 保持批次大小最多可在 2-3 分钟内完成。 您肯定会遇到导致 ETL 失败的数据纯度问题,因此至少将影响减少到可管理的程度(即,不要触发将持续另外 6 小时才能完成的回滚) 。

begin+commit in manageable batch sizes. You don't want to wrap a 6 hours import into a single transaction every night. Keep your batches at a size that can finish in 2-3 minutes at most. That you will hit data purity issues that will fail an ETL is a given, so at least reduce the impact to something manageable (ie. don't trigger a rollback that will last another 6 hours to complete).

情深缘浅 2024-08-05 13:24:28

在 SSIS 中,我总是Begin Trans + Commit。 我想确保如果失败,我可以重新运行包而不会出现问题(或者必须找到实际插入的行)。

它只是使恢复和清理变得更加容易。

In SSIS, I always Begin Trans + Commit. I want to make sure that I can re-run the package without issue (or having to find what rows actually got inserted) if it fails.

It just makes recovery and cleanup so much easier.

离笑几人歌 2024-08-05 13:24:28

您经常会在 ETL 中移动太多数据而无法使用 SQL 事务(请记住,日志必须存储要回滚的所有数据)。 我更喜欢设计能够非破坏性地重新运行的包。 理想情况下,它们应该被设置为如果它们在中途死亡,您可以启动它们,它们将继续大约在它们停止的地方。 有时这会导致性能损失,但我认为这是值得的。

从技术上讲,您可以将包汇总到单个事务中; 实际上,也许不是。

You are often moving too much data in ETL to use a SQL transaction (the log has to store ALL the data to roll back, remember). I prefer to design packages such that they can be re-run nondestructively. Ideally they should be set up so that if they die in mid-stream, you can just start them and they'll continue somewhere approximately where they left off. Sometimes there's a performance penalty for that, but I think it's worth it.

Technically you can roll packages up into a single transaction; practically, maybe not.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文