变更记录集成
我们需要执行日终处理,从系统 A 中提取每日交易,并仅将更改传输到系统 B。
问题是系统 A 只能提供系统 A 中可用的全套交易。
我最初的想法我们将使用临时表 (SQL Server),该表将保留系统 A 中的数据,然后用于每次执行日终比较时进行比较。这一切都可以通过使用表连接来识别所需的 UPDATE、INSERT、DELETE 来完成。
作为一个 SSIS 专家,我知道这可以在 SSIS 中使用 LOOKUP 来识别添加、更新和删除来完成。
问题:
SSIS 解决方案是更好的方法吗?为什么(可维护性、可扩展性、可扩展性)?
哪个性能更好?对这两个选项有什么经验吗?
有其他选择吗?
We have the need to perform an end of day process to extract the daily transactions from System A and transfer only the changes to System B.
The problem is that System A can only provide the full set of transactions available in System A.
My initial thoughts were to use a staging table (SQL Server) which will persist the data from System A, and then is used for comparison purposes for each execution of the end of day comparison. This can all be done using table joins to identify the required UPDATEs, INSERTs, DELETEs.
Not being an SSIS expert I understand this could be done in SSIS using LOOKUPs to identify the additions, updates and deletion.
Question:
Is the SSIS solution a better approach and why (maintainability, scalability, extensibility) ?
Which would be better performing? Any experience on these 2 options?
Is there any alternative option?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
由于您需要来自系统 A 的全套交易,因此就源而言,这限制了您的选择。我建议将该数据拉至原始文件目标。这将在您的开发过程中为您提供帮助,因为您可以一遍又一遍地运行需要该数据的任务,而无需重新获取。另外,请确保源数据按原始计算机排序。 SSIS 在排序方面非常弱,除非您使用第三方组件(在某些情况下这可能是一个限制职业的决定)。
不管怎样,我们假设你已经整理好原始文件了。接下来您要做的就是将其作为原始文件源放入数据流中。然后,有一个代表系统 B 的 OLEDB(或其他)源。如果您愿意,您也可以使用原始文件。确保系统 B 中的数据使用与系统 A 排序时使用的列相同的列进行排序。
使用 IsSorted=True 标记源,并在元数据中的相应列上设置 SortKey 值。这将告诉 SSIS 数据已预先排序,并且允许您 JOIN 关键列。否则,您可能需要等待数天才能让 SSIS 对大集进行排序。
将多播添加到系统 A 和系统 B 的源中,因为我们想要两次利用它们。
接下来,添加合并连接以将两个原始文件源连接在一起。将系统 A 设为左输入。当您将系统 B 连接到合并连接时,系统 B 将成为正确的输入。 SSIS 将自动在您在上一步中标记的那些已排序列上设置 JOIN。将合并连接设置为使用 LEFT JOIN。这样,我们可以找到系统 A 中不存在于系统 B 中的行,并且可以比较现有行以查看它们是否已更改。
接下来,添加条件拆分。在那里,您可以根据条件定义 2 个输出缓冲区。
默认输出将采用不满足这两个条件的任何行,并且可以安全地忽略。
我们还没有完成。
添加另一个合并连接。
这次,将来自系统 B 的 MultiCast 的输入设为左侧输入。使系统 A 的 MultiCast 的输入成为正确的输入。同样,SSIS 将为您设置连接键。将合并连接设置为使用左连接。
在此流程中添加条件拆分,您唯一需要的是:
默认输出将采用所有其他行,可以忽略这些行。
现在你有 3 个缓冲区。
其中 2 个来自您的第一次合并连接,代表新行和更新行。
其中 1 个来自第二个合并连接,代表已删除行。
采取行动。
Since you need the full set of transactions from System A, that limits your options as far as source goes. I recommend pulling that data down to a Raw File Destination. This will help you as you develop, since you can just run the tasks that need that data over and over again without refetching. Also, make sure that the source data is sorted by the origin machine. SSIS is very weak with sorting unless you use a 3rd party component (which may be a career-limiting decision in some cases).
Anyway, let's assume that you have that sorted Raw File lying around. Next thing you do is toss that into a Data Flow as a Raw File Source. Then, have an OLEDB (or whatever) source that represents System B. You could use Raw File for that, also, if you like. Make sure that the data from System B is sorted using the same columns you used to sort System A.
Mark the Sources with IsSorted=True, and set the SortKey value on the appropriate columns in the metadata. This will tell SSIS that the data is pre-sorted, and it will permit you to JOIN on your key columns. Otherwise, you may wait days for SSIS to sort big sets.
Add MultiCasts to both System A and System B's sources, because we want to leverage them twice.
Next, add a Merge Join to join the two Raw File Sources together. Make System A the left input. System B will become the right input when you connect it to the Merge Join. SSIS will automatically set the JOIN up on those sorted columns that you marked in the previous step. Set the Merge Join to use LEFT JOIN. This way, we can find rows in System A that do not exist in System B, and we can compare existing rows to see if they were changed.
Next up, add a Conditional Split. In there, you can define 2 output buffers based on conditions.
The default output will take whatever rows do not meet those 2 conditions, and may be safely ignored.
We are not done.
Add another Merge Join.
This time, make an input from System B's MultiCast the left input. Make an input from System A's MultiCast the right input. Again, SSIS will set the join keys up for you. Set the Merge Join to use Left Join.
Add a Conditional Split down this flow, and the only thing you need is this:
The default output will take all of the other rows, which can be ignored.
Now you have 3 buffers.
2 of them come out of your first Merge Join, and represent New Rows and Updated Rows.
1 of them comes out of your second Merge Join, and represents Deleted Rows.
Take action.