Processing Items in Order in SSIS
In a flat file we have this type of record:
ID, Operation, Data
Operations include D = Delete, U = Update, I = Insert
1, D, Test
The problem is they are sending the same ID in the file multiple times and the ID could be anywhere in the file:
For example:
1, D, Test
1000 records later....
1, I, Testing
6000 records later....
1, D, Testing
In our SSIS package, we have a conditional split that routes the data flow depending on the operation: deletes versus updates. The problem is that SSIS does not process rows sequentially (which is a good thing for performance), so a delete may be applied after an insert or vice versa, which results in incorrect data.
What's the best way to handle this?
Some ideas we have come up with:
Dump the file into a temp table with the file's row number appended to each record, then cycle through the records sequentially in SQL Server using a cursor (ordered by file line number).
Use the ForEach container and go through each record sequentially in SSIS.
Sort the file by IDs and then take the "latest" update.
Break the multiple operations out into separate files. For example, if a record had three updates, OriginalFile would become File1, File2, and File3, and then the files would be run through SSIS in order.
Not all the records are like this, though. It seems they are giving us a transactional history of each record rather than just the latest update.
Thanks.
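For the "take the latest update" idea: since the file is effectively a transaction history, the net effect for each ID is the last operation for that ID in file order. A minimal sketch of that collapse, in Python purely for illustration (the sample rows and three-column layout are taken from the example above; in practice this would be done in the data flow or in SQL):

```python
import csv
import io

# Stand-in for the flat file: ID, Operation, Data.
# Order matters -- the same ID can appear multiple times.
flat_file = io.StringIO(
    "1,D,Test\n"
    "2,U,Other\n"
    "1,I,Testing\n"
    "1,D,Testing\n"
)

# Keep only the LAST operation seen per ID. Reading in file order and
# overwriting earlier entries reproduces "latest wins" with no sort needed.
latest = {}
for line_no, (rec_id, op, data) in enumerate(csv.reader(flat_file), start=1):
    latest[rec_id] = (line_no, op, data)

for rec_id, (line_no, op, data) in latest.items():
    print(rec_id, op, data)
# ID 1 collapses to its final operation (D); ID 2 keeps its single U.
```

The caveat is that applying only the final operation must be equivalent to replaying all of them; that holds for plain insert/update/delete semantics, but not if intermediate operations have side effects you need to capture.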
From personal experience, I found loading transaction data into a temp table first before processing to be the most effective way to process transaction data from a flat file. From there, it depends on how you want to keep historical transaction data. Personally, my company wants to see when the product is added, the last update made to the product, when it is finally sent, and when the product is received by the customer. So, loading the transaction data to a temp table to grab specifically what they want works for me.
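The staging-table approach above can be sketched as follows, simulated in Python rather than T-SQL so the logic is visible in one place (the operation semantics are an assumption for illustration): tag every row with its file line number on load, then replay the operations strictly in line order against the target, as the cursor idea in the question would.

```python
import csv
import io

# Stand-in for the flat file: ID, Operation, Data.
flat_file = io.StringIO(
    "1,I,First\n"
    "1,U,Updated\n"
    "2,I,Other\n"
    "1,D,Updated\n"
)

# "Staging table": each row tagged with its file line number so the
# original order survives however the rows are later stored or queried.
staging = [
    (line_no, rec_id, op, data)
    for line_no, (rec_id, op, data) in enumerate(csv.reader(flat_file), start=1)
]

# "Target table": replay the staged rows ordered by line number,
# the in-memory equivalent of a cursor over ORDER BY file_line_no.
target = {}
for line_no, rec_id, op, data in sorted(staging):
    if op in ("I", "U"):
        target[rec_id] = data       # insert or overwrite
    elif op == "D":
        target.pop(rec_id, None)    # delete if present

print(target)
# ID 1 was inserted, updated, then deleted; only ID 2 survives.
```

In SQL Server the same shape falls out naturally: bulk-load into a staging table with an IDENTITY (or file line number) column, then process ordered by that column, which also leaves the full transaction history available for the kind of reporting described above.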