Duplicate record in an SSIS flat file destination
I am writing to a flat file destination in a 2008 SSIS package. 99.99% of it works correctly. However, I get one duplicate record in the destination file.
Here is the basic flow of the package:
1. Read two ISO-8859-1 encoded files and encode their text to UTF8 in memory
2. Combine the two files together in memory and load them into a lookup cache
3. Read another source file from disk
4. Match an ID column from the source file to an ID column in the lookup cache
5. If the ID matches an ID in the lookup cache, write the row to a match file; if it does not match, write it to another file (a rough sketch of this flow follows the list)
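For reference, here is a rough stand-alone sketch of that flow in Python, outside SSIS. The file names, the '|' delimiter, the ID sitting in the first column, and the source file's encoding are all assumptions, not details taken from the actual package:

```python
import csv

# Assumed inputs: two ISO-8859-1 lookup files plus one source file,
# '|'-delimited, with the ID in the first column (placeholders only).
LOOKUP_FILES = ["lookup_a.txt", "lookup_b.txt"]
SOURCE_FILE = "source.txt"

def read_rows(path, encoding):
    """Yield delimited rows from a file, decoding with the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        for row in csv.reader(f, delimiter="|"):
            if row:
                yield row

# Steps 1-2: decode the two lookup files (ISO-8859-1) and build the cache.
# In memory the text is plain Unicode; UTF-8 only matters when writing out.
lookup_cache = {}
for path in LOOKUP_FILES:
    for row in read_rows(path, "iso-8859-1"):
        lookup_cache[row[0].strip()] = row

# Steps 3-5: read the source file (encoding assumed) and route each row by
# whether its ID is present in the lookup cache.
with open("matched.txt", "w", encoding="utf-8", newline="") as matched, \
     open("unmatched.txt", "w", encoding="utf-8", newline="") as unmatched:
    match_writer = csv.writer(matched, delimiter="|")
    other_writer = csv.writer(unmatched, delimiter="|")
    for row in read_rows(SOURCE_FILE, "utf-8"):
        writer = match_writer if row[0].strip() in lookup_cache else other_writer
        writer.writerow(row)
```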
Everything works from beginning to end. However, I am getting one duplicate in the match file. I have begun to suspect that the duplicate is caused by an end-of-file (or other) special character from the lookup cache text files when they are joined. The files are produced on a UNIX system (but I am encoding them to UTF-8 when I read them). The duplicate record is the same record every time. How do I keep from getting the duplicate (or figure out where it is coming from)? I cannot simply use a remove-duplicates step, because there are legitimate duplicates in the destination. I have been trying to figure this out for a few weeks.
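One way to narrow down where the duplicate comes from is to scan the raw lookup files, before they ever reach the cache, for repeated IDs and for stray control characters (carriage returns, NULs, the legacy 0x1A end-of-file marker). A minimal sketch, with the same assumed file names and layout as above:

```python
from collections import Counter

SUSPECT_CHARS = {"\r": "CR", "\x00": "NUL", "\x1a": "SUB/EOF"}

id_counts = Counter()
for path in ["lookup_a.txt", "lookup_b.txt"]:   # assumed names
    with open(path, "r", encoding="iso-8859-1", newline="") as f:
        for line_no, line in enumerate(f, start=1):
            # Flag any control characters that could corrupt the join.
            for ch, name in SUSPECT_CHARS.items():
                if ch in line.rstrip("\n"):
                    print(f"{path}:{line_no} contains {name}")
            key = line.split("|", 1)[0].strip()  # assumed '|' delimiter, ID first
            id_counts[key] += 1

# Any ID appearing more than once in the lookup source fans out the join
# and produces a duplicate in the match file for that ID.
for key, count in id_counts.items():
    if count > 1:
        print(f"duplicate lookup ID: {key!r} ({count} occurrences)")
```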
Comments (2)
Start by putting the data into staging tables, tables that you can query. Maybe you can see how joining them together produces the duplicate. Also, how do you know this is an invalid duplicate if you have valid ones? What makes it invalid?
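Once the data sits in a staging table, the duplicate check is just a GROUP BY on the join key. A small illustration using an in-memory SQLite database as a stand-in for the real staging tables (the table and column names here are invented):

```python
import sqlite3

# Stand-in staging database; in practice this would be the staging table
# loaded by the package (names and sample rows are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_lookup (id TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO staging_lookup VALUES (?, ?)",
    [("A1", "row one"), ("A2", "row two"), ("A1", "row one again")],
)

# Any join key that appears more than once will multiply matched rows.
rows = conn.execute(
    "SELECT id, COUNT(*) FROM staging_lookup GROUP BY id HAVING COUNT(*) > 1"
).fetchall()
print(rows)   # -> [('A1', 2)]
```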
I figured out the issue. I did not set a field to an empty string when reading the source, which would have eliminated that row. That row was then being matched to a random row in the lookup transform, continuing through the flow, and being written to the destination.
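In other words, a row with a blank key slipped through to the lookup and matched arbitrarily. Outside SSIS, the equivalent guard is to filter out rows whose key field is blank before the match. A tiny sketch of that idea (the key column position is an assumption):

```python
def should_process(row, key_index=0):
    """Drop rows whose key field is blank after trimming, so they never
    reach the lookup and cannot match an arbitrary cached row.
    (key_index=0 is an assumption about the file layout.)"""
    key = (row[key_index] or "").strip()
    return key != ""

# Example: the second row is filtered out before the lookup step.
rows = [["A1", "good"], ["   ", "bad key"], ["A2", "good"]]
print([r for r in rows if should_process(r)])  # -> [['A1', 'good'], ['A2', 'good']]
```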