How to check for duplicate records between a source file and a target table in DataStage
I want to do two types of duplicate checking:
- Whether a file with the same name has already been loaded.
For instance, file A is loaded into the target table. If we receive file A again on a subsequent run, the sequence should abort because the file has already been loaded.
- Whether the records in a file have already been loaded.
For instance, file A is already in the target table. If we then receive file B and it contains records that were already loaded into the target from file A, those records should not be loaded, and the job should abort.
Can anyone help me with this scenario?
Thanks
Venkat.
Comments (1)
You need to keep a record of which file names have been loaded, typically by moving each file to an archive (or "processed") directory once it has been loaded. You can then run a simple ls command with the file name against that directory to determine whether it exists, which solves your first requirement.
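As a rough illustration of that file-name check, here is a minimal Python sketch. It assumes loaded files are moved into a single archive directory under their original names; the function name `already_loaded` and the directory layout are hypothetical and not part of DataStage.

```python
from pathlib import Path


def already_loaded(file_name: str, archive_dir: str) -> bool:
    """Return True if a file with this name already sits in the archive directory.

    Assumption: every successfully loaded file is moved into archive_dir
    and keeps its original name.
    """
    # Compare on the bare file name so an incoming path such as
    # /incoming/A.txt is matched against <archive_dir>/A.txt.
    return (Path(archive_dir) / Path(file_name).name).is_file()
```

A wrapper script could call this before the load starts and exit with a non-zero status when it returns True, so that the calling sequence can treat the run as a failure and abort.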
Determining whether file B has identical records to file A is a more complex question. Can you use a diff command? Otherwise you may need to do something cleverer. Even before that, how do you establish that file A is the one against which you have to compare? If there are key values, you may be able to check against the target table.
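If the records do carry key values, the check against the target table might look something like the Python sketch below. Everything in it is an assumption made for illustration: the incoming file is taken to be a CSV with a header row, the target is reached through an `sqlite3` connection as a stand-in for the real database, and the names `keys_already_in_target`, `table`, and `table_key` are hypothetical.

```python
import csv
import sqlite3


def keys_already_in_target(csv_path, key_column, conn, table, table_key):
    """Return the key values from the file that already exist in the target table.

    Assumptions: the file is a CSV with a header row containing key_column,
    and conn is an sqlite3 connection standing in for the real target.
    """
    with open(csv_path, newline="") as f:
        file_keys = {row[key_column] for row in csv.DictReader(f)}

    # Identifiers cannot be bound as query parameters, so table and table_key
    # are assumed to come from trusted job configuration, not from user input.
    cursor = conn.execute(f"SELECT {table_key} FROM {table}")
    # Keys are compared as strings because CSV values are read as text.
    target_keys = {str(row[0]) for row in cursor}

    return file_keys & target_keys
```

A non-empty result would mean the file contains records that are already in the target, so the job could abort before loading. This sketch reads every key from the target into memory; for a large table, doing the comparison inside the database is likely to scale better.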