I want to find the differences between two files, and which column each difference occurs in, using a Unix script
We are working on a data migration project from Informix to Oracle. A requirement has come up to compare the data between the Informix and Oracle tables: Table1 in Informix is migrated to Table1 in Oracle.

Now that the data has been migrated, I am dumping both tables into two flat files, and I just want to see whether there is any data difference between the two files.

File1 has, say, 1 million records with 300 columns, and File2 likewise. Both are pipe-delimited files. A difference can be anywhere in the file: almost a whole row can match, with a difference in only one or two columns. I would like to find these differences with all of the above in mind. The output should say something like "row1 in file1 does not match row1 in file2 at column 3", together with the value of that column in each file. All such differences need to be listed on the console in a format a layman can understand.

Let me explain with an example to make it clearer.
My first file looks like this:

col1|col2|col3|col4|col5|col6
1234|ramyakrishna|4567|[email protected]|228802|worse
1235|Chandan|4567|[email protected]|228862|worse
1236|Kacitha|4567|[email protected]|228872|worse
1238|Shajin|4567|[email protected]|228873|worse
My second file looks like this:

col1|col2|col3|col4|col5|col6
1234|ramyakrishna|4567|[email protected]|228802|good
1235|Chandan|4567|[email protected]|228789|worse
1236|Kacitha|4567|[email protected]|228872|worse
So when I compare the two files, we see that there is a difference in col4 and col6 in row2. I want output something like this:
1234|ramyakrishna|4567|[email protected]|228802|good :[email protected]:COL6-EXPECTED-worse
That is, the row from the second file should be printed, followed by the expected values.

I know the files need to be sorted before running any script.
With fge's script I get output like this:

$perl diff.perl
Line 1: different value for column 38 (was g, expected w)
Line 1: different value for column 40 (was o, expected r)
Line 1: different value for column 41 (was d, expected s)
Line 1: different value for column 42 (was ., expected e)
Line 1: different value for column 43 (was c, expected .)
Line 1: different value for column 44 (was o, expected c)
Line 1: different value for column 45 (was m, expected o)
Line 1: different value for column 46 (was |, expected m)
I want a whole-column comparison. The files are | delimited.
Assuming the files are sorted in the same order, look at comm or diff. Be advised that these operate at the line level. To drill down to differences at the field level, you might begin with the subset of lines that differ at the line level.

Something like this (in Perl) may do. It assumes that the files have the same number of lines (which can easily be checked with wc) and that no line is empty.
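The field-level comparison the question asks for could be sketched roughly as follows. This is a Python illustration, not the Perl script from the answer above. It assumes both files have the same number of lines in the same order, and it splits each line on | so that whole columns are compared rather than single characters. The function names `diff_fields` and `report`, and the shortened sample rows, are made up for this example.

```python
from itertools import zip_longest


def diff_fields(expected_line, actual_line, sep="|"):
    """Return (column_number, expected, actual) for every field that differs."""
    exp = expected_line.rstrip("\n").split(sep)
    act = actual_line.rstrip("\n").split(sep)
    # zip_longest pads the shorter row with "" so a missing trailing
    # column is reported as a difference instead of being ignored.
    return [
        (col, e, a)
        for col, (e, a) in enumerate(zip_longest(exp, act, fillvalue=""), start=1)
        if e != a
    ]


def report(expected_lines, actual_lines, sep="|"):
    """Yield 'row-from-second-file :COLn-EXPECTED-value' for each differing row.

    Assumption: both inputs have the same number of lines, in the same order.
    """
    for exp, act in zip(expected_lines, actual_lines):
        diffs = diff_fields(exp, act, sep)
        if diffs:
            row = act.rstrip("\n")
            tail = ":".join(f"COL{col}-EXPECTED-{e}" for col, e, _ in diffs)
            yield f"{row} :{tail}"


# Hypothetical three-column sample, shortened from the rows in the question.
file1 = ["col1|col2|col3", "1234|ramyakrishna|worse", "1235|Chandan|worse"]
file2 = ["col1|col2|col3", "1234|ramyakrishna|good", "1235|Chandan|worse"]

for line in report(file1, file2):
    print(line)  # prints: 1234|ramyakrishna|good :COL3-EXPECTED-worse
```

Because `report` accepts any iterable of lines, two open file objects can be passed in place of the lists, so a million-row dump is read one line at a time rather than loaded into memory.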
I recommend WinMerge for comparing two big files because it's fast. Unfortunately, it's only available on Windows (it should be available on Linux soon).
Since your flat files come from the database, you can easily sort them when you dump the tables. Then use WinMerge to compare the files.
Hope this helps.
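If both dumps are sorted on a key column at export time, a merge-style walk can also report rows that exist in only one file, such as row 1238 in the question's example. The Python sketch below makes several assumptions that the thread does not state: the first column is a unique key, both files are sorted on it, the keys compare as plain strings in the same order the dump was sorted, and any header line has been skipped. The function name `merge_compare` and the shortened rows are invented for illustration.

```python
def merge_compare(expected_rows, actual_rows, sep="|", key_col=0):
    """Walk two key-sorted iterables of lines in step and yield report lines.

    Assumption: both inputs are sorted by the key column, and keys compare
    as strings in that same order (for example, fixed-width numeric ids).
    """
    exp_iter, act_iter = iter(expected_rows), iter(actual_rows)
    exp, act = next(exp_iter, None), next(act_iter, None)
    while exp is not None or act is not None:
        exp_f = exp.rstrip("\n").split(sep) if exp is not None else None
        act_f = act.rstrip("\n").split(sep) if act is not None else None
        if act_f is None or (exp_f is not None and exp_f[key_col] < act_f[key_col]):
            # Key is present in the first file but missing from the second.
            yield f"key {exp_f[key_col]}: only in first file"
            exp = next(exp_iter, None)
        elif exp_f is None or act_f[key_col] < exp_f[key_col]:
            # Key is present in the second file but missing from the first.
            yield f"key {act_f[key_col]}: only in second file"
            act = next(act_iter, None)
        else:
            # Same key in both files: compare the row column by column.
            for col, (e, a) in enumerate(zip(exp_f, act_f), start=1):
                if e != a:
                    yield f"key {exp_f[key_col]}: column {col} was {a!r}, expected {e!r}"
            exp, act = next(exp_iter, None), next(act_iter, None)


# Hypothetical rows, shortened from the question's sample data.
first = ["1235|Chandan|228862", "1236|Kacitha|228872", "1238|Shajin|228873"]
second = ["1235|Chandan|228789", "1236|Kacitha|228872"]

for line in merge_compare(first, second):
    print(line)
```

With these sample rows the walk reports a column 3 mismatch for key 1235 and flags key 1238 as present only in the first file. Adding an ORDER BY on the key column to both export queries is one way to meet the sorting assumption.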