I want to find the differences between two files, and which column each difference occurs in, using a Unix script

Posted 2024-12-24 21:17:57

We are working on a data migration project from Informix to Oracle, and a requirement has come up to compare the data between the Informix and Oracle tables. Table1 in Informix has been migrated to Table1 in Oracle. I am dumping the data from both tables into two flat files, and I just want to see whether there is any data difference between the two files.

File1 has, say, 1 million records with 300 columns, and File2 likewise. Both are pipe-delimited files. A difference can be anywhere in the file: almost a whole row can match while one or two columns differ. I would like to find those differences. The output should say something like "row1 in file1 does not match row1 in file2 at column 3", together with the values of that column from both files. All such differences need to be listed on the console in a format a layman can understand.

Let me explain with an example to make this clearer.

My first file looks like this:

col1|col2|col3|col4|col5|col6
1234|ramyakrishna|4567|[email protected]|228802|worse
1235|Chandan|4567|[email protected]|228862|worse
1236|Kacitha|4567|[email protected]|228872|worse
1238|Shajin|4567|[email protected]|228873|worse

My second file looks like this:

col1|col2|col3|col4|col5|col6
1234|ramyakrishna|4567|[email protected]|228802|good
1235|Chandan|4567|[email protected]|228789|worse
1236|Kacitha|4567|[email protected]|228872|worse

So when I compare the two files, we see that there is a difference in col4 and col6 in row2. I want something like this:

1234|ramyakrishna|4567|[email protected]|228802|good
:[email protected]:COL6-EXPECTED-worse

So the row from the second file should be printed first, followed by the expected values.

I know the files need to be sorted before using any script.

With fge's script I am getting output like this:

$ perl diff.perl

Line 1: different value for column 38 (was g, expected w)
Line 1: different value for column 40 (was o, expected r)
Line 1: different value for column 41 (was d, expected s)
Line 1: different value for column 42 (was ., expected e)
Line 1: different value for column 43 (was c, expected .)
Line 1: different value for column 44 (was o, expected c)
Line 1: different value for column 45 (was m, expected o)
Line 1: different value for column 46 (was |, expected m)

I want a whole-column comparison, not a character-by-character one. The files are |-delimited.

Comments (3)

暗恋未遂 2024-12-31 21:17:57

Assuming the files are sorted in the same order, look at comm or diff. Be advised that these operate at the line level. To drill down to differences at the field level, you might begin with the subset of lines that differ at the line level.
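
The two-stage idea above (a cheap line-level check first, then a field-level look only at the lines that differ) could be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function names `field_diffs` and `compare_lines` are hypothetical, and it assumes both inputs are in the same row order and use `|` as the separator.

```python
from itertools import zip_longest


def field_diffs(expected_line, actual_line, sep="|"):
    """Return (column_number, expected, actual) for every differing field."""
    expected = expected_line.rstrip("\n").split(sep)
    actual = actual_line.rstrip("\n").split(sep)
    # zip_longest pads the shorter row with "" so a missing field is reported.
    pairs = zip_longest(expected, actual, fillvalue="")
    return [
        (col, exp, act)
        for col, (exp, act) in enumerate(pairs, start=1)
        if exp != act
    ]


def compare_lines(expected_lines, actual_lines):
    """Yield (row_number, diffs) only for rows that differ at the line level."""
    rows = zip(expected_lines, actual_lines)  # stops at the shorter input
    for row, (exp, act) in enumerate(rows, start=1):
        if exp == act:  # line-level check: identical rows are skipped cheaply
            continue
        yield row, field_diffs(exp, act)


if __name__ == "__main__":
    # Hypothetical sample rows, not the data from the question.
    file1 = ["col1|col2|col3\n", "1|x|3\n"]
    file2 = ["col1|col2|col3\n", "1|y|3\n"]
    for row, diffs in compare_lines(file1, file2):
        for col, exp, act in diffs:
            print(f"Row {row}, column {col}: was {act}, expected {exp}")
```

Because both functions accept any iterable of lines, open file objects can be passed directly, so a file with a million rows is read one line at a time rather than loaded whole.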

暖伴 2024-12-31 21:17:57

Something like this (in Perl) may do. It assumes that the files have the same number of lines (easily checked with wc) and that no line is empty:

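#!/usr/bin/perl
use strict;
use warnings;

# file1 holds the expected data, file2 the data to check.
open(my $fh1, '<', 'file1') or die "Cannot open file1: $!";
open(my $fh2, '<', 'file2') or die "Cannot open file2: $!";

my $linenr = 0;

while (my $line1 = <$fh1>) {
    my $line2 = <$fh2>;
    last unless defined $line2;    # file2 is shorter than file1
    $linenr++;
    chomp($line1, $line2);

    # The pipe must be escaped: an unescaped '|' is regex alternation and
    # splits between every character. The -1 keeps trailing empty fields.
    my @cols1 = split(/\|/, $line1, -1);
    my @cols2 = split(/\|/, $line2, -1);

    for my $i (0 .. $#cols1) {
        my $val1 = $cols1[$i];
        my $val2 = $cols2[$i] // '';    # treat a missing field as empty
        if ($val1 ne $val2) {
            printf("Line %d: different value for column %d (was %s, expected %s)\n",
                $linenr, $i + 1, $val2, $val1);
        }
    }
}

close($fh1);
close($fh2);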
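
The output format asked for in the question (the row from the second file, then the expected values for the columns that differ) could be produced along these lines in Python. This is only a sketch under assumptions: `report_row` is a hypothetical helper name, and the exact `:COLn-EXPECTED-value` layout is a guess at the asker's intent, since the sample output in the question is partly obscured.

```python
def report_row(expected_line, actual_line, sep="|"):
    """Return the actual row plus ':COLn-EXPECTED-value' notes, or None if equal."""
    expected = expected_line.rstrip("\n").split(sep)
    actual = actual_line.rstrip("\n").split(sep)

    # Pad the shorter row with empty fields so every column is compared.
    width = max(len(expected), len(actual))
    expected += [""] * (width - len(expected))
    actual += [""] * (width - len(actual))

    notes = [
        f"COL{col}-EXPECTED-{exp}"
        for col, (exp, act) in enumerate(zip(expected, actual), start=1)
        if exp != act
    ]
    if not notes:
        return None
    # First line: the row as found in the second file.
    # Second line: the expected value for each column that differs.
    return actual_line.rstrip("\n") + "\n:" + ":".join(notes)


if __name__ == "__main__":
    # Simplified sample rows (the e-mail column from the question is left out).
    row1 = "1234|ramyakrishna|4567|228802|worse"
    row2 = "1234|ramyakrishna|4567|228802|good"
    print(report_row(row1, row2))
```

Returning `None` for matching rows lets the caller skip them, so only mismatches reach the console.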
踏雪无痕 2024-12-31 21:17:57

I recommend WinMerge for comparing two big files because it's fast. Unfortunately, it's only available on Windows (it will be available on Linux soon).

Since your flat files come from the database, you can easily sort them when you dump the tables. Then use WinMerge to compare the files.

Hope this helps.
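
If the dumps are sorted on a key when they are exported, a scripted alternative to a GUI tool is to walk both files in step and match rows by key, which also reports rows present in only one file (such as row 1238 in the question's example). The following Python sketch rests on several assumptions: the first column is a unique key, both inputs are sorted by that key in plain string order, there is no header row, and `merge_compare` is a hypothetical name.

```python
from itertools import zip_longest


def merge_compare(expected_lines, actual_lines, sep="|"):
    """Walk two key-sorted inputs in step.

    Yields ("missing", key, None) for a row only in the expected input,
    ("extra", key, None) for a row only in the actual input, and
    ("changed", key, [column numbers]) for rows whose fields differ.
    """
    exp_iter, act_iter = iter(expected_lines), iter(actual_lines)
    exp, act = next(exp_iter, None), next(act_iter, None)
    while exp is not None or act is not None:
        exp_key = exp.split(sep, 1)[0] if exp is not None else None
        act_key = act.split(sep, 1)[0] if act is not None else None
        if act is None or (exp is not None and exp_key < act_key):
            yield ("missing", exp_key, None)
            exp = next(exp_iter, None)
        elif exp is None or act_key < exp_key:
            yield ("extra", act_key, None)
            act = next(act_iter, None)
        else:
            exp_fields = exp.rstrip("\n").split(sep)
            act_fields = act.rstrip("\n").split(sep)
            if exp_fields != act_fields:
                pairs = zip_longest(exp_fields, act_fields, fillvalue="")
                cols = [c for c, (e, a) in enumerate(pairs, start=1) if e != a]
                yield ("changed", exp_key, cols)
            exp, act = next(exp_iter, None), next(act_iter, None)


if __name__ == "__main__":
    # Simplified rows modelled on the question's example.
    first = ["1234|a|worse", "1235|b|worse", "1236|c|worse", "1238|d|worse"]
    second = ["1234|a|good", "1235|b|worse", "1236|c|worse"]
    for kind, key, cols in merge_compare(first, second):
        print(kind, key, cols)
```

Only one row from each file is held in memory at a time, which matters at a million rows by 300 columns. If the two databases sort the key differently (for example numeric versus string order), the walk would report false mismatches, so the sort order should be made identical in both exports.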
