比较两个文件

发布于 2024-11-16 10:44:21 字数 1666 浏览 1 评论 0原文

我有两个不同行的文件:

文件 1:

31.32   29.15   46.77   106.40  11370
25.81   40.82   25.67   30.08   16365
27.11   42.32   14.48   50.04   18310.7
26.48   42.34   12.65   62.78   19607.5
24.48   46.00   17.16   11.86   22087.2
26.75   43.91   29.65   55.81   24032.7
30.91   34.85   15.25   50.93   26703
25.24   41.62   16.54   51.57   38032.9
23.48   41.97   17.33   50.88   48981.2
24.16   39.34   16.99   50.86   77513.4
22.90   41.59   19.76   50.31   135803
19.98   43.52   20.58   45.65   747049
19.96   43.64   20.43   45.37   809913
19.93   43.75   20.41   45.33   863931

和文件 2:

12.4   -32.1    39.1    -44.9   135497.688
8.6    -38.6    39.3    -44.8   48981.191
1.0    -45.0    0.0     -54.0   45928.445
13.9   -70.1    39.4    -44.8   26702.982

我想比较这两个文件和输出:

文件 3

13.9  -70.1   30.91  34.85   39.4   -44.8   15.25   50.93   26702.982
8.6   -38.6   23.48  41.97   39.3   -44.8   17.33   50.88   48981.191

问题是两个文件中各自的列值不完全匹配。但如果它们在一定的误差范围内(例如,+/- 1)匹配,那就没问题了。


使用 F/R/C 表示文件/行/列来注释文件 3 中的值的来源:

13.9  -70.1   30.91  34.85   39.4   -44.8   15.25   50.93   26702.982
2/4/1 2/4/2   1/7/1  1/7/2   2/4/3  2/4/4   1/7/3   1/7/4   2/4/5

8.6   -38.6   23.48  41.97   39.3   -44.8   17.33   50.88   48981.191
2/2/1 2/2/2   1/9/1  1/9/2   2/2/3  2/2/4   1/9/3   1/9/4   2/2/5

但是:

  • 为什么文件 2 中的其他行没有条目?
  • 为什么文件 1 中没有其他行的条目?
  • 文件 1 第 7 行中的值与文件 2 第 4 行中的值有何关系?
  • 文件 1 第 9 行中的值与文件 2 第 2 行中的值有何关系?
  • 为什么文件 1 的第 5 列不在输出中?
  • 文件 1 的第 5 列是否与文件 2 的第 5 列连接,公差为 ±1?

I have two files which are different row :

file 1:

31.32   29.15   46.77   106.40  11370
25.81   40.82   25.67   30.08   16365
27.11   42.32   14.48   50.04   18310.7
26.48   42.34   12.65   62.78   19607.5
24.48   46.00   17.16   11.86   22087.2
26.75   43.91   29.65   55.81   24032.7
30.91   34.85   15.25   50.93   26703
25.24   41.62   16.54   51.57   38032.9
23.48   41.97   17.33   50.88   48981.2
24.16   39.34   16.99   50.86   77513.4
22.90   41.59   19.76   50.31   135803
19.98   43.52   20.58   45.65   747049
19.96   43.64   20.43   45.37   809913
19.93   43.75   20.41   45.33   863931

and file 2:

12.4   -32.1    39.1    -44.9   135497.688
8.6    -38.6    39.3    -44.8   48981.191
1.0    -45.0    0.0     -54.0   45928.445
13.9   -70.1    39.4    -44.8   26702.982

I would like to compare these two files and the output :

file 3

13.9  -70.1   30.91  34.85   39.4   -44.8   15.25   50.93   26702.982
8.6   -38.6   23.48  41.97   39.3   -44.8   17.33   50.88   48981.191

The problem is the respective columns value in the two files are not exactly matched. But It will be fine if they match within certain error bounds (e.g., +/- 1).


Annotating where values in file 3 come from, using F/R/C for File/Row/Column:

13.9  -70.1   30.91  34.85   39.4   -44.8   15.25   50.93   26702.982
2/4/1 2/4/2   1/7/1  1/7/2   2/4/3  2/4/4   1/7/3   1/7/4   2/4/5

8.6   -38.6   23.48  41.97   39.3   -44.8   17.33   50.88   48981.191
2/2/1 2/2/2   1/9/1  1/9/2   2/2/3  2/2/4   1/9/3   1/9/4   2/2/5

But:

  • Why are there no entries for the other lines in File 2?
  • Why are there no entries for the other lines in File 1?
  • How are the values in File 1 Row 7 related to the values in File 2 Row 4?
  • How are the values in File 1 Row 9 related to the values in File 2 Row 2?
  • Why is Column 5 of File 1 not in the output?
  • Is File 1 Column 5 joined with Column 5 of File 2, with a tolerance of ±1?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

亣腦蒛氧 2024-11-23 10:44:21

这:

(LC_ALL=C; join -1 5 -2 5 \
    <(<file1 awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -nk5)\
    <(<file2 awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -nk5)
) | awk '{print $2, $3, $6, $7, $4, $5, $8, $9, $1}'

将为您的输入生成:

13.9 -70.1 30.91 34.85 39.4 -44.8 15.25 50.93 26703
8.6 -38.6 23.48 41.97 39.3 -44.8 17.33 50.88 48981

最后一列被舍入。

更紧凑的形式:

cmd() {
    awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -nk5
}
(LC_ALL=C; join -1 5 -2 5 <(<file1 cmd) <(<file2 cmd)) |\
awk '{print $2, $3, $6, $7, $4, $5, $8, $9, $1}'

This:

(LC_ALL=C; join -1 5 -2 5 \
    <(<file1 awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -nk5)\
    <(<file2 awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -nk5)
) | awk '{print $2, $3, $6, $7, $4, $5, $8, $9, $1}'

will produce for your input this:

13.9 -70.1 30.91 34.85 39.4 -44.8 15.25 50.93 26703
8.6 -38.6 23.48 41.97 39.3 -44.8 17.33 50.88 48981

The last column is rounded.

more compact form:

cmd() {
    awk '{printf "%s %s %s %s %d\n",$1,$2,$3,$4,int($5+0.5);}' | sort -nk5
}
(LC_ALL=C; join -1 5 -2 5 <(<file1 cmd) <(<file2 cmd)) |\
awk '{print $2, $3, $6, $7, $4, $5, $8, $9, $1}'
烟酒忠诚 2024-11-23 10:44:21
awk '
    function close_enough(v1, v2, delta) {
        delta = v1 - v2
        return (-1 <= delta && delta <= 1)
    }
    NR == FNR {
        key[$NF] = $0
        next
    }
    {
        for (val in key) {
            if (close_enough($NF,val)) {
                split(key[val], arr)
                print arr[1], arr[2], $1, $2, arr[3], arr[4], $3, $4, val
            }
        }
    }
' file2 file1 | column -t > file3
awk '
    function close_enough(v1, v2, delta) {
        delta = v1 - v2
        return (-1 <= delta && delta <= 1)
    }
    NR == FNR {
        key[$NF] = $0
        next
    }
    {
        for (val in key) {
            if (close_enough($NF,val)) {
                split(key[val], arr)
                print arr[1], arr[2], $1, $2, arr[3], arr[4], $3, $4, val
            }
        }
    }
' file2 file1 | column -t > file3
疏忽 2024-11-23 10:44:21

只能用 awk。

awk '
NR==FNR {a[int($5+0.5)] = $0; next}
a[int($5+0.5)] {$0 = a[int($5+0.5)] " " $0; print $6,$7,$1,$2,$8,$9,$10}' file1 file2

如果需要对其进行排序,请将输出通过管道传输到 sort

Only with awk.

awk '
NR==FNR {a[int($5+0.5)] = $0; next}
a[int($5+0.5)] {$0 = a[int($5+0.5)] " " $0; print $6,$7,$1,$2,$8,$9,$10}' file1 file2

If you need it to be sorted, pipe the output into sort

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文