Compare 2 similar files and output only the differences, preserving the order in which they occur?

Posted 2024-12-22 07:18:32

Hoping someone can help me get my head around this.

I have 2 files, one is 325 lines long, one is 361 lines long.

The bulk of these files is identical content but the 2nd one has random extra lines inserted. I am only interested in the extra lines, and I need to preserve the order in which they occur in the file.

The files contain a repeating paragraph of approximately 31 lines - I know the first and last line of this paragraph, and have no problems with dropping the entire paragraph, but can't work out how.

File1:

The quick brown
fox jumped 
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog

File2:

The quick brown
fox jumped
over the
lazy dog
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
The quick brown
fox jumped
over the
lazy dog
djakdjhgmv
asdjkljkgfyiyi
The quick brown
fox jumped
over the
lazy dog
jghytpuptou

I need to output only the extra lines in this order:

sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
djakdjhgmv
asdjkljkgfyiyi
jghytpuptou

Any help or advice would be gratefully received, I am not a *nix person unfortunately :(
I tried a few diff expressions and comm expressions, but can't get what I need.

Comments (4)

对不⑦ 2024-12-29 07:18:32

Try this magic command:

diff file1.txt file2.txt | sed -n 's/^> \(.*\)/\1/p'

diff file1.txt file2.txt should output something like

2c2
< fox jumped 
---
> fox jumped
4a5,7
> sadhasdgh
> qyyutrytkdaslksad
> utyiuiytiuyo
8a12,13
> djakdjhgmv
> asdjkljkgfyiyi
12a18
> jghytpuptou

sed -n 's/^> \(.*\)/\1/p' should find the lines starting with > and output those lines without the leading > marker. If this doesn't work, a possible reason is that diff produces different output on your system.
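
One thing to watch for: in the diff output shown above, the 2c2 hunk also contributes a > line, because line 2 of file1.txt ends in a space. The pipeline would therefore print "fox jumped" along with the extra lines. Below is a minimal sketch of the difference, assuming a diff that supports the -b option and using hypothetical temp files trimmed to a few lines of the sample data:

```shell
# Recreate a shortened version of the sample; line 2 of file1.txt has a trailing space.
tmp=$(mktemp -d)
printf '%s\n' 'The quick brown' 'fox jumped ' 'over the' 'lazy dog' > "$tmp/file1.txt"
printf '%s\n' 'The quick brown' 'fox jumped' 'over the' 'lazy dog' 'sadhasdgh' > "$tmp/file2.txt"

# Plain diff: the whitespace-only change on line 2 is reported as well.
plain=$(diff "$tmp/file1.txt" "$tmp/file2.txt" | sed -n 's/^> \(.*\)/\1/p')

# diff -b ignores trailing whitespace, so only the real insertion remains.
ignore_ws=$(diff -b "$tmp/file1.txt" "$tmp/file2.txt" | sed -n 's/^> \(.*\)/\1/p')

printf 'plain:\n%s\n\nwith -b:\n%s\n' "$plain" "$ignore_ws"
rm -rf "$tmp"
```

If whitespace-only differences matter in your real files, leave -b out and filter the unwanted lines some other way.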

吾家有女初长成 2024-12-29 07:18:32

This should work -

awk 'NR==FNR{a[$0]++;next} !($0 in a){print $0}' file1 file2

Explanation:

NR and FNR are awk's built-in variables. NR counts the records read so far across all input files and is not reset between files. FNR counts the records read from the current file, so it restarts at 1 on the first line of each new file.

In this one-liner, the condition NR==FNR restricts the first action, {a[$0]++;next}, to file1, because NR==FNR is true only while awk is still reading the first file. That action stores each line as a key of the array a, and next skips the rest of the program for that record so the second action is not run. Once awk starts reading file2, NR==FNR is false, so the first action is never run again and awk falls through to the second action, which checks each line of file2 against the array (i.e. the lines of file1). If the line is in the array, it is ignored. If it is not, it is printed, because such lines are the extra ones that occur only in file2.
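
To see the two counters side by side, here is a small sketch using two hypothetical throwaway files (their names and contents are made up for illustration, not taken from the question):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a' 'b' > "$tmp/file1"
printf '%s\n' 'a' 'x' 'b' > "$tmp/file2"

# Print NR and FNR for every record: they are equal only while reading the first file.
counters=$(cd "$tmp" && awk '{print FILENAME, NR, FNR}' file1 file2)
printf '%s\n' "$counters"

# The same one-liner as above: lines of file2 that never occur in file1, in file2 order.
extra=$(awk 'NR==FNR{a[$0]++;next} !($0 in a){print $0}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$extra"
rm -rf "$tmp"
```

Note that the lookup is by exact whole-line match and ignores position, so an extra line in file2 that happens to be identical to any line anywhere in file1 would not be printed.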

Test:

File1:

[jaypal:~/Temp] cat file1
The quick brown
fox jumped 
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog
The quick brown
fox jumped
over the
lazy dog

File2:

[jaypal:~/Temp] cat file2
The quick brown
fox jumped
over the
lazy dog
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
The quick brown
fox jumped
over the
lazy dog
djakdjhgmv
asdjkljkgfyiyi
The quick brown
fox jumped
over the
lazy dog
jghytpuptou

Execution:

[jaypal:~/Temp] awk 'NR==FNR{a[$0]++;next} !($0 in a){print $0}' file1 file2
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
djakdjhgmv
asdjkljkgfyiyi
jghytpuptou

ˉ厌 2024-12-29 07:18:32

This might work for you (GNU diff):

diff -bu file1 file2 | sed -n '1,2d;s/^+//p'
sadhasdgh
qyyutrytkdaslksad
utyiuiytiuyo
djakdjhgmv
asdjkljkgfyiyi
jghytpuptou
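
The question also mentions that the shared content is a repeating paragraph whose first and last lines are known. Here is a minimal sketch of that route, which needs only file2, assuming the paragraph always starts with a line that is exactly "The quick brown" and ends with a line that is exactly "lazy dog" (as in the sample; the temp file here is a hypothetical shortened copy):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'The quick brown' 'fox jumped' 'over the' 'lazy dog' \
  'sadhasdgh' 'The quick brown' 'fox jumped' 'over the' 'lazy dog' 'jghytpuptou' > "$tmp/file2"

# Delete every block that runs from a line matching the first paragraph line
# through the next line matching the last paragraph line; print everything else.
leftover=$(sed '/^The quick brown$/,/^lazy dog$/d' "$tmp/file2")
printf '%s\n' "$leftover"
rm -rf "$tmp"
```

The patterns are anchored with ^ and $, so a first or last line with trailing whitespace would not match. If a closing line is missing, sed deletes through to the end of the file.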

小霸王臭丫头 2024-12-29 07:18:32

diff -b sample.log sample.log.1 | awk '/^> / {print substr($0, 3)}'