How can I delete partial duplicate lines with AWK?
I have files with this kind of duplicate lines, where only the last field is different:

OST,0202000070,01-AUG-09,002735,6,0,0202000068,4520688,-1,0,0,0,0,0,55
ONE,0208076826,01-AUG-09,002332,316,3481.055935,0204330827,29150,200,0,0,0,0,0,5
ONE,0208076826,01-AUG-09,002332,316,3481.055935,0204330827,29150,200,0,0,0,0,0,55
OST,0202000068,01-AUG-09,003019,6,0,0202000071,4520690,-1,0,0,0,0,0,55

I need to remove the first occurrence of the line and keep the second one.

I've tried:

awk '!x[$0]++ {getline; print $0}' file.csv

but it's not working as intended, as it's also removing non-duplicate lines.
If your near-duplicates are always adjacent, you can just compare to the previous entry and avoid creating a potentially huge associative array.

Edit: Changed the script so it prints the last one in a group of near-duplicates (no tac needed).
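A minimal sketch of this adjacent-comparison approach (my own illustration, assuming comma-separated input as in the question, in a hypothetical file.csv):

```shell
# Sample input: the two ONE lines differ only in their last field.
cat > file.csv <<'EOF'
OST,0202000070,01-AUG-09,002735,6,0,0202000068,4520688,-1,0,0,0,0,0,55
ONE,0208076826,01-AUG-09,002332,316,3481.055935,0204330827,29150,200,0,0,0,0,0,5
ONE,0208076826,01-AUG-09,002332,316,3481.055935,0204330827,29150,200,0,0,0,0,0,55
OST,0202000068,01-AUG-09,003019,6,0,0202000071,4520690,-1,0,0,0,0,0,55
EOF

# Print only the last line in each run of adjacent near-duplicates.
awk '{
    key = $0
    sub(/,[^,]*$/, "", key)          # drop the last comma-separated field
    if (NR > 1 && key != prev_key)   # key changed: the previous group ended
        print prev_line
    prev_key = key
    prev_line = $0
}
END { if (NR) print prev_line }' file.csv
```

Because the current line always overwrites prev_line, the line printed when the key changes (or at END) is the last one of its group, so no tac is needed.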
As a general strategy (I'm not much of an AWK pro despite taking classes with Aho) you might try:

- Strip the last field from each line, since that is the only field that differs.
- Add each line to a hash keyed on the stripped prefix, so later occurrences overwrite earlier ones and you keep the last.
- Loop through the hash printing out the values.

This isn't AWK specific and I can't easily provide any sample code, but this is what I would first try.
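In awk itself the hash strategy is only a few lines; the following is my own illustrative sketch (not the answerer's code), shown on the two near-duplicate lines from the question:

```shell
# Hash strategy: the line minus its last field is the key, and each
# assignment overwrites the previous value, keeping only the last line
# seen for that key.
printf '%s\n' \
  'ONE,0208076826,01-AUG-09,002332,316,3481.055935,0204330827,29150,200,0,0,0,0,0,5' \
  'ONE,0208076826,01-AUG-09,002332,316,3481.055935,0204330827,29150,200,0,0,0,0,0,55' |
awk '{
    key = $0
    sub(/,[^,]*$/, "", key)   # drop the last field to form the hash key
    seen[key] = $0            # later lines overwrite earlier near-duplicates
}
END { for (k in seen) print seen[k] }'
```

One caveat: awk's for (k in seen) traversal order is unspecified, so unlike the adjacent-comparison answer this does not preserve the original line order.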