
Tags: bash, sorting, uniq

How to remove duplicate lines in a file?

Posted 2025-01-27 19:33:43

I understand that the general approach is to use something like

$ sort file1.txt | uniq > file2.txt

But I was wondering if there was a way to do this without needing separate source and destination files, even if it means it can't be a one-liner.
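For context, the obvious shortcut of redirecting straight back onto the input does not work. The shell opens (and truncates) the output file before the command runs, so the command then reads an empty file. A minimal demonstration, using a hypothetical scratch file `demo.txt`:

```shell
printf 'b\na\nb\n' > demo.txt

# The shell truncates demo.txt for writing *before* sort starts,
# so sort reads an already-empty file and writes nothing.
sort -u demo.txt > demo.txt

[ -s demo.txt ] || echo "demo.txt is now empty"
```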

想挽留 2025-02-03 19:33:43

Simply use the -o and -u options of sort:

sort -o file -u file

You don't even need to pipe into another command such as uniq.
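This is safe because POSIX explicitly allows the `-o` output file to be one of the input files; sort reads its input before writing the result. A quick check with a hypothetical file `fruits.txt` (note that the result comes out sorted, so the original line order is not preserved):

```shell
printf 'pear\napple\npear\nbanana\napple\n' > fruits.txt

# -u keeps one copy of each line; -o names the output file,
# which may be the same file as the input.
sort -o fruits.txt -u fruits.txt

cat fruits.txt
# apple
# banana
# pear
```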

凡间太子 2025-02-03 19:33:43

With GNU awk for "inplace" editing:

awk -i inplace '!seen[$0]++' file1.txt

Like every other tool that supports "inplace" editing (sed -i, perl -i, ruby -i, etc.), this uses a temp file behind the scenes. The one exception is ed, which requires the whole file to be read into memory first.
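Unlike `sort -u`, the `!seen[$0]++` pattern keeps the lines in their original order. `seen[$0]++` evaluates to the previous count for that line (0, i.e. false, the first time), so the negation is true only on a line's first occurrence, and awk's default action prints it. Run on a pipe it works with any POSIX awk; a small illustration with made-up input:

```shell
# Only the first occurrence of each line is printed, in input order.
printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++'
# b
# a
# c
```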

With any awk, you can instead do the following, which uses no temp file but roughly twice the memory:

awk '!seen[$0]++{a[++n]=$0} END{for (i=1;i<=n;i++) print a[i] > FILENAME}' file
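The same portable one-liner, spread out with comments to show why overwriting the input is safe here: the output file is only opened (and truncated) by the first `print` in the `END` block, after all input has been read. `list.txt` is just a hypothetical example file.

```shell
printf 'b\na\nb\nc\na\n' > list.txt

awk '
  # First time a line is seen: store it, preserving input order.
  !seen[$0]++ { a[++n] = $0 }
  # All input has been read by now, so writing to the input file is safe.
  END { for (i = 1; i <= n; i++) print a[i] > FILENAME }
' list.txt

cat list.txt
# b
# a
# c
```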

岁月无声 2025-02-03 19:33:43

With Perl's -i:

perl -i -lne 'print unless $seen{$_}++' original.file

-i changes the file "in place";
-n reads the input line by line, running the code for each line;
-l removes newlines from input and adds them to print;
The %seen hash idiom is described in perlfaq4.

二智少女 2025-02-03 19:33:43

A common idiom is:

temp=$(mktemp)
some_pipeline < original.file > "$temp" && mv "$temp" original.file

The && is important: if the pipeline fails, then the original file won't be overwritten with (perhaps) garbage.
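A filled-in version of that idiom for this question, with `sort -u` as the pipeline and a hypothetical file `data.txt`. It also removes the temp file when the command fails, which the one-liner leaves behind:

```shell
printf 'b\na\nb\n' > data.txt

tmp=$(mktemp) || exit 1
# Write the result to the temp file; replace data.txt only if sort succeeded.
if sort -u data.txt > "$tmp"; then
  mv "$tmp" data.txt
else
  rm -f "$tmp"   # don't leave the temp file behind on failure
fi

cat data.txt
# a
# b
```

One side effect worth knowing: mktemp creates its file readable and writable by the owner only (mode 0600), so after `mv` the file carries those permissions rather than the original's. Using `cat "$tmp" > data.txt && rm -f "$tmp"` in place of `mv` writes into the existing file and keeps its permissions.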

On Linux, the moreutils package provides sponge, a program that encapsulates this pattern:

some_pipeline < original.file | sponge original.file
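Where moreutils isn't installed, a rough stand-in can be written as a shell function. The name `soak` is hypothetical and this is not how sponge itself is implemented: it buffers all of stdin in a temp file and only then writes it over the target, using `cat` rather than `mv` so the target keeps its inode and permissions. Note that, unlike the `&&` form, it cannot tell whether the upstream command succeeded.

```shell
# soak: hypothetical helper, NOT part of moreutils.
soak() {
  local tmp status
  tmp=$(mktemp) || return 1
  # Read stdin to EOF first, so "$1" is only opened for writing after the
  # upstream command has closed its output (and finished reading "$1").
  cat > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

printf 'y\nx\ny\n' > notes.txt
sort -u notes.txt | soak notes.txt

cat notes.txt
# x
# y
```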

About the author: 故事未完
