当前位置：文江博客话题详情

Perl GREP AWK sed unique

有哪些单行代码可以将第 n 列的唯一元素输出到另一个文件？

发布于 2024-08-02 09:41:05 字数 325 浏览 5 评论 0原文

我有一个这样的文件：

有哪些单行代码可以将第 n 列的唯一元素输出到另一个文件？

编辑：这是人们给出的解决方案列表。谢谢你们！

cat in.txt | cut -d' ' -f 3 | sort -u
cut -c 1 t.txt | sort -u
awk '{ print $2 }' cols.txt | uniq
perl -anE 'say $F[0] unless $h{$F[0]}++' filename

I have a file like this:

What are some one-liners that can output unique elements of the nth column to another file?

EDIT: Here's a list of solutions people gave. Thanks guys!

cat in.txt | cut -d' ' -f 3 | sort -u
cut -c 1 t.txt | sort -u
awk '{ print $2 }' cols.txt | uniq
perl -anE 'say $F[0] unless $h{$F[0]}++' filename

收藏 0

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

评论（5）

风轻花落早 2024-08-09 09:41:05

在 5.10 之前的 Perl 中

perl -lane 'print $F[0] unless $h{$F[0]}++' filename

在 5.10 之后的 Perl 中

perl -anE 'say $F[0] unless $h{$F[0]}++' filename

将 0 替换为要输出的列。

对于 j_random_hacker，这里是一个使用很少内存的实现（但速度较慢并且需要更多打字）：

perl -lane 'BEGIN {dbmopen %h, "/tmp/$", 0600; unlink "/tmp/$.db" } print $F[0] unless $h{$F[0]}++' filename

dbmopen 在 DBM 文件之间创建一个接口（它创建或打开）和名为 %h 的哈希。 %h 中存储的任何内容都将存储在光盘上而不是内存中。使用 unlink 删除文件可确保程序完成后文件不会保留，但对当前进程没有影响（因为根据 POSIX 规则，文件系统将打开的文件句柄视为真实文件）。

In Perl before 5.10

perl -lane 'print $F[0] unless $h{$F[0]}++' filename

In Perl after 5.10

perl -anE 'say $F[0] unless $h{$F[0]}++' filename

Replace 0 with the column you want to output.

For j_random_hacker, here is an implementation that will use very little memory (but will be a slower and requires more typing):

perl -lane 'BEGIN {dbmopen %h, "/tmp/$", 0600; unlink "/tmp/$.db" } print $F[0] unless $h{$F[0]}++' filename

dbmopen creates an interface between a DBM file (that it creates or opens) and the hash named %h. Anything stored in %h will be stored on disc instead of in memory. Deleting the file with unlink ensures that the file will not stick around after the program is done, but has no effect on the current process (since, according to POSIX rules, open filehandles are respected by the filesystem as real files).

回复收藏 0 原文

单身狗的梦 2024-08-09 09:41:05

更正：谢谢马克·鲁沙科夫。

$ cut -c 1 t.txt | sort | uniq

或者

$ cut -c 1 t.txt | sort -u


1
4
7
9

Corrected: Thank you Mark Rushakoff.

$ cut -c 1 t.txt | sort | uniq

or

$ cut -c 1 t.txt | sort -u


1
4
7
9

回复收藏 0 原文

迎风吟唱 2024-08-09 09:41:05

取第三列的唯一值：

$ cat in.txt | cut -d' ' -f 3 | sort -u
3
4
6
8

cut -d' ' 表示以空格分隔输入，-f 3 部分表示取第三个字段。最后，sort -u 对输出进行排序，仅保留唯一的条目。

Taking the unique values of the third column:

$ cat in.txt | cut -d' ' -f 3 | sort -u
3
4
6
8

cut -d' ' means to separate the input delimited by spaces, and the -f 3 part means take the third field. Finally, sort -u sorts the output, keeping only unique entries.

回复收藏 0 原文

甜心小果奶 2024-08-09 09:41:05

假设您的文件是“cols.txt”，并且您想要第二列的唯一元素：

awk '{ print $2 }' cols.txt | uniq

您可能会发现以下文章对于了解有关此类实用程序的更多信息很有用：

使用 Linux 文本实用程序简化数据提取

Say your file is "cols.txt" and you want the unique elements of the second column:

awk '{ print $2 }' cols.txt | uniq

You might find the following article useful for learning more about such utilities:

Simplify data extraction using Linux text utilities

回复收藏 0 原文

少女七分熟 2024-08-09 09:41:05

如果使用awk，则无需使用其他命令

awk '!_[$2]++{print $2}' file

if using awk, no need to use other commands

awk '!_[$2]++{print $2}' file

回复收藏 0 原文

~没有更多了~

关于作者

暂无简介

0 文章

0 评论

24 人气

关注发私信

相关话题

热门标签

操作系统程序设计 IT运维 Linux系统管理 JavaScript 服务器应用 solaris C/C++ PHP Shell BSD Vue.js aix Oracle Python HTML 系统管理 HTML5 CSS 前端

推荐作者

留蓝

文章 0 评论 0

18790681156

文章 0 评论 0

zach7772

文章 0 评论 0

Wini

文章 0 评论 0

ayeshaaroy

文章 0 评论 0

初雪

文章 0 评论 0

友情链接

我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的隐私政策了解更多相关信息。单击 接受 或继续使用网站，即表示您同意使用 Cookies 和您的相关数据。

原文