Find duplicate lines in a file and count how many times each line was duplicated?

Suppose I have a file similar to the following:

123 
123 
234 
234 
123 
345

I would like to find how many times '123' was duplicated, how many times '234' was duplicated, etc.
So ideally, the output would be like:

123  3 
234  2 
345  1

7 Answers

满栀 2024-12-01 04:34:47

In Windows, using Windows PowerShell, I used the command below to achieve this:

Get-Content .\file.txt | Group-Object | Select Name, Count

We can also use the Where-Object cmdlet to filter the result:

Get-Content .\file.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select Name, Count
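
For the sample file above, the first pipeline's grouped output would look roughly like this (Group-Object's Name column holds the line text and Count the number of occurrences; this assumes the trailing spaces in the sample are only formatting, and exact column widths vary):

Name Count
---- -----
123      3
234      2
345      1
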
沩ん囻菔务 2024-12-01 04:34:47

Assuming you've got access to a standard Unix shell and/or cygwin environment:

tr -s ' ' '\n' < yourfile | sort | uniq -d -c
       ^--space char

Basically: convert all space characters to line breaks, then sort the translated output and feed that to uniq to count duplicate lines.
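
For the question's sample input, this should print only the lines that appear more than once, roughly like this (the exact leading spacing depends on uniq's count padding):

  3 123
  2 234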

雅心素梦 2024-12-01 04:34:46

Via awk:

awk '{dups[$1]++} END{for (num in dups) {print num,dups[num]}}' data

In the awk command dups[$1]++, the variable $1 holds the entire contents of column 1 and the square brackets are array access. So, for the first column of each line in the data file, the node of the array named dups is incremented.

At the end, we loop over the dups array with num as the variable and print the saved numbers first, then the number of times each was duplicated, dups[num].

Note that your input file has trailing spaces at the end of some lines; if you clean those up, you can use $0 in place of $1 in the command above :)
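
For readability, here is the same logic written out as a commented script (just a sketch; the file name count_dups.awk is a placeholder):

# count_dups.awk -- count occurrences of the first field on each line
{ dups[$1]++ }            # bump the counter for this line's first column
END {
    for (num in dups)     # loop over every distinct value seen
        print num, dups[num]
}

Run it with awk -f count_dups.awk data to get the same output as the one-liner.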

明媚如初 2024-12-01 04:34:46

To find duplicate counts, use this command:

sort filename | uniq -c | awk '{print $2, $1}'
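
For the question's sample file, this should produce output in the asker's desired format (assuming the stray trailing spaces in the sample are only formatting):

123 3
234 2
345 1
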
花开柳相依 2024-12-01 04:34:44

To find and count duplicate lines in multiple files, you can try the following command:

sort <files> | uniq -c | sort -nr

or:

cat <files> | sort | uniq -c | sort -nr
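
As a concrete usage sketch (the file names a.txt and b.txt are just placeholders), sort can read several files directly, which avoids the extra cat:

sort a.txt b.txt | uniq -c | sort -nr
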
故乡的云 2024-12-01 04:34:43

This will print duplicate lines only, with counts:

sort FILE | uniq -cd

or, with GNU long options (on Linux):

sort FILE | uniq --count --repeated

On BSD and OSX you have to use grep to filter out unique lines:

sort FILE | uniq -c | grep -v '^ *1 '

For the given example, the result would be:

  3 123
  2 234

If you want to print counts for all lines including those that appear only once:

sort FILE | uniq -c

or, with GNU long options (on Linux):

sort FILE | uniq --count

For the given input, the output is:

  3 123
  2 234
  1 345

In order to sort the output with the most frequent lines on top, you can do the following (to get all results):

sort FILE | uniq -c | sort -nr

or, to get only duplicate lines, most frequent first:

sort FILE | uniq -cd | sort -nr

On OSX and BSD the final one becomes:

sort FILE | uniq -c | grep -v '^ *1 ' | sort -nr
心欲静而疯不止 2024-12-01 04:34:42

Assuming there is one number per line:

sort <file> | uniq -c

You can use the more verbose --count flag too with the GNU version, e.g., on Linux:

sort <file> | uniq --count