Perl - Find duplicate lines in a file or array

Posted 2024-11-04 17:46:21 · 96 words · 5 views · 0 comments

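I am trying to print the duplicate lines from a filehandle, not remove them or do any of the other things I have seen in other questions. I don't have enough experience with Perl to work this out quickly, so I'm asking here. What is a way to do this?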


落花随流水 2024-11-11 17:46:21

Using the standard Perl shorthands:

my %seen;
while ( <> ) { 
    print if $seen{$_}++;
}

As a "one-liner":

perl -ne 'print if $seen{$_}++'

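Want more detail? This prints <file name>:<line number>:<line> for each repeat (the file name is left out when reading standard input). The + stops Perl from parsing print (...) as a function call whose argument list ends at the closing parenthesis, which would silently drop the "$.:$_" part. Note that $. keeps counting across multiple input files unless ARGV is closed at each end of file:

perl -ne 'print +( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_" if $seen{$_}++'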
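
For use inside a script rather than on the command line, the same report can be sketched as a subroutine. This is a minimal sketch, not part of the original answer: the name report_duplicates is hypothetical, and it opens each file itself, so the line number restarts at 1 for every file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "file:line:text" for every repeat occurrence
# of a line across the given files (duplicates are tracked across all files).
sub report_duplicates {
    my @files = @_;
    my (%seen, @report);
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            # $. is the line number of the last-read filehandle ($fh here),
            # so it restarts at 1 for each newly opened file.
            push @report, "$file:$.:$line" if $seen{$line}++;
        }
        close $fh;
    }
    return @report;
}

# Usage: perl script.pl file1 file2 ...
print report_duplicates(@ARGV);
```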

Explanation of %seen:

my %seen declares a hash. For each distinct line in the input (which comes from while (<>) in this case), $seen{$_} is a scalar slot in the hash keyed by the text of the line (that is what $_ is doing inside the hash's {} braces).
The postfix increment operator (x++) yields the current value of the expression and increments it afterwards. So if we haven't "seen" the line yet, $seen{$_} is undefined, which the increment treats as 0, and 0 is false.
The slot is then incremented to 1.

So when the while loop starts, every line's count is effectively zero (if it helps, think of those lines as "not %seen"). The first time we see a line, the test gets 0, which fails the if, and the count in that slot goes up to 1. On any later occurrence the count is 1 or more, so the if condition passes and the line is printed.
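
To make the post-increment behaviour concrete, here is a small sketch (the sub name seen_trace is made up for illustration) that returns the value the if test sees for each line, in input order:

```perl
use strict;
use warnings;

# Hypothetical helper: for each input line, record the value that
# `$seen{$_}++` hands to the `if` test (the count BEFORE the increment).
sub seen_trace {
    my @lines = @_;
    my %seen;
    # Post-incrementing a missing slot yields 0, then stores 1.
    return map { $seen{$_}++ } @lines;
}

print join(' ', seen_trace(qw(a b a a b))), "\n";   # prints "0 0 1 2 1"
```

Every 0 in the trace is a first occurrence (the if fails); every non-zero value is a repeat (the if passes and the line would be printed).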

As I said above, my %seen declares the hash, but with strict turned off (as in the one-liner), a variable can be created on the spot just by using it. So the first time perl sees $seen{$_}, it knows that I'm looking for %seen, finds that it doesn't exist yet, and creates it.

An added neat thing about this is that at the end, if you care to use it, %seen holds a count of how many times each line occurred.
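
As a rough sketch of using those counts (the sub name count_lines and the sample lines are assumptions for illustration), the hash can be walked after the input is exhausted to report each duplicated line together with how often it occurred:

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of each line.
# Returns a hash reference mapping line text to its count.
sub count_lines {
    my @lines = @_;
    my %seen;
    $seen{$_}++ for @lines;
    return \%seen;
}

my $counts = count_lines("foo\n", "bar\n", "foo\n", "foo\n");
for my $line (sort keys %$counts) {
    next if $counts->{$line} < 2;               # skip lines seen only once
    chomp(my $text = $line);
    printf "%d x %s\n", $counts->{$line}, $text; # prints "3 x foo"
}
```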


夏末 2024-11-11 17:46:21

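Try this:

#!/usr/bin/perl
use strict;
use warnings;

my %duplicates;
while (<DATA>) {
    # Print a line only if it has been seen before, i.e. it is a duplicate.
    print if defined $duplicates{$_};
    $duplicates{$_}++;
}

# <DATA> reads the lines after __DATA__ (sample input for illustration).
__DATA__
foo
bar
foo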


倾城月光淡如水﹏ 2024-11-11 17:46:21

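To print each duplicated line only once:

perl -ne 'print if $seen{$_}++ == 1'

In a Unix shell the program must be in single quotes, otherwise the shell expands $seen and $_ itself; in Windows cmd.exe, use double quotes instead.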
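
Since the question title also mentions arrays, the same idiom carries over to a list with grep. The following is a minimal sketch; the sub names dupes_once and dupes_all are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: each duplicated element once,
# in the order of its second appearance.
sub dupes_once {
    my %seen;
    return grep { $seen{$_}++ == 1 } @_;
}

# Hypothetical helper: every repeat occurrence
# (same test as `print if $seen{$_}++`).
sub dupes_all {
    my %seen;
    return grep { $seen{$_}++ } @_;
}

my @lines = qw(a b a c b a);
print join(',', dupes_once(@lines)), "\n";   # prints "a,b"
print join(',', dupes_all(@lines)),  "\n";   # prints "a,b,a"
```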


赠佳期 2024-11-11 17:46:21

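If you have a Unix-like system, you can use uniq:

uniq -d foo

or

uniq -D foo

should do what you want, provided the duplicate lines are adjacent: uniq only compares neighbouring lines, so sort the input first if they are not (sort foo | uniq -d). -d prints one copy of each repeated line, while -D, which is not part of POSIX but is available in GNU uniq, prints every repeated line. More information: man uniq.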
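
To see how comparing only neighbouring lines differs from the hash-based Perl answers, here is a rough Perl sketch of that style of comparison. The sub name adjacent_dupes is hypothetical, and it only approximates uniq -d for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper approximating `uniq -d`: report a line once
# when it repeats on ADJACENT lines only.
sub adjacent_dupes {
    my @out;
    my ($prev, $reported);
    for my $line (@_) {
        if (defined $prev && $line eq $prev) {
            push @out, $line unless $reported;  # report each run only once
            $reported = 1;
        } else {
            $prev     = $line;                  # a new run starts here
            $reported = 0;
        }
    }
    return @out;
}

my @lines = qw(a b a a b);
print join(',', adjacent_dupes(@lines)), "\n";        # prints "a"
print join(',', adjacent_dupes(sort @lines)), "\n";   # prints "a,b"
```

Unsorted, the repeated b is missed because its two occurrences are not next to each other; after sorting, both duplicates are found.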

About the author: 一直在等你来
