Perl - Find duplicate lines in a file or array
I'm trying to print duplicate lines from the filehandle, not remove them or anything else I see asked on other questions. I don't have enough experience with perl to be able to quickly do this, so I'm asking here. What's the way to do this?
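Using the standard Perl shorthands, the whole job is a `print if $seen{$_}++` test inside a `while (<>)` loop, and the same test also fits on the command line as a "one-liner" with `perl -ne`.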
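A minimal sketch of the loop that the explanation further down walks through; it is not necessarily the answerer's exact code. The sub name `duplicate_lines` and the in-memory sample data are assumptions made for illustration, and `%seen` is declared with `my` so the sketch also runs under `strict`.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line that already appeared earlier on
# the given filehandle (a line that occurs 3 times is returned twice).
sub duplicate_lines {
    my ($fh) = @_;
    my (%seen, @dupes);
    while (my $line = <$fh>) {
        # Post-increment yields the old count: 0 (false) the first time,
        # 1 or more (true) on every later occurrence.
        push @dupes, $line if $seen{$line}++;
    }
    return @dupes;
}

# Demo on an in-memory filehandle; swap in \*STDIN or an opened file.
my $data = "a\nb\na\nc\nb\na\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print duplicate_lines($fh);
close $fh;
```

On the command line, the same test can be written as `perl -ne 'print if $seen{$_}++' file`.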
More data? This prints `<file name>:<line number>:<line>` for each duplicate.
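One way to get that output, sketched under assumptions: the sub name `report_duplicates` is hypothetical, and the sketch relies on the built-in variables `$ARGV` (name of the file currently read by `<>`) and `$.` (current line number). Closing `ARGV` at each end of file resets `$.` so that line numbers restart for every file.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the named files in order and return one
# "<file name>:<line number>:<line>" record per repeated line.
sub report_duplicates {
    my @files = @_;
    my (%seen, @report);
    local @ARGV = @files;              # make <> read exactly these files
    while (my $line = <>) {
        # $ARGV is the current file name, $. the current line number.
        push @report, sprintf("%s:%d:%s", $ARGV, $., $line)
            if $seen{$line}++;
    }
    continue {
        close ARGV if eof;             # reset $. at the end of each file
    }
    return @report;
}

# Only run when file names were given, so <> never falls back to STDIN.
print report_duplicates(@ARGV) if @ARGV;
```

Because `%seen` lives for the whole call, a line that first appears in one file and again in a later file is reported at its position in the later file.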
Explanation of `%seen`:

`%seen` declares a hash. For each unique line in the input (which is coming from `while (<>)` in this case), `$seen{$_}` will have a scalar slot in the hash named by the text of the line (this is what `$_` is doing in the hash `{}` braces).

With the post-increment operator (`x++`) we take the value for our expression, remembering to increment it after the expression. So, if we haven't "seen" the line, `$seen{$_}` is undefined, but when forced into a numeric "context" like this, it's taken as 0, and false.

So, when the `while` begins to run, all lines are "zero" (if it helps, you can think of the lines as "not `%seen`"). Then, the first time we see a line, `perl` takes the undefined value, which fails the `if`, and increments the count at the scalar slot to 1. Thus, it is at least 1 for any future occurrences, at which point it passes the `if` condition and the line is printed.

Now, as I said above, `%seen` declares a hash, but with `strict` turned off, any variable expression can be created on the spot. So the first time perl sees `$seen{$_}`, it knows that I'm looking for `%seen`; it doesn't have it, so it creates it.

An added neat thing about this is that at the end, if you care to use it, you have a count of how many times each line was repeated.
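To illustrate the last point and the post-increment behaviour, here is a small sketch. The sub name `line_counts` and the sample data are assumptions for illustration only.

```perl
use strict;
use warnings;

# Post-increment on a missing hash entry: the expression yields the old
# value (0, which is false) and the slot then holds 1.
my %demo;
my $old = $demo{x}++;
print "old=$old new=$demo{x}\n";

# Hypothetical helper: count how often each line occurs on a filehandle.
sub line_counts {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        $seen{$line}++;
    }
    return %seen;
}

# Demo on an in-memory filehandle: report only the repeated lines.
my $data = "a\nb\na\nc\nb\na\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my %count = line_counts($fh);
close $fh;

for my $line (sort keys %count) {
    next unless $count{$line} > 1;     # skip lines that occur only once
    chomp(my $text = $line);
    print "$text occurred $count{$line} times\n";
}
```

With this sample data the final loop reports `a` three times and `b` twice, while `c` is skipped because its count is 1.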
try this
Prints dupes only once:
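A possible sketch of that variant, assuming the same `%seen` counting idea as the other answers: comparing the old count with 1 is true only on the second occurrence, so each duplicated line is reported exactly once. The sub name `duplicate_lines_once` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return each duplicated line once, at the moment
# its second occurrence is read.
sub duplicate_lines_once {
    my ($fh) = @_;
    my (%seen, @dupes);
    while (my $line = <$fh>) {
        # The old count is 0 on the first occurrence, 1 on the second,
        # and 2 or more afterwards, so only the second one matches.
        push @dupes, $line if $seen{$line}++ == 1;
    }
    return @dupes;
}

# Demo on an in-memory filehandle; "a" occurs three times, "b" twice.
my $data = "a\nb\na\nc\nb\na\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print duplicate_lines_once($fh);
close $fh;
```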
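If you have a Unix-like system, you can use `uniq`. Its `-d` option prints one copy of each repeated line; because `uniq` only compares adjacent lines, sort the input first if the duplicates are not already next to each other. More information: `man uniq`.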