grep: count all matches for all patterns

Posted on 2024-12-11 05:29:27

I have tens of thousands of (fixed) patterns that I want to find matches for in a very large file. I would like to count the total number of hits for each pattern. I can't find anything in the grep documentation that suggests this is possible. My setup would look something like this:

gunzip -c bigfile.txt.gz | grep -c -f patterns.txt

Of course, this counts the lines that match any pattern in patterns.txt, whereas what I want is an individual hit count for each pattern. Is something like this possible on the command line with grep? Or will I have to write a program?

Comments (2)

执手闯天涯 2024-12-18 05:29:27

I don't know about doing it for all patterns at once, but you could write a bash script that reads the patterns one at a time and runs grep | wc -l for each one.
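
A minimal sketch of that loop, under a few assumptions: the patterns are fixed strings, one per line, and the data has already been decompressed to a plain file (otherwise the archive would have to be decompressed once per pattern). The function name count_each is made up for illustration. Like grep | wc -l, it counts matching lines per pattern, not individual occurrences, and it rereads the data once per pattern, so it is likely to be slow with tens of thousands of patterns.

```shell
#!/usr/bin/env bash
# count_each PATTERN_FILE DATA_FILE   (hypothetical helper name)
# Prints "<pattern><TAB><number of matching lines>" for every pattern.
# Assumes one fixed-string pattern per line in PATTERN_FILE.
count_each() {
  local patterns=$1 data=$2 pattern count
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r pattern || [ -n "$pattern" ]; do
    # Skip empty lines: an empty pattern would match every line.
    [ -n "$pattern" ] || continue
    # -F: fixed string, -c: count matching lines, -e: pattern may start with "-".
    # grep exits non-zero when nothing matches, hence "|| true".
    count=$(grep -F -c -e "$pattern" -- "$data" || true)
    printf '%s\t%s\n' "$pattern" "$count"
  done < "$patterns"
}
```

It could be called as count_each patterns.txt bigfile.txt, where bigfile.txt stands for the decompressed data.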

一身软味 2024-12-18 05:29:27

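How about something like this:

gunzip -c bigfile.txt.gz | grep -oF -f patterns.txt | sort | uniq -c

The -o option makes grep print each match on its own line instead of the whole matching line, and -F treats the patterns as fixed strings, so uniq -c ends up counting hits per pattern. The sort may get expensive, since it has to process the entire grep output. A quick Perl/Python/... script with a hash could cut that down substantially, though.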

$ grep -f pats.txt a.txt  | ./t.rb 
a 3
b 3
c 2

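Here's the script that avoids the sort; see if it actually speeds things up.

#!/usr/bin/env ruby
# Count how often each distinct input line occurs, without sorting.
results = Hash.new(0)   # missing keys start at 0
while (line = gets)
  results[line.chomp] += 1
end
# Print one "<line> <count>" pair per distinct input line.
results.each { |k, v| puts "#{k} #{v}" }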
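
The same hash-based counting can also be sketched without a separate script file, using awk's associative arrays in place of the Ruby hash. This is an alternative, not the answer's own method. The function name count_hits is hypothetical, and the sketch assumes fixed-string patterns, one per line. The -o option makes grep print only the matched text, one match per line, so each output line is a pattern hit. Since grep -o reports non-overlapping matches, counts may come out low when patterns overlap or contain one another.

```shell
#!/usr/bin/env bash
# count_hits PATTERN_FILE   (hypothetical helper name)
# Reads the data on stdin and prints "<count> <pattern>" per matched pattern.
# Output order is unspecified; patterns with zero hits are not listed.
count_hits() {
  # -o: print each match on its own line, -F: fixed strings, -f: pattern file.
  # "|| true" keeps a no-match result from being treated as a failure.
  { grep -oF -f "$1" || true; } |
    awk '{ n[$0]++ } END { for (p in n) print n[p], p }'
}
```

With the files from the question, it could be used as gunzip -c bigfile.txt.gz | count_hits patterns.txt. Memory use then grows with the number of distinct patterns that were hit, not with the size of the grep output.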