grep: counting all matches for all patterns
I have tens of thousands of (fixed) patterns that I want to find matches for in a very large file. I would like to count the total number of hits for each pattern. I can't find anything in the grep documentation that suggests this is possible. My setup would look something like this:
gunzip -c bigfile.txt.gz | grep -c -f patterns.txt
Of course this counts lines that matched anything in patterns.txt, when what I want are the individual counts of hits for each pattern. Is something like this possible on the command line with grep? Or will I have to write a program?
Comments (2)
I don't know of a way to do it for all patterns at once, but you could write a bash script that reads the patterns one at a time and runs grep | wc -l for each one.
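A minimal sketch of that loop, under a few assumptions: the patterns are fixed strings (so grep -F is used), the big file has been decompressed once so it is not gunzipped again for every pattern, and grep -o is added so that wc -l counts occurrences rather than matching lines. The function name count_one_by_one is made up for illustration.

```shell
# count_one_by_one PATTERNS_FILE TEXT_FILE
# Prints "count<TAB>pattern" for every non-empty line of PATTERNS_FILE.
count_one_by_one() {
  patterns_file=$1
  text_file=$2
  while IFS= read -r pattern; do
    # Skip empty lines: an empty pattern would match every line.
    [ -n "$pattern" ] || continue
    # -F: fixed string, -o: one output line per match,
    # -e: keeps patterns that start with "-" from being read as options.
    count=$(grep -o -F -e "$pattern" "$text_file" | wc -l)
    # $((count)) strips the padding some wc implementations add.
    printf '%s\t%s\n' "$((count))" "$pattern"
  done < "$patterns_file"
}

# Hypothetical usage with the file names from the question:
#   gunzip -c bigfile.txt.gz > bigfile.txt
#   count_one_by_one patterns.txt bigfile.txt
```

This scans the whole file once per pattern, so with tens of thousands of patterns and a very large file it will probably be slow.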
How about something like this:
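A minimal sketch of one such pipeline, assuming the idea is to have grep print every match on its own line and then tally identical lines with sort and uniq -c. The function name count_matches_sorted is made up for illustration; patterns.txt and bigfile.txt.gz are the file names from the question.

```shell
# count_matches_sorted PATTERNS_FILE < text
# Prints one "count pattern" line for each pattern that matched at least once.
count_matches_sorted() {
  patterns_file=$1
  # -o: print each match on its own line instead of the whole line
  # -F: treat patterns as fixed strings, -f: read them from a file
  grep -o -F -f "$patterns_file" |
    sort |       # group identical matches together
    uniq -c      # collapse each group into "count match"
}

# Hypothetical usage with the file names from the question:
#   gunzip -c bigfile.txt.gz | count_matches_sorted patterns.txt
```

One caveat: grep -o reports only non-overlapping matches, so if one pattern overlaps or contains another, some hits will go uncounted. Patterns with no hits do not appear in the output at all.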
The sort may get large, since it has to hold every matched string before anything can be counted. A quick perl/python/... script that counts with a hash could cut that down substantially, though.
Here's a script that avoids the sort; see if it actually speeds things up.
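A sketch of the hash-based idea, not necessarily the script the answer had in mind: it uses an awk associative array in place of a perl or python hash, so only one counter per distinct matched string is kept in memory. The function name count_matches_hashed is made up for illustration.

```shell
# count_matches_hashed PATTERNS_FILE < text
# Prints "count<TAB>pattern" for each pattern that matched, in no particular order.
count_matches_hashed() {
  patterns_file=$1
  grep -o -F -f "$patterns_file" |
    awk '
      { hits[$0]++ }                  # one counter per distinct matched string
      END {
        for (p in hits)               # awk does not guarantee iteration order
          printf "%d\t%s\n", hits[p], p
      }'
}

# Hypothetical usage with the file names from the question:
#   gunzip -c bigfile.txt.gz | count_matches_hashed patterns.txt
```

Memory use grows with the number of distinct patterns that matched rather than with the total number of matches, which is what the sort version has to store.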