Count occurrences of a pattern in a file (even on the same line)

Posted on 2024-09-03 03:20:36

When searching for number of occurrences of a string in a file, I generally use:

grep pattern file | wc -l

However, this only finds one occurrence per line, because of the way grep works. How can I search for the number of times a string appears in a file, regardless of whether they are on the same or different lines?

Also, what if I'm searching for a regex pattern, not a simple string? How can I count those, or, even better, print each match on a new line?

标点 2024-09-10 03:20:36

To count every occurrence, use -o, which makes grep print each match on its own line. Try this:

echo afoobarfoobar | grep -o foo | wc -l

And man grep of course (:

Update

Some suggest using just grep -co foo instead of grep -o foo | wc -l.

Don't.

This shortcut won't work in all cases. The man page says:

-c print a count of matching lines

The difference between these approaches is illustrated below:

$ echo afoobarfoobar | grep -oc foo
1

With -c, grep counts lines that contain at least one match, not the matches themselves. Only one line was checked and it matched (a{foo}barfoobar), so the output is 1. In effect -o is ignored here, and you could just use grep -c instead.

$ echo afoobarfoobar | grep -o foo
foo
foo

$ echo afoobarfoobar | grep -o foo | wc -l
2

Two matches are found in the line (a{foo}bar{foo}bar) because we explicitly asked for every occurrence (-o). Each occurrence is printed on a separate line, and wc -l just counts the number of lines in the output.
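
For the regex part of the question, the same idea carries over. The following is a minimal sketch, assuming a grep that supports -o (GNU and BSD grep do). The helper name count_matches and the sample file are hypothetical and only used for illustration.

```shell
# count_matches PATTERN FILE (hypothetical helper, not part of grep)
# Prints the number of non-overlapping matches of an extended regex,
# counting several matches on the same line separately.
count_matches() {
    # -o prints each match on its own line; -E enables extended regex syntax.
    # tr strips the padding some wc implementations put around the number.
    grep -oE -- "$1" "$2" | wc -l | tr -d ' '
}

# Throwaway sample input (assumption: any text file would do).
sample=$(mktemp)
printf 'afoobarfoobar\nfoo\nnothing here\n' > "$sample"

grep -oE 'fo+' "$sample"       # prints each match on a new line
count_matches 'fo+' "$sample"  # prints the total number of matches

rm -f "$sample"
```

With this sample, the first command prints foo three times and the second prints 3, because the first line contributes two matches and the second line one.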

混吃等死 2024-09-10 03:20:36

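Try this, which tallies the matching lines by their fourth colon-separated field (it does not count repeated matches within a single line):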

grep "string to search for" FileNameToSearch | cut -d ":" -f 4 | sort -n | uniq -c

Sample:

grep "SMTP connect from unknown" maillog | cut -d ":" -f 4 | sort -n | uniq -c
  6  SMTP connect from unknown [188.190.118.90]
 54  SMTP connect from unknown [62.193.131.114]
  3  SMTP connect from unknown [91.222.51.253]
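
If the goal is a tally of the matched text itself rather than of a fixed field, grep -o can feed the same sort | uniq -c pipeline. This is only a sketch under assumptions: tally_matches, the sample log lines, and the addresses in them are hypothetical, and a grep with -o is assumed.

```shell
# tally_matches PATTERN FILE (hypothetical helper)
# Prints "COUNT MATCH" for every distinct piece of matched text.
tally_matches() {
    # grep -o emits one match per line, sort groups identical matches,
    # uniq -c counts each group, sed drops uniq's leading padding.
    grep -oE -- "$1" "$2" | sort | uniq -c | sed 's/^ *//'
}

# Made-up log lines; the first one contains the same address twice.
log=$(mktemp)
printf '%s\n' \
    'SMTP connect from unknown [192.0.2.1], retry from [192.0.2.1]' \
    'SMTP connect from unknown [198.51.100.7]' > "$log"

tally_matches '[0-9]+(\.[0-9]+){3}' "$log"

rm -f "$log"
```

Here the address that appears twice on one line is counted twice, which the line-based pipeline above would not do.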

も让我眼熟你 2024-09-10 03:20:36

ripgrep, a fast alternative to grep, introduced the --count-matches flag in version 0.9. It counts every match rather than every matching line (I'm using the example above to stay consistent):

> echo afoobarfoobar | rg --count foo
1
> echo afoobarfoobar | rg --count-matches foo
2

As the OP asked, ripgrep accepts regex patterns as well (--regexp <PATTERN>).
It can also print each matching line on a separate line:

> echo -e "line1foo\nline2afoobarfoobar" | rg foo
line1foo
line2afoobarfoobar
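
To print each match rather than each matching line, ripgrep also has -o/--only-matching, much like grep. A small sketch follows; print_matches is a hypothetical wrapper, and the fallback to grep -oE is an assumption added only so the example still runs where rg is not installed.

```shell
# print_matches PATTERN (hypothetical wrapper; reads standard input)
# Prints every match on its own line.
print_matches() {
    if command -v rg >/dev/null 2>&1; then
        rg -o -- "$1"          # ripgrep: only the matched text
    else
        grep -oE -- "$1"       # fallback when rg is missing
    fi
}

printf 'line1foo\nline2afoobarfoobar\n' | print_matches foo
```

Piping the result through wc -l then gives the same figure as rg --count-matches.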

猫性小仙女 2024-09-10 03:20:36

A belated post:
Use the search regex as the record separator (RS) in awk. A regex RS needs an awk that supports it, such as gawk.
This lets your regex span \n-delimited lines (if you need that).

printf 'X \n moo X\n XX\n' | 
   awk -vRS='X[^X]*X' 'END{print (NR<2?0:NR-1)}'
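
When matches never need to span lines, a different awk technique is to let gsub report how many replacements it made on each line. This is not the RS approach, just a sketch of an alternative; count_with_awk is a hypothetical name, and because the pattern is passed in as a string, any backslashes in it would need doubling.

```shell
# count_with_awk PATTERN (hypothetical helper; reads standard input)
# Prints the total number of non-overlapping matches across all lines.
count_with_awk() {
    # gsub replaces each match with itself ("&") and returns how many
    # replacements it made; n + 0 prints 0 when nothing matched.
    awk -v pat="$1" '{ n += gsub(pat, "&") } END { print n + 0 }'
}

printf 'afoobarfoobar\nfoo\nbar\n' | count_with_awk 'fo+'
```

For this input the command prints 3: two matches on the first line, one on the second, none on the third.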

一场春暖 2024-09-10 03:20:36

Hack grep's color function, and count how many color tags it prints out:

echo -e "a\nb  b b\nc\ndef\nb e brb\nr" \
| GREP_COLOR="033" grep --color=always  b \
| perl -e 'undef $/; $_=<>; s/\n//g; s/\x1b\x5b\x30\x33\x33/\n/g; print $_' \
| wc -l
