Count occurrences of a character per line/field on Unix
Given a file with data like this (i.e. the stores.dat file):
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
What is the command that would return the number of occurrences of the 't' character per line?
e.g. it would return:
count lineNum
4 1
3 2
6 3
Also, to count occurrences by field, what is the command to return the following results?
e.g. with column 2 and character 't' as input:
count lineNum
1 1
0 2
1 3
e.g. with column 3 and character 't' as input:
count lineNum
2 1
1 2
4 3
You can count occurrences of a character per line, or per field/column (column 2 or column 3 in the examples asked for), with awk's gsub(). The gsub() function's return value is the number of substitutions made, so we use it to print the count. NR holds the line number, so we use it to print the line number. For a single field, set a variable fld to the number of the field we wish to extract counts from.
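The gsub()/NR idea described above might look roughly like the sketch below. The function names and argument order are hypothetical, the "|" separator is taken from the sample file, and the character is treated as a regular expression, so something like "." would need escaping.

```shell
# Sketch only: function names and argument order are made up for illustration.
# gsub() returns the number of substitutions it made; NR is the line number.

# usage: count_char_per_line CHAR FILE
count_char_per_line() {
  awk -v ch="$1" '
    BEGIN { print "count", "lineNum" }
    { print gsub(ch, ""), NR }        # count CHAR on the whole line ($0)
  ' "$2"
}

# usage: count_char_per_field CHAR FIELD FILE   (fields are separated by "|")
count_char_per_field() {
  awk -F'|' -v ch="$1" -v fld="$2" '
    BEGIN { print "count", "lineNum" }
    { print gsub(ch, "", $fld), NR }  # count CHAR inside field number fld only
  ' "$3"
}
```

For the sample file, `count_char_per_line t stores.dat` should print the header followed by `4 1`, `3 2`, `6 3`, and `count_char_per_field t 2 stores.dat` should give `1 1`, `0 2`, `1 3`.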
A pipeline built on grep -n -o gives almost exactly the output you want. Thanks to @raghav-bhushan for the grep -o hint, what a useful flag. The -n flag includes the line number as well.
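One way the grep -o / -n combination could be wired up is sketched below. This is an assumption about the shape of the pipeline, not necessarily the exact command from the answer, and the function name is hypothetical. Note that -o is a common extension (GNU and BSD grep) rather than POSIX, and that lines with no match are simply missing from the output instead of showing a count of 0.

```shell
# Sketch only: count_with_grep is a made-up name.
# grep -o prints every match on its own line, -n prefixes each with "LINENO:",
# -F treats CHAR as a fixed string rather than a regular expression.

# usage: count_with_grep CHAR FILE
count_with_grep() {
  grep -n -o -F -- "$1" "$2" |
    cut -d: -f1 |   # keep only the line number of each match
    uniq -c         # collapse repeats into "COUNT LINENO" (left-padded)
}
```

The counts come out left-padded by uniq -c and without a header, which is presumably why the result is only "almost" the requested format.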
To count occurrences of a character per line, set the field separator to the character that needs to be counted, then use the fact that the number of fields is one greater than the number of separators. To count occurrences in a particular column, cut out that column first.
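A minimal sketch of the field-separator trick follows, with hypothetical function names. It assigns FS inside BEGIN rather than using `-F t`, because some awk implementations treat `-Ft` as a request for a tab separator. An empty line or empty column has zero fields, so it would report -1 rather than 0.

```shell
# Sketch only: function names are made up for illustration.
# With FS set to CHAR, a line containing k occurrences splits into k+1 fields.

# usage: count_by_fs CHAR FILE
count_by_fs() {
  awk -v ch="$1" 'BEGIN { FS = ch } { print NF - 1, NR }' "$2"
}

# usage: count_by_fs_in_column CHAR COLUMN FILE   (columns are separated by "|")
count_by_fs_in_column() {
  cut -d'|' -f"$2" "$3" |
    awk -v ch="$1" 'BEGIN { FS = ch } { print NF - 1, NR }'
}
```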
One possible solution using perl, with the script saved as script.pl. The script accepts three parameters. When the column is given as 0, which is a bad column, it searches the whole line; given 1 it searches in column 1, and given 3 it searches in column 3. Passing th as the character is a mistake, since th is not a char.
There is no need for awk or perl; bash and standard Unix utilities are enough, both for whole lines and for a particular column. We can even avoid tr and the cats, and even the cut.
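As a rough illustration of the last variant (no tr, cat or cut), here is a sketch that relies only on shell parameter expansion. It is not the answer's original command; the function name and argument order are hypothetical, and it assumes the requested column exists on every line.

```shell
# Sketch only: count_char is a made-up name. The body runs in a subshell,
# so its variables do not leak into the caller.

# usage: count_char CHAR [COLUMN] < FILE   (COLUMN omitted or 0 = whole line)
count_char() (
  ch=$1 col=${2:-0} n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if [ "$col" -gt 0 ]; then
      # Drop the first col-1 fields, then everything after the next "|".
      i=1
      while [ "$i" -lt "$col" ]; do
        line=${line#*"|"}
        i=$((i + 1))
      done
      line=${line%%"|"*}
    fi
    # Strip up to and including each occurrence of CHAR, counting as we go.
    count=0
    while [ "${line#*"$ch"}" != "$line" ]; do
      line=${line#*"$ch"}
      count=$((count + 1))
    done
    echo "$count $n"
  done
)
```

Called as `count_char t 3 < stores.dat`, this should print `2 1`, `1 2`, `4 3` for the sample file.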
You could also split the line or field on "t" and check the length of the resulting array minus 1. Set the col variable to 0 for the whole line, or to a column number (1 through 4 for this file) for a single column.
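The split-and-count idea can be sketched in awk as below. The function name and the array name parts are hypothetical; col follows the convention described above, and works because $0 is the whole line. An empty line or field splits into zero pieces and would report -1.

```shell
# Sketch only: count_by_split and "parts" are made-up names.
# split() returns the number of pieces, which is one more than the number
# of separators found.

# usage: count_by_split CHAR COL FILE   (COL 0 = whole line, fields split on "|")
count_by_split() {
  awk -F'|' -v ch="$1" -v col="$2" '
    { print split($col, parts, ch) - 1, NR }
  ' "$3"
}
```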
The call to gsub() deletes everything in the line that is not a t; then just print the length of what remains, and the current line number. Want to do it just for column 2? Apply the same steps to that field instead of the whole line.
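A sketch of this delete-then-measure approach, under the assumption that fields are separated by "|" as in the sample file; the function names are hypothetical and the character t is hard-coded, as in the description.

```shell
# Sketch only: function names are made up for illustration.

# usage: count_t_per_line FILE
count_t_per_line() {
  # Delete every character that is not "t", then measure what is left.
  awk '{ gsub(/[^t]/, ""); print length($0), NR }' "$1"
}

# usage: count_t_in_col2 FILE
count_t_in_col2() {
  # Same steps, applied to field 2 only.
  awk -F'|' '{ gsub(/[^t]/, "", $2); print length($2), NR }' "$1"
}
```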
Where $1 would be the column number you want to count.
Another perl answer, yay! The tr/t// operator returns the number of times the translation occurred on that line, in other words the number of times tr found the character 't'. ++$x maintains the line number count.