Counting occurrences of a character per line/field on Unix

Posted on 2024-12-22 19:49:27

Given a file with data like this (i.e. a stores.dat file):

sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200

What is the command that would return the number of occurrences of the 't' character per line?

e.g. it would return:

count   lineNum
   4       1
   3       2
   6       3

Also, to count occurrences within a single field, what is the command that returns the following results?

e.g. with column 2 and character 't' as input:

count   lineNum
   1       1
   0       2
   1       3

e.g. with column 3 and character 't' as input:

count   lineNum
   2       1
   1       2
   4       3


Comments (10)

许仙没带伞 2024-12-29 19:49:27

To count occurrences of a character per line you can do:

awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4       1
3       2
6       3

To count occurrences of a character per field/column you can do:

column 2:

awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1       1
0       2
1       3

column 3:

awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2       1
1       2
4       3
  • gsub() returns the number of substitutions it made, so that return value is the count we print.
  • NR holds the current line number, so we print it next to the count.
  • To count within a particular field, we pass the field number in a variable (fld) and give $fld to gsub() as its target.
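
The same idea can be wrapped so that the character and the field become parameters. What follows is only a sketch under stated assumptions: count_char is a hypothetical helper name that does not appear in the answer, the delimiter is assumed to be '|', and the character is assumed to be non-empty. It also swaps gsub() for an index() loop so that the character is matched literally instead of being read as a regular expression.

```shell
# Hypothetical helper (not from the answer): count_char CHAR [FIELD] < input
# FIELD 0, the default, means the whole line.
count_char() {
    awk -F'|' -v ch="$1" -v fld="${2:-0}" '
        BEGIN {
            if (ch == "") exit 1          # assumption: CHAR must be non-empty
            print "count\tlineNum"
        }
        {
            s = $fld                      # $0 is the whole line when fld is 0
            n = 0
            # index() does a plain string search, so ch is never a regex
            while ((i = index(s, ch)) > 0) {
                n++
                s = substr(s, i + length(ch))
            }
            print n "\t" NR
        }'
}

# Demo on the data from the question: count "t" in column 3
count_char t 3 <<'EOF'
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
EOF
```

Because the character is passed with -v, awk still applies escape-sequence processing to it, so a backslash would need extra care.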
非要怀念 2024-12-29 19:49:27
grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1

gives almost exactly the output you want:

  4 1
  3 2
  6 3

Thanks to @raghav-bhushan for the grep -o hint, what a useful flag. The -n flag includes the line number as well.
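
Because grep -o prints only the matches, a line with no match never reaches uniq -c, so it is missing from the output rather than reported with a count of 0. Here is a minimal sketch of a per-column variant, assuming a grep that supports -o (the flag is not in POSIX, although GNU and BSD grep provide it). count_field_grep is a hypothetical helper name, and '|' is assumed to be the delimiter.

```shell
# Hypothetical helper (not from the answer): count_field_grep CHAR FIELD < input
count_field_grep() {
    cut -d '|' -f "$2" |          # keep only the requested column
        grep -n -o -F -e "$1" |   # one "lineNum:CHAR" line per match
        cut -d : -f 1 |           # keep just the line number
        uniq -c                   # count consecutive identical line numbers
}

# Demo on the data from the question: column 2 has no "t" on line 2,
# so only lines 1 and 3 are reported.
count_field_grep t 2 <<'EOF'
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
EOF
```

No sort is needed here because grep already emits matches in line order. The width of the padding in front of each count depends on the uniq implementation.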

执着的年纪 2024-12-29 19:49:27

To count occurrences of a character per line:

$ awk -F 't' '{print NF-1, NR}'  input.txt
4 1
3 2
6 3

This sets the field separator to the character being counted and relies on the fact that the number of fields is one greater than the number of separators.
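
One portability caveat applies to the letter t specifically: as far as I know, some awk implementations (the BWK awk, and gawk in compatibility mode) read -F t as a request for a tab separator, so the command may split on tabs instead. The sketch below sidesteps that by assigning FS with -v. count_by_fs is a hypothetical helper name, and the guard for empty lines is an addition that the answer does not have.

```shell
# Hypothetical helper (not from the answer): count_by_fs CHAR < input
count_by_fs() {
    awk -v FS="$1" '{
        # An empty line has NF == 0, so avoid reporting -1 for it
        n = (NF > 0) ? NF - 1 : 0
        print n, NR
    }'
}

# Demo on the data from the question
count_by_fs t <<'EOF'
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
EOF
```

A space is not a suitable CHAR for this helper, since FS=" " selects awk's default whitespace splitting.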

To count occurrences in a particular column, cut out that column first:

$ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
1 1
0 2
1 3

$ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
2 1
1 2
4 3
盛装女皇 2024-12-29 19:49:27

One possible solution using perl:

Content of script.pl:

use warnings;
use strict;

## Check arguments:
## 1.- Input file
## 2.- Char to search.
## 3.- (Optional) field to search. If blank, zero or bigger than number
##     of columns, default to search char in all the line.
(@ARGV == 2 || @ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);

my ($char,$column);

## Get values or arguments.
if ( @ARGV == 3 ) {
        ($char, $column) = splice @ARGV, -2;
} else {
        $char = pop @ARGV;
        $column = 0;
}

## Check that $char must be a non-white space character and $column 
## only accept numbers.
die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/; 

print qq[count\tlineNum\n];

while ( <> ) {
        ## Remove last '\n'
        chomp;

        ## Get fields.
        my @f = split /\|/;

        ## If column is a valid one, select it to the search.
        if ( $column > 0 and $column <= scalar @f ) {
                $_ = $f[ $column - 1];
        }

        ## Count.
        my $count = eval qq[tr/$char/$char/];

        ## Print result.
        printf qq[%d\t%d\n], $count, $.;
}

The script accepts three parameters:

  1. Input file
  2. Character to search for
  3. Column to search (optional): if it is zero or larger than the number of columns, the whole line is searched.

Running the script without arguments:

perl script.pl
Usage: perl script.pl input-file char [column]

With arguments and its output:

Here 0 is not a valid column, so it searches the whole line.

perl script.pl stores.dat 't' 0
count   lineNum
4       1
3       2
6       3

Here it searches in column 1.

perl script.pl stores.dat 't' 1
count   lineNum
0       1
2       2
0       3

Here it searches in column 3.

perl script.pl stores.dat 't' 3
count   lineNum
2       1
1       2
4       3

th is not a single character.

perl script.pl stores.dat 'th' 3
Bad input
阪姬 2024-12-29 19:49:27

No need for awk or perl; bash and standard Unix utilities are enough:

cat file | tr -c -d "t\n" | cat -n |
  { echo "count   lineNum"
    while read num data; do
      test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
    done; }

And for a particular column:

cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n |
  { echo -e "count lineNum"
    while read num data; do
      test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
    done; }

And we can even avoid tr and the cats:

echo "count   lineNum"
num=1
while read data; do
  new_data=${data//t/}
  count=$((${#data}-${#new_data}))
  test $count -gt 0 && printf "%4d   %5d\n" $count $num
  num=$(($num+1))
done < file

And even the cut:

echo "count   lineNum"
num=1; OLD_IFS=$IFS; IFS="|"
while read -a array_data; do
  data=${array_data[1]}
  new_data=${data//t/}
  count=$((${#data}-${#new_data}))
  test $count -gt 0 && printf "%4d   %5d\n" $count $num
  num=$(($num+1))
done < file
IFS=$OLD_IFS
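
The loops above skip lines whose count is zero and rely on bash-only expansions such as ${data//t/} and read -a. The following sketch is an alternative under stated assumptions, not the answer's own method: count_char_sh is a hypothetical name, it uses only POSIX sh parameter expansion, it prints zero counts as well, and it takes the field number as an optional second argument (0 or omitted means the whole line). The delimiter is assumed to be '|'.

```shell
# Hypothetical helper (not from the answer): count_char_sh CHAR [FIELD] < input
# Note: POSIX sh has no "local", so these variables are global.
count_char_sh() {
    ch=$1 fld=${2:-0} num=0
    printf 'count\tlineNum\n'
    # The "|| [ -n ... ]" part also handles a final line with no newline
    while IFS= read -r line || [ -n "$line" ]; do
        num=$((num + 1))
        if [ "$fld" -gt 0 ]; then
            i=1
            # Drop the leading fields one at a time
            while [ "$i" -lt "$fld" ]; do
                case $line in
                    *'|'*) line=${line#*'|'} ;;
                    *)     line= ;;          # fewer fields than requested
                esac
                i=$((i + 1))
            done
            line=${line%%'|'*}               # cut everything after the field
        fi
        count=0
        # Each pass strips the text up to and including the next CHAR
        while [ "$line" != "${line#*"$ch"}" ]; do
            line=${line#*"$ch"}
            count=$((count + 1))
        done
        printf '%d\t%d\n' "$count" "$num"
    done
}

# Demo on the data from the question: count "t" in column 2
count_char_sh t 2 <<'EOF'
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
EOF
```

Quoting "$ch" inside the expansion keeps it literal, so a character such as * is counted rather than treated as a glob.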
南街女流氓 2024-12-29 19:49:27

You could also split the line or field on "t" and take the length of the resulting array minus 1. Set the col variable to 0 for the whole line, or to a column number (1 through 4 for this file) for a single column:

awk -F'|' -v col=0 -v OFS='\t' 'BEGIN {
    print "count", "lineNum"
}{
    split($col, a, "t"); print length(a) - 1, NR
}
' stores.dat
你的背包 2024-12-29 19:49:27
awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat

The call to gsub() deletes everything in the line that is not a t, then the script prints the length of what remains along with the current line number.

Want to do it just for column 2?

awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print length($2),NR;}' stores.dat
黯淡〆 2024-12-29 19:49:27
 $ cat -n test.txt
 1  test 1
 2  you want
 3  void
 4  you don't want
 5  ttttttttttt
 6  t t t t t t

 $ awk '{n=split($0,c,"t")-1;if (n!=0) print n,NR}' test.txt
 2 1
 1 2
 2 4
 11 5
 6 6
一花一树开 2024-12-29 19:49:27
cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' |  awk 'BEGIN {FS = "t"}; {print NF - 1}'

Where $1 is the column you want to count in (change it to $2, $3, and so on).

温柔嚣张 2024-12-29 19:49:27
perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat

Another perl answer, yay! The tr/t// operator returns the number of characters it matched on that line, in other words the number of times it found the character 't'. ++$x maintains the line number count.
