awk: total number of records per grouped column

Posted on 2025-01-25 08:06:24

I have a variable that splits the values of a column based on a condition (the equivalent of "group by" in other programming languages).

I'm trying to have a variable that counts the NR of each group. If we sum all the groups we should have the NR of the file.

When I try to use NR in the calculation, for example NR[variable that splits], I get a fatal error: "you tried to use scalar as matrix".

Any ideas how to use NR as a variable, but not counting all the records, only those from each group?

sex, weight

male,50
female,49
female,48
male,66
male,78
female,98
male,74
male,54
female,65

In this case NR would be 9, but what I actually want is a per-group count: 5 for male and 4 for female.

I have the total sum of the weight column but am struggling to get the average:

sex= $(f["sex"])   
ccWeight[sex] += $(f["weight"])
avgWeight = ccWeight[sex] / ¿?

Important: I don't need to print the result for now, just to store this number in a variable.

Comments (2)

装迷糊 2025-02-01 08:06:24

One awk idea:

awk -F, '
NR>1 { counts[$1]++              # keep count of each distinct sex
       counts_total++            # replace dependency on NR
       weight[$1]+=$2            # keep sum of weights by sex
     }
END  { for (i in counts) {
           printf "%s: (count) %s of %s (%.2f%%)\n",i,counts[i],counts_total,(counts[i]/counts_total*100)
           printf "%s: (avg weight) %.2f ( %s / %s )\n",i,(weight[i]/counts[i]),weight[i],counts[i]
       }
     }
' sample.dat

NOTE:

  • OP can add additional code to verify total counts and weights are not zero (so as to keep from generating a 'divide by zero' error)
  • perhaps print a different message if there are no (fe)male records to process?
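
The checks mentioned in the notes could look roughly like the following. This is only a sketch, assuming the same comma-separated layout with one header line; the wrapper name `group_stats` and the message text are made up for illustration. Lines with fewer than two fields (blank lines, for instance) are skipped so they do not become an empty group, and the END block bails out before dividing when no data records were seen.

```shell
# group_stats is a hypothetical wrapper name; the awk logic mirrors the answer above
group_stats() {
    awk -F, '
    NR>1 && NF>=2 { counts[$1]++       # records per group
                    counts_total++     # total data records seen
                    weight[$1]+=$2     # sum of weights per group
                  }
    END { if (counts_total == 0) {     # nothing was read, so avoid dividing by zero
              print "no records to process"
              exit
          }
          for (i in counts) {
              printf "%s: (count) %s of %s (%.2f%%)\n", i, counts[i], counts_total, counts[i]/counts_total*100
              printf "%s: (avg weight) %.2f ( %s / %s )\n", i, weight[i]/counts[i], weight[i], counts[i]
          }
        }
    ' "$@"
}
```

It can be called as `group_stats sample.dat` or fed from a pipe. Any key present in `counts` has been incremented at least once, so only the total needs an explicit zero check.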

This generates:

female: (count) 4 of 9 (44.44%)
female: (avg weight) 65.00 ( 260 / 4 )
male: (count) 5 of 9 (55.56%)
male: (avg weight) 64.40 ( 322 / 5 )
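
Since the question asks to store the values rather than print them, here is a hedged sketch closer to the OP's own code. NR is a built-in scalar in awk, so it cannot be indexed like an array; a separate array keyed by group does the job instead. The names `avg_by_group` and `nrGroup` are made up for illustration, and `f[]` is assumed to map header names to field numbers, as the `$(f["sex"])` lookups in the question suggest.

```shell
# avg_by_group is a hypothetical wrapper; f[] maps header names to field numbers
avg_by_group() {
    awk -F, '
    NR==1 { for (i=1; i<=NF; i++) {
                gsub(/^ +| +$/, "", $i)   # trim spaces around header names
                f[$i] = i
            }
            next
          }
    NF<2  { next }                        # skip blank or short lines
          { sex = $(f["sex"])
            nrGroup[sex]++                                  # per-group record count
            ccWeight[sex] += $(f["weight"])                 # per-group sum of weights
            avgWeight[sex] = ccWeight[sex] / nrGroup[sex]   # running average, stored, not printed
          }
    END   { for (s in avgWeight) print s, nrGroup[s], avgWeight[s] }   # only to inspect the stored values
    ' "$@"
}
```

The END block is there just to show what ended up in the arrays; it can be dropped if the values are only needed in later rules. Summing `nrGroup` over all groups gives the number of data records in the file.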

安人多梦 2025-02-01 08:06:24

GNU datamash might be what you are looking for, e.g.:

<infile datamash -Hst, groupby 1 count 1 sum 2 mean 2 | column -s, -t

Output:

GroupBy(sex)  count(sex)  sum(weight)  mean(weight)
female        4           260          65
male          5           322          64.4