awk: total number of records per group column

Posted 2025-01-25 08:06:24 · 454 words · 2 views · 0 comments

I have a variable that splits the values of a column into groups based on a condition (a "group by" in other programming languages).

I'm trying to keep a variable that counts the NR of each group. Summing the counts of all groups should give the NR of the file.

When I try to use NR in the calculation, for example NR[variable that splits], I get a fatal error: "you tried to use scalar as matrix".

Any ideas how to use NR as a variable that counts only the records of each group, rather than all the records?

sex, weight

male,50
female,49
female,48
male,66
male,78
female,98
male,74
male,54
female,65

In this case NR would be 9, but what I actually want is a count of 5 for male and 4 for female.

I have the total sum of the weight column but struggle to get the average:

sex= $(f["sex"])   
ccWeight[sex] += $(f["weight"])
avgWeight = ccWeight[sex] / ¿?

Important: I don't need to print the result for now, just to store this number in a variable.


装迷糊 2025-02-01 08:06:24

One awk idea:

awk -F, '
NR>1 { counts[$1]++              # keep count of each distinct sex
       counts_total++            # replace dependency on NR
       weight[$1]+=$2            # keep sum of weights by sex
     }
END  { for (i in counts) {
           printf "%s: (count) %s of %s (%.2f%%)\n",i,counts[i],counts_total,(counts[i]/counts_total*100)
           printf "%s: (avg weight) %.2f ( %s / %s )\n",i,(weight[i]/counts[i]),weight[i],counts[i]
       }
     }
' sample.dat

NOTE:

OP can add code to verify that the total counts and weights are not zero (to avoid a 'divide by zero' error), and perhaps print a different message if there are no (fe)male records to process.

This generates:

female: (count) 4 of 9 (44.44%)
female: (avg weight) 65.00 ( 260 / 4 )
male: (count) 5 of 9 (55.56%)
male: (avg weight) 64.40 ( 322 / 5 )
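
Since the question asks for the number to be stored rather than printed, here is a minimal sketch of that variant. NR is a built-in scalar, so NR[sex] cannot work; a separate array keyed by group does the counting instead. The sketch assumes a clean `sex,weight` header line (no space after the comma) and builds the header-to-column map f[] that the question's snippet seems to rely on. The array names groupNR and avgWeight and the contents of sample.dat are illustrative assumptions, not part of the original post.

```shell
# Hypothetical sample file with the data from the question (header cleaned up).
cat > sample.dat <<'EOF'
sex,weight
male,50
female,49
female,48
male,66
male,78
female,98
male,74
male,54
female,65
EOF

result=$(awk -F, '
NR == 1 { for (i = 1; i <= NF; i++) f[$i] = i; next }   # map header names to column numbers
{
    sex = $(f["sex"])
    groupNR[sex]++                      # per-group record count, the "NR of each group"
    ccWeight[sex] += $(f["weight"])     # per-group sum of weights
    # groupNR[sex] was just incremented, so it is at least 1 and the division is safe.
    # This is a running average; after the last record it is the group average.
    avgWeight[sex] = ccWeight[sex] / groupNR[sex]
}
END {
    # The values live in groupNR[] and avgWeight[]; printed here only to show them.
    printf "%d %d %.2f %.2f\n", groupNR["male"], groupNR["female"], avgWeight["male"], avgWeight["female"]
}
' sample.dat)
echo "$result"
```

Summing groupNR[] over all keys gives the number of data records (NR minus the header line), which is the property the question asks for.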


安人多梦 2025-02-01 08:06:24

GNU datamash might be what you are looking for, e.g.:

<infile datamash -Hst, groupby 1 count 1 sum 2 mean 2 | column -s, -t

Output:

GroupBy(sex)  count(sex)  sum(weight)  mean(weight)
female        4           260          65
male          5           322          64.4
