Standard deviation of multiple files with Bash
I wish to calculate the standard deviation for each of a series of files named "res_NUMBER.cs", which are formatted as CSV. Example data:
1,M,CA,54.9130
1,M,CA,54.9531
1,M,CA,54.8845
1,M,CA,54.7517
1,M,CA,54.8425
1,M,CA,55.2648
1,M,CA,55.0876
I have calculated the mean using
#!/bin/bash
# Loop over the files with a glob instead of parsing the output of ls
for f in res*.cs; do
    echo "$f"
    echo " "
    # Count number of lines N
    lines=$(wc -l < "$f")
    # Sum total of column 4
    sum=$(awk -F "," '{print $4}' "$f" | paste -sd+ - | bc)
    # Mean
    mean=$(echo "scale=5 ; $sum / $lines" | bc)
    echo "$mean"
    echo " "
done
I would like to calculate the standard deviation for each file. I understand that the standard deviation formula is
S.D = sqrt((1/N) * (sum of (value - mean)^2))
but I am unsure how to implement this in my script.
Comments (1)
awk is powerful enough to calculate the mean of one file in a single pass, and the standard deviation is easy to add by also accumulating the sum of the squared values (note that your formula is for the population, not for a sample; that is what I replicate here).
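A minimal sketch of that one-pass idea, assuming the value of interest is always in column 4 and that every line is a data row. The function name `mean_sd` and the output format are illustrative choices, not something taken from the question.

```shell
#!/bin/bash
# mean_sd FILE (hypothetical helper): print the mean and the population
# standard deviation of column 4 of a comma-separated file.
mean_sd() {
    awk -F, '
        { sum += $4; sumsq += $4 * $4 }      # running sum and sum of squares
        END {
            if (NR == 0) exit 1              # empty file: nothing to report
            mean = sum / NR
            var = sumsq / NR - mean * mean   # E(x^2) - E(x)^2
            if (var < 0) var = 0             # guard against rounding below zero
            printf "%.5f %.5f\n", mean, sqrt(var)
        }' "$1"
}

# Demo on the sample rows from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1,M,CA,54.9130
1,M,CA,54.9531
1,M,CA,54.8845
1,M,CA,54.7517
1,M,CA,54.8425
1,M,CA,55.2648
1,M,CA,55.0876
EOF
mean_sd "$tmp"
rm -f "$tmp"
```

Doing everything inside awk also avoids the separate `wc`, `paste` and `bc` calls for each file.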
This uses the fact that stddev = sqrt(Var(x)) = sqrt( E(x^2) - E(x)^2 ),
which has worse numerical accuracy (since it squares the raw values rather than their differences from the mean) but works fine if your values are small in magnitude.
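If the accuracy caveat is a concern, the formula from the question can also be applied directly by reading the file twice: once for the mean, once for the squared deviations. The following is only a sketch under the same assumptions (column 4, no header or blank lines); `two_pass_sd` is a hypothetical name.

```shell
#!/bin/bash
# two_pass_sd FILE (hypothetical helper): mean and population SD of column 4,
# computed as sqrt((1/N) * sum((value - mean)^2)) by passing FILE to awk twice.
two_pass_sd() {
    awk -F, '
        NR == FNR { sum += $4; n++; next }   # first pass: total and count
        FNR == 1  { mean = sum / n }         # first line of the second pass
        { d = $4 - mean; ss += d * d }       # second pass: squared deviations
        END {
            if (n == 0) exit 1               # empty file: nothing to report
            printf "%.5f %.5f\n", mean, sqrt(ss / n)
        }' "$1" "$1"
}

# Demo on a few of the sample values from the question.
tmp=$(mktemp)
printf '1,M,CA,%s\n' 54.9130 54.9531 54.8845 54.7517 > "$tmp"
two_pass_sd "$tmp"
rm -f "$tmp"
```

The trade-off is that the file is read twice, which is rarely a problem for small CSV files.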
The simplest approach is then to run this inside a for loop over the files.
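One way such a loop might look, as a sketch: `report_all` is a hypothetical wrapper, the directory argument is an assumption made so the demo can run in a temporary directory, and the one-line-per-file output format is an illustrative choice.

```shell
#!/bin/bash
# report_all DIR (hypothetical helper): print "name mean sd" for each
# res*.cs file in DIR, using the one-pass population formula on column 4.
report_all() {
    local f
    for f in "$1"/res*.cs; do
        [ -e "$f" ] || continue   # glob matched nothing: skip the literal pattern
        awk -F, -v name="${f##*/}" '
            { sum += $4; sumsq += $4 * $4 }
            END {
                if (NR == 0) exit 1
                mean = sum / NR
                var = sumsq / NR - mean * mean
                if (var < 0) var = 0
                printf "%s %.5f %.5f\n", name, mean, sqrt(var)
            }' "$f"
    done
}

# Demo: two illustrative files built from the question's sample values.
dir=$(mktemp -d)
printf '1,M,CA,%s\n' 54.9130 54.9531 54.8845 > "$dir/res1.cs"
printf '1,M,CA,%s\n' 55.2648 55.0876 > "$dir/res2.cs"
report_all "$dir"
rm -rf "$dir"
```

Iterating over the glob directly, rather than over the output of `ls`, keeps file names with spaces intact.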
To process res1.cs .. res37.cs in that order, the easiest change is to have the for loop iterate over a brace expansion such as res{1..37}.cs,
which will expand in the numerical order specified.
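The difference between the two orderings can be seen with a small experiment. This sketch assumes Bash (sequence brace expansion is not available in plain `sh`) and uses 12 empty placeholder files instead of 37 to keep the output short.

```shell
#!/bin/bash
# Create empty placeholder files res1.cs .. res12.cs in a temporary directory.
dir=$(mktemp -d)
touch "$dir"/res{1..12}.cs

# A glob is sorted as strings, so res10.cs comes before res2.cs.
glob_order=$(cd "$dir" && echo res*.cs)

# Brace expansion generates the names in the numeric order written.
brace_order=$(cd "$dir" && echo res{1..12}.cs)

echo "glob:  $glob_order"
echo "brace: $brace_order"
rm -rf "$dir"
```

Note that brace expansion generates names without checking that the files exist, so a gap in the numbering would hand a missing file name to the loop body.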