Standard deviation of multiple files in Bash

Posted on 2025-01-28 10:33:44

I wish to calculate the standard deviation from a range of files titled "res_NUMBER.cs" which are formatted as a CSV. Example data includes

1,M,CA,54.9130  
1,M,CA,54.9531  
1,M,CA,54.8845  
1,M,CA,54.7517  
1,M,CA,54.8425  
1,M,CA,55.2648  
1,M,CA,55.0876 

I have calculated the mean using

#!/bin/bash

files=`ls res*.cs`
for f in $files; do
        echo "$f"
        echo " "
        #Count number of lines N
        lines=`cat $f | wc -l`
        #Sum Total
        sum=`cat $f | awk -F "," '{print $4}' | paste -sd+ | bc`
        #Mean
        mean=`echo "scale=5 ; $sum / $lines" | bc`
        echo "$mean"
        echo " "
done

I would like to calculate the standard deviation across each file. I understand that the standard deviation formula is

S.D=sqrt((1/N)*(sum of (value - mean)^2))

But I am unsure how I would implement this into my script.

Comments (1)

2025-02-04 10:33:44

awk is powerful enough to calculate the mean of one file easily

$ awk -F, '{sum+=$4} END{print sum/NR}' file

To add the standard deviation (note that your formula is for the population, not for a sample; that is what I replicate here):

 $ awk -F, '{sum+=$4; ss+=$4^2} END{print m=sum/NR,sqrt(ss/NR-m^2)}' file
 54.9567 0.15778
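
If the sample standard deviation (dividing by N-1 instead of N) is what you actually need, the same sums can be reused. The following is only a sketch: the temporary file and the variable names `tmp`, `result`, `var` and `sample` are assumptions made for illustration, and the data are the seven rows from the question.

```shell
# Hypothetical sample file holding the seven rows from the question.
tmp=$(mktemp)
printf '%s\n' 1,M,CA,54.9130 1,M,CA,54.9531 1,M,CA,54.8845 1,M,CA,54.7517 \
              1,M,CA,54.8425 1,M,CA,55.2648 1,M,CA,55.0876 > "$tmp"

# Prints: mean, population stddev (divide by N), sample stddev (divide by N-1).
result=$(awk -F, '{sum+=$4; ss+=$4^2}
    END {
        m = sum/NR
        var = ss/NR - m^2                 # population variance
        if (var < 0) var = 0              # guard against tiny negative rounding error
        sample = (NR > 1) ? sqrt(var*NR/(NR-1)) : 0
        printf "%.4f %.4f %.4f\n", m, sqrt(var), sample
    }' "$tmp")
echo "$result"
rm -f "$tmp"
```

The factor NR/(NR-1) simply rescales the population variance to the sample variance, so no second pass over the data is needed.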

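The expression sqrt(ss/NR-m^2) relies on the identity stddev = sqrt(Var(x)) = sqrt( E(x^2) - E(x)^2 ).
This is numerically less accurate than summing squared differences from the mean, because it squares the raw values and then subtracts two large, nearly equal numbers. It works fine when the values are of modest magnitude, as they are here.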
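
If the numerical accuracy matters for your data, one option (not what the one-liner above does) is a running update of the mean and of the sum of squared differences, commonly known as Welford's algorithm. A minimal sketch, again fed with the rows from the question; the names `welford_out`, `d` and `m2` are my own:

```shell
# Single pass: keep a running mean and a running sum of squared differences (m2).
welford_out=$(printf '%s\n' 1,M,CA,54.9130 1,M,CA,54.9531 1,M,CA,54.8845 \
                            1,M,CA,54.7517 1,M,CA,54.8425 1,M,CA,55.2648 \
                            1,M,CA,55.0876 |
    awk -F, '
        {
            n++
            d = $4 - mean            # distance from the mean seen so far
            mean += d / n            # updated running mean
            m2 += d * ($4 - mean)    # accumulate squared differences
        }
        END { if (n > 0) printf "%.4f %.4f\n", mean, sqrt(m2 / n) }')
echo "$welford_out"
```

This still divides by N (population form), matching the formula in the question; divide `m2` by `n - 1` instead for the sample form.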

The simplest approach is then to use this in a for loop over the files:

for f in res*.cs
do 
    awk -F, '{sum+=$4; ss+=$4^2} 
         END {print FILENAME; 
              print "mean:", m=sum/NR, "stddev:", sqrt(ss/NR-m^2)}' "$f"
done

To run res1.cs .. res37.cs in that order, the easiest way is to change the for loop:

for f in res{1..37}.cs
# the rest of the code not changed.

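Brace expansion produces the names in the numerical order specified. Note that it generates every name in the range whether or not the file exists, whereas res*.cs matches only existing files, in lexical order (so res10.cs sorts before res2.cs).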
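
As an alternative to the shell loop, a single awk invocation can report per-file statistics by resetting its counters whenever `FNR == 1`, that is, at the first line of each input file. This is only a sketch: the two small files are made-up test data, and the function name `report_stats` and the variables `dir`, `report`, `n` and `name` are assumptions for illustration.

```shell
# Made-up test data in a scratch directory (not the data from the question).
dir=$(mktemp -d)
printf '%s\n' 1,M,CA,1 1,M,CA,3 > "$dir/res1.cs"
printf '%s\n' 1,M,CA,2 1,M,CA,2 1,M,CA,8 > "$dir/res2.cs"

report=$(cd "$dir" && awk -F, '
    # Print the statistics gathered for the file that just ended.
    function report_stats() {
        if (n > 0) {
            m = sum / n
            v = ss / n - m^2
            if (v < 0) v = 0         # guard against tiny negative rounding error
            printf "%s mean: %.4f stddev: %.4f\n", name, m, sqrt(v)
        }
    }
    FNR == 1 { report_stats(); n = sum = ss = 0; name = FILENAME }
    { n++; sum += $4; ss += $4^2 }
    END { report_stats() }' res*.cs)
echo "$report"
rm -rf "$dir"
```

This starts awk once instead of once per file. Empty files are silently skipped, because they never produce a line with `FNR == 1`.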
