Standard deviation of multiple files with Bash

Posted on 2025-01-28 10:33:44

I wish to calculate the standard deviation for a range of files named "res_NUMBER.cs", which are formatted as CSV. Example data:

1,M,CA,54.9130  
1,M,CA,54.9531  
1,M,CA,54.8845  
1,M,CA,54.7517  
1,M,CA,54.8425  
1,M,CA,55.2648  
1,M,CA,55.0876

I have calculated the mean using

#!/bin/bash

# Loop over the files directly instead of parsing the output of ls
for f in res*.cs; do
        echo "$f"
        echo " "
        # Count number of lines N
        lines=$(wc -l < "$f")
        # Sum total of column 4
        sum=$(awk -F "," '{print $4}' "$f" | paste -sd+ - | bc)
        # Mean
        mean=$(echo "scale=5 ; $sum / $lines" | bc)
        echo "$mean"
        echo " "
done

I would like to calculate the standard deviation for each file. I understand that the standard deviation formula is

S.D=sqrt((1/N)*(sum of (value - mean)^2))

But I am unsure how I would implement this into my script.

丧 2025-02-04 10:33:44

awk is powerful enough to calculate the mean of one file easily:

$ awk -F, '{sum+=$4} END{print sum/NR}' file

To add the standard deviation (note that your formula is the population standard deviation, not the sample one; that is what I replicate here):

 $ awk -F, '{sum+=$4; ss+=$4^2} END{print m=sum/NR,sqrt(ss/NR-m^2)}' file
 54.9567 0.15778

This uses the fact that stddev = sqrt(Var(x)) = sqrt( E(x^2) - E(x)^2 ),
which has worse numerical accuracy (it squares the raw values instead of their differences from the mean) but works fine if your values are of modest magnitude.
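If the accuracy concern matters for your data, one alternative is Welford's online algorithm, which accumulates squared differences from a running mean instead of raw squares. The following is only a sketch under assumptions: the function name `welford_stats` is made up for illustration, and it reports the population standard deviation of column 4, like the one-liner above.

```shell
# Hypothetical helper (name chosen for illustration): numerically stable
# mean and population standard deviation of column 4 of one CSV file.
welford_stats() {
    awk -F, '
        {
            n++
            delta = $4 - mean            # distance from the running mean
            mean += delta / n            # update the running mean
            m2 += delta * ($4 - mean)    # accumulate squared deviations
        }
        END {
            if (n == 0) exit 1           # no rows: nothing to report
            printf "%.4f %.5f\n", mean, sqrt(m2 / n)
        }' "$1"
}
```

Calling `welford_stats file` on the example data should agree with the one-liner above; the difference only becomes visible when the values are large relative to their spread.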

The simplest approach is then to use this in a for loop over the files:

for f in res*.cs
do
    awk -F, '{sum+=$4; ss+=$4^2}
         END {print FILENAME;
              print "mean:", m=sum/NR, "stddev:", sqrt(ss/NR-m^2)}' "$f"
done
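Since the formula in the question is the population form, here is a hedged sketch of the sample variant (dividing by N-1 instead of N) in case that is what you actually need. The helper name `sample_stats` and the output format are assumptions made for illustration, not part of the original script.

```shell
# Hypothetical helper: print the mean and the sample standard deviation
# (N-1 in the denominator) of column 4 for one CSV file.
sample_stats() {
    awk -F, '
        { sum += $4; ss += $4^2 }
        END {
            if (NR < 2) {                           # sample stddev needs 2+ rows
                print "fewer than 2 rows" > "/dev/stderr"
                exit 1
            }
            mean = sum / NR
            var = (ss - NR * mean^2) / (NR - 1)     # divide by N-1, not N
            if (var < 0) var = 0                    # guard against rounding error
            printf "%s mean: %.4f sample stddev: %.4f\n", FILENAME, mean, sqrt(var)
        }' "$1"
}

for f in res*.cs
do
    [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
    sample_stats "$f"
done
```

For a handful of rows per file the two variants differ noticeably, so it is worth deciding which one your analysis calls for.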

To run res1.cs .. res37.cs in that order, the easiest way is to change the for loop:

for f in res{1..37}.cs
# the rest of the code not changed.

This expands in the numerical order specified.
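One caveat: brace expansion generates every name in the range whether or not the file exists, so awk reports an error for any gap in the numbering. The sketch below is one possible way around that, assuming the `resN.cs` naming used in this answer (the question's files are named `res_NUMBER.cs`, so adjust the pattern accordingly). The function name `stats_in_order` is hypothetical.

```shell
# Hypothetical helper: run the awk program from above on each existing
# file among res1.cs .. resN.cs, in numerical order, skipping gaps.
stats_in_order() {
    last=$1
    i=1
    while [ "$i" -le "$last" ]
    do
        f="res$i.cs"
        if [ -e "$f" ]; then     # ignore numbers that have no file
            awk -F, '{sum+=$4; ss+=$4^2}
                 END {m=sum/NR
                      print FILENAME, "mean:", m, "stddev:", sqrt(ss/NR-m^2)}' "$f"
        fi
        i=$((i + 1))
    done
}

stats_in_order 37
```

A counter loop is used here instead of `{1..37}` because brace expansion cannot take its upper bound from a variable.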

About the author

她比我温柔
