计算 C++ 中样本向量的平均值和标准差使用升压

发布于 2024-12-07 18:44:37 字数 161 浏览 0 评论 0原文

有没有办法使用 Boost 计算包含样本的向量的平均值和标准差

或者我是否必须创建一个累加器并将向量输入其中?

Is there a way to calculate mean and standard deviation for a vector containing samples using Boost?

Or do I have to create an accumulator and feed the vector into it?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

信仰 2024-12-14 18:44:37

我不知道Boost是否有更具体的功能,但是你可以用标准库来做。

给定 std::vectorv,这是一种简单的方法:

#include <numeric>

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size() - mean * mean);

对于巨大或微小的值,这很容易发生溢出或下溢。计算标准差的一个稍微好一点的方法是:

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

std::vector<double> diff(v.size());
std::transform(v.begin(), v.end(), diff.begin(),
               std::bind2nd(std::minus<double>(), mean));
double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size());

UPDATE for C++11:

可以使用 lambda 函数而不是 编写对 std::transform 的调用>std::minusstd::bind2nd(现已弃用):

std::transform(v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; });

I don't know if Boost has more specific functions, but you can do it with the standard library.

Given std::vector<double> v, this is the naive way:

#include <numeric>

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size() - mean * mean);

This is susceptible to overflow or underflow for huge or tiny values. A slightly better way to calculate the standard deviation is:

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

std::vector<double> diff(v.size());
std::transform(v.begin(), v.end(), diff.begin(),
               std::bind2nd(std::minus<double>(), mean));
double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size());

UPDATE for C++11:

The call to std::transform can be written using a lambda function instead of std::minus and std::bind2nd(now deprecated):

std::transform(v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; });
肥爪爪 2024-12-14 18:44:37

如果性能对您很重要,并且您的编译器支持 lambda,则 stdev 计算可以更快更简单:在 VS 2012 的测试中,我发现以下代码比所选答案中给出的 Boost 代码快 10 倍以上;它也比使用 musiphil 提供的标准库的更安全版本的答案快 5 倍。

请注意,我使用的是样本标准差,因此下面的代码给出的结果略有不同(为什么有标准差负一)

double sum = std::accumulate(std::begin(v), std::end(v), 0.0);
double m =  sum / v.size();

double accum = 0.0;
std::for_each (std::begin(v), std::end(v), [&](const double d) {
    accum += (d - m) * (d - m);
});

double stdev = sqrt(accum / (v.size()-1));

If performance is important to you, and your compiler supports lambdas, the stdev calculation can be made faster and simpler: In tests with VS 2012 I've found that the following code is over 10 X quicker than the Boost code given in the chosen answer; it's also 5 X quicker than the safer version of the answer using standard libraries given by musiphil.

Note I'm using sample standard deviation, so the below code gives slightly different results (Why there is a Minus One in Standard Deviations)

double sum = std::accumulate(std::begin(v), std::end(v), 0.0);
double m =  sum / v.size();

double accum = 0.0;
std::for_each (std::begin(v), std::end(v), [&](const double d) {
    accum += (d - m) * (d - m);
});

double stdev = sqrt(accum / (v.size()-1));
自由范儿 2024-12-14 18:44:37

使用累加器计算提升

accumulator_set<double, stats<tag::variance> > acc;
for_each(a_vec.begin(), a_vec.end(), bind<void>(ref(acc), _1));

cout << mean(acc) << endl;
cout << sqrt(variance(acc)) << endl;

 

Using accumulators is the way to compute means and standard deviations in Boost.

accumulator_set<double, stats<tag::variance> > acc;
for_each(a_vec.begin(), a_vec.end(), bind<void>(ref(acc), _1));

cout << mean(acc) << endl;
cout << sqrt(variance(acc)) << endl;

 

是你 2024-12-14 18:44:37

改进musiphil的答案,您可以编写一个没有临时向量diff的标准差函数,仅使用具有 C++11 lambda 功能的单个 inner_product 调用:

double stddev(std::vector<double> const & func)
{
    double mean = std::accumulate(func.begin(), func.end(), 0.0) / func.size();
    double sq_sum = std::inner_product(func.begin(), func.end(), func.begin(), 0.0,
        [](double const & x, double const & y) { return x + y; },
        [mean](double const & x, double const & y) { return (x - mean)*(y - mean); });
    return std::sqrt(sq_sum / func.size() - 1);
}

我怀疑多次执行减法比使用额外的中间存储更便宜,而且我认为它更具可读性,但我还没有测试过性能。

至于为什么使用 N-1 的解释(如 func.size() - 1),请参阅 这些 问题 - 请注意如何问题表明我们有一个“包含样本的向量”。

Improving on the answer by musiphil, you can write a standard deviation function without the temporary vector diff, just using a single inner_product call with the C++11 lambda capabilities:

double stddev(std::vector<double> const & func)
{
    double mean = std::accumulate(func.begin(), func.end(), 0.0) / func.size();
    double sq_sum = std::inner_product(func.begin(), func.end(), func.begin(), 0.0,
        [](double const & x, double const & y) { return x + y; },
        [mean](double const & x, double const & y) { return (x - mean)*(y - mean); });
    return std::sqrt(sq_sum / func.size() - 1);
}

I suspect doing the subtraction multiple times is cheaper than using up additional intermediate storage, and I think it is more readable, but I haven't tested the performance yet.

As for explanation why to use N-1 (as in func.size() - 1), see these questions - note how the question states we have a "vector containing samples".

年华零落成诗 2024-12-14 18:44:37

似乎以下优雅的递归解决方案尚未被提及,尽管它已经存在很长时间了。参考 Knuth 的《计算机编程艺术》,

mean_1 = x_1, variance_1 = 0;            //initial conditions; edge case;

//for k >= 2, 
mean_k     = mean_k-1 + (x_k - mean_k-1) / k;
variance_k = variance_k-1 + (x_k - mean_k-1) * (x_k - mean_k);

对于 n>=2 值的列表,标准差的估计为:

stddev = std::sqrt(variance_n / (n-1)). 

希望这有帮助!

It seems the following elegant recursive solution has not been mentioned, although it has been around for a long time. Referring to Knuth's Art of Computer Programming,

mean_1 = x_1, variance_1 = 0;            //initial conditions; edge case;

//for k >= 2, 
mean_k     = mean_k-1 + (x_k - mean_k-1) / k;
variance_k = variance_k-1 + (x_k - mean_k-1) * (x_k - mean_k);

then for a list of n>=2 values, the estimate of the standard deviation is:

stddev = std::sqrt(variance_n / (n-1)). 

Hope this helps!

树深时见影 2024-12-14 18:44:37

我的答案与 Josh Greifer 类似,但概括为样本协方差。样本方差只是样本协方差,但两个输入相同。这包括贝塞尔相关性。

    template <class Iter> typename Iter::value_type cov(const Iter &x, const Iter &y)
    {
        double sum_x = std::accumulate(std::begin(x), std::end(x), 0.0);
        double sum_y = std::accumulate(std::begin(y), std::end(y), 0.0);

        double mx =  sum_x / x.size();
        double my =  sum_y / y.size();

        double accum = 0.0;

        for (auto i = 0; i < x.size(); i++)
        {
            accum += (x.at(i) - mx) * (y.at(i) - my);
        }

        return accum / (x.size() - 1);
    }

My answer is similar as Josh Greifer but generalised to sample covariance. Sample variance is just sample covariance but with the two inputs identical. This includes Bessel's correlation.

    template <class Iter> typename Iter::value_type cov(const Iter &x, const Iter &y)
    {
        double sum_x = std::accumulate(std::begin(x), std::end(x), 0.0);
        double sum_y = std::accumulate(std::begin(y), std::end(y), 0.0);

        double mx =  sum_x / x.size();
        double my =  sum_y / y.size();

        double accum = 0.0;

        for (auto i = 0; i < x.size(); i++)
        {
            accum += (x.at(i) - mx) * (y.at(i) - my);
        }

        return accum / (x.size() - 1);
    }
那伤。 2024-12-14 18:44:37

比之前提到的版本快 2 倍 - 主要是因为transform() 和inner_product() 循环被连接。
对我的快捷方式/typedefs/宏感到抱歉:Flo = float。 CR 常量参考。 VFlo - 向量。在VS2010中测试

#define fe(EL, CONTAINER)   for each (auto EL in CONTAINER)  //VS2010
Flo stdDev(VFlo CR crVec) {
    SZ  n = crVec.size();               if (n < 2) return 0.0f;
    Flo fSqSum = 0.0f, fSum = 0.0f;
    fe(f, crVec) fSqSum += f * f;       // EDIT: was Cit(VFlo, crVec) {
    fe(f, crVec) fSum   += f;
    Flo fSumSq      = fSum * fSum;
    Flo fSumSqDivN  = fSumSq / n;
    Flo fSubSqSum   = fSqSum - fSumSqDivN;
    Flo fPreSqrt    = fSubSqSum / (n - 1);
    return sqrt(fPreSqrt);
}

2x faster than the versions before mentioned - mostly because transform() and inner_product() loops are joined.
Sorry about my shortcut/typedefs/macro: Flo = float. CR const ref. VFlo - vector. Tested in VS2010

#define fe(EL, CONTAINER)   for each (auto EL in CONTAINER)  //VS2010
Flo stdDev(VFlo CR crVec) {
    SZ  n = crVec.size();               if (n < 2) return 0.0f;
    Flo fSqSum = 0.0f, fSum = 0.0f;
    fe(f, crVec) fSqSum += f * f;       // EDIT: was Cit(VFlo, crVec) {
    fe(f, crVec) fSum   += f;
    Flo fSumSq      = fSum * fSum;
    Flo fSumSqDivN  = fSumSq / n;
    Flo fSubSqSum   = fSqSum - fSumSqDivN;
    Flo fPreSqrt    = fSubSqSum / (n - 1);
    return sqrt(fPreSqrt);
}
长不大的小祸害 2024-12-14 18:44:37

为了以更好的精度计算样本均值,可以使用以下 r 步递归:

mean_k=1/k*[(kr)*mean_(kr) + sum_over_i_from_(n-r+1)_to_n(x_i)] ,

其中选择 r 是为了使求和分量彼此更接近。

In order to calculate the sample mean with a better presicion the following r-step recursion can be used:

mean_k=1/k*[(k-r)*mean_(k-r) + sum_over_i_from_(n-r+1)_to_n(x_i)],

where r is chosen to make summation components closer to each other.

人│生佛魔见 2024-12-14 18:44:37

创建您自己的容器:

template <class T>
class statList : public std::list<T>
{
    public:
        statList() : std::list<T>::list() {}
        ~statList() {}
        T mean() {
           return accumulate(begin(),end(),0.0)/size();
        }
        T stddev() {
           T diff_sum = 0;
           T m = mean();
           for(iterator it= begin(); it != end(); ++it)
               diff_sum += ((*it - m)*(*it -m));
           return diff_sum/size();
        }
};

它确实有一些限制,但当您知道自己在做什么时,它会很好地工作。

Create your own container:

template <class T>
class statList : public std::list<T>
{
    public:
        statList() : std::list<T>::list() {}
        ~statList() {}
        T mean() {
           return accumulate(begin(),end(),0.0)/size();
        }
        T stddev() {
           T diff_sum = 0;
           T m = mean();
           for(iterator it= begin(); it != end(); ++it)
               diff_sum += ((*it - m)*(*it -m));
           return diff_sum/size();
        }
};

It does have some limitations, but it works beautifully when you know what you are doing.

甩你一脸翔 2024-12-14 18:44:37

//c++ 中的意思是偏差

/偏差是观测值与感兴趣量(例如总体平均值)的真实值之间的差异,偏差是误差,偏差是观测值之间的差异真实值的估计(这样的估计可能是样本平均值)是残差。这些概念适用于测量间隔和比率级别的数据。/

#include <iostream>
#include <conio.h>
using namespace std;

/* run this program using the console pauser or add your own getch,     system("pause") or input loop */

int main(int argc, char** argv)
{
int i,cnt;
cout<<"please inter count:\t";
cin>>cnt;
float *num=new float [cnt];
float   *s=new float [cnt];
float sum=0,ave,M,M_D;

for(i=0;i<cnt;i++)
{
    cin>>num[i];
    sum+=num[i];    
}
ave=sum/cnt;
for(i=0;i<cnt;i++)
{
s[i]=ave-num[i];    
if(s[i]<0)
{
s[i]=s[i]*(-1); 
}
cout<<"\n|ave - number| = "<<s[i];  
M+=s[i];    
}
M_D=M/cnt;
cout<<"\n\n Average:             "<<ave;
cout<<"\n M.D(Mean Deviation): "<<M_D;
getch();
return 0;

}

//means deviation in c++

/A deviation that is a difference between an observed value and the true value of a quantity of interest (such as a population mean) is an error and a deviation that is the difference between the observed value and an estimate of the true value (such an estimate may be a sample mean) is a residual. These concepts are applicable for data at the interval and ratio levels of measurement./

#include <iostream>
#include <conio.h>
using namespace std;

/* run this program using the console pauser or add your own getch,     system("pause") or input loop */

int main(int argc, char** argv)
{
int i,cnt;
cout<<"please inter count:\t";
cin>>cnt;
float *num=new float [cnt];
float   *s=new float [cnt];
float sum=0,ave,M,M_D;

for(i=0;i<cnt;i++)
{
    cin>>num[i];
    sum+=num[i];    
}
ave=sum/cnt;
for(i=0;i<cnt;i++)
{
s[i]=ave-num[i];    
if(s[i]<0)
{
s[i]=s[i]*(-1); 
}
cout<<"\n|ave - number| = "<<s[i];  
M+=s[i];    
}
M_D=M/cnt;
cout<<"\n\n Average:             "<<ave;
cout<<"\n M.D(Mean Deviation): "<<M_D;
getch();
return 0;

}

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文