当前位置：文江博客话题详情

计算 C++ 中样本向量的平均值和标准差使用升压

发布于 2024-12-07 18:44:37 字数 161 浏览 0 评论 0原文

有没有办法使用 Boost 计算包含样本的向量的平均值和标准差？

或者我是否必须创建一个累加器并将向量输入其中？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

信仰 2024-12-14 18:44:37

我不知道Boost是否有更具体的功能，但是你可以用标准库来做。

给定 std::vectorv，这是一种简单的方法：

#include <numeric>

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size() - mean * mean);

对于巨大或微小的值，这很容易发生溢出或下溢。计算标准差的一个稍微好一点的方法是：

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

std::vector<double> diff(v.size());
std::transform(v.begin(), v.end(), diff.begin(),
               std::bind2nd(std::minus<double>(), mean));
double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size());

UPDATE for C++11：

可以使用 lambda 函数而不是 编写对 std::transform 的调用>std::minus 和 std::bind2nd（现已弃用）：

std::transform(v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; });

I don't know if Boost has more specific functions, but you can do it with the standard library.

Given std::vector<double> v, this is the naive way:

#include <numeric>

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size() - mean * mean);

This is susceptible to overflow or underflow for huge or tiny values. A slightly better way to calculate the standard deviation is:

double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();

std::vector<double> diff(v.size());
std::transform(v.begin(), v.end(), diff.begin(),
               std::bind2nd(std::minus<double>(), mean));
double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size());

UPDATE for C++11:

The call to std::transform can be written using a lambda function instead of std::minus and std::bind2nd(now deprecated):

std::transform(v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; });

回复收藏 0 原文

肥爪爪 2024-12-14 18:44:37

如果性能对您很重要，并且您的编译器支持 lambda，则 stdev 计算可以更快更简单：在 VS 2012 的测试中，我发现以下代码比所选答案中给出的 Boost 代码快 10 倍以上;它也比使用 musiphil 提供的标准库的更安全版本的答案快 5 倍。

请注意，我使用的是样本标准差，因此下面的代码给出的结果略有不同（为什么有标准差负一)

double sum = std::accumulate(std::begin(v), std::end(v), 0.0);
double m =  sum / v.size();

double accum = 0.0;
std::for_each (std::begin(v), std::end(v), [&](const double d) {
    accum += (d - m) * (d - m);
});

double stdev = sqrt(accum / (v.size()-1));

If performance is important to you, and your compiler supports lambdas, the stdev calculation can be made faster and simpler: In tests with VS 2012 I've found that the following code is over 10 X quicker than the Boost code given in the chosen answer; it's also 5 X quicker than the safer version of the answer using standard libraries given by musiphil.

Note I'm using sample standard deviation, so the below code gives slightly different results (Why there is a Minus One in Standard Deviations)

double sum = std::accumulate(std::begin(v), std::end(v), 0.0);
double m =  sum / v.size();

double accum = 0.0;
std::for_each (std::begin(v), std::end(v), [&](const double d) {
    accum += (d - m) * (d - m);
});

double stdev = sqrt(accum / (v.size()-1));

回复收藏 0 原文

自由范儿 2024-12-14 18:44:37

使用累加器是计算提升。

accumulator_set<double, stats<tag::variance> > acc;
for_each(a_vec.begin(), a_vec.end(), bind<void>(ref(acc), _1));

cout << mean(acc) << endl;
cout << sqrt(variance(acc)) << endl;

Using accumulators is the way to compute means and standard deviations in Boost.

accumulator_set<double, stats<tag::variance> > acc;
for_each(a_vec.begin(), a_vec.end(), bind<void>(ref(acc), _1));

cout << mean(acc) << endl;
cout << sqrt(variance(acc)) << endl;

回复收藏 0 原文

是你 2024-12-14 18:44:37

改进musiphil的答案，您可以编写一个没有临时向量diff的标准差函数，仅使用具有 C++11 lambda 功能的单个 inner_product 调用：

double stddev(std::vector<double> const & func)
{
    double mean = std::accumulate(func.begin(), func.end(), 0.0) / func.size();
    double sq_sum = std::inner_product(func.begin(), func.end(), func.begin(), 0.0,
        [](double const & x, double const & y) { return x + y; },
        [mean](double const & x, double const & y) { return (x - mean)*(y - mean); });
    return std::sqrt(sq_sum / func.size() - 1);
}

我怀疑多次执行减法比使用额外的中间存储更便宜，而且我认为它更具可读性，但我还没有测试过性能。

至于为什么使用 N-1 的解释（如 func.size() - 1），请参阅这些问题 - 请注意如何问题表明我们有一个“包含样本的向量”。

Improving on the answer by musiphil, you can write a standard deviation function without the temporary vector diff, just using a single inner_product call with the C++11 lambda capabilities:

double stddev(std::vector<double> const & func)
{
    double mean = std::accumulate(func.begin(), func.end(), 0.0) / func.size();
    double sq_sum = std::inner_product(func.begin(), func.end(), func.begin(), 0.0,
        [](double const & x, double const & y) { return x + y; },
        [mean](double const & x, double const & y) { return (x - mean)*(y - mean); });
    return std::sqrt(sq_sum / func.size() - 1);
}

I suspect doing the subtraction multiple times is cheaper than using up additional intermediate storage, and I think it is more readable, but I haven't tested the performance yet.

As for explanation why to use N-1 (as in func.size() - 1), see these questions - note how the question states we have a "vector containing samples".

回复收藏 0 原文

年华零落成诗 2024-12-14 18:44:37

似乎以下优雅的递归解决方案尚未被提及，尽管它已经存在很长时间了。参考 Knuth 的《计算机编程艺术》，

mean_1 = x_1, variance_1 = 0;            //initial conditions; edge case;

//for k >= 2, 
mean_k     = mean_k-1 + (x_k - mean_k-1) / k;
variance_k = variance_k-1 + (x_k - mean_k-1) * (x_k - mean_k);

对于 n>=2 值的列表，标准差的估计为：

stddev = std::sqrt(variance_n / (n-1)).

希望这有帮助！

It seems the following elegant recursive solution has not been mentioned, although it has been around for a long time. Referring to Knuth's Art of Computer Programming,

mean_1 = x_1, variance_1 = 0;            //initial conditions; edge case;

//for k >= 2, 
mean_k     = mean_k-1 + (x_k - mean_k-1) / k;
variance_k = variance_k-1 + (x_k - mean_k-1) * (x_k - mean_k);

then for a list of n>=2 values, the estimate of the standard deviation is:

stddev = std::sqrt(variance_n / (n-1)).

Hope this helps!

回复收藏 0 原文

树深时见影 2024-12-14 18:44:37

我的答案与 Josh Greifer 类似，但概括为样本协方差。样本方差只是样本协方差，但两个输入相同。这包括贝塞尔相关性。

    template <class Iter> typename Iter::value_type cov(const Iter &x, const Iter &y)
    {
        double sum_x = std::accumulate(std::begin(x), std::end(x), 0.0);
        double sum_y = std::accumulate(std::begin(y), std::end(y), 0.0);

        double mx =  sum_x / x.size();
        double my =  sum_y / y.size();

        double accum = 0.0;

        for (auto i = 0; i < x.size(); i++)
        {
            accum += (x.at(i) - mx) * (y.at(i) - my);
        }

        return accum / (x.size() - 1);
    }

My answer is similar as Josh Greifer but generalised to sample covariance. Sample variance is just sample covariance but with the two inputs identical. This includes Bessel's correlation.

    template <class Iter> typename Iter::value_type cov(const Iter &x, const Iter &y)
    {
        double sum_x = std::accumulate(std::begin(x), std::end(x), 0.0);
        double sum_y = std::accumulate(std::begin(y), std::end(y), 0.0);

        double mx =  sum_x / x.size();
        double my =  sum_y / y.size();

        double accum = 0.0;

        for (auto i = 0; i < x.size(); i++)
        {
            accum += (x.at(i) - mx) * (y.at(i) - my);
        }

        return accum / (x.size() - 1);
    }

回复收藏 0 原文

那伤。 2024-12-14 18:44:37

比之前提到的版本快 2 倍 - 主要是因为transform() 和inner_product() 循环被连接。
对我的快捷方式/typedefs/宏感到抱歉：Flo = float。 CR 常量参考。 VFlo - 向量。在VS2010中测试

#define fe(EL, CONTAINER)   for each (auto EL in CONTAINER)  //VS2010
Flo stdDev(VFlo CR crVec) {
    SZ  n = crVec.size();               if (n < 2) return 0.0f;
    Flo fSqSum = 0.0f, fSum = 0.0f;
    fe(f, crVec) fSqSum += f * f;       // EDIT: was Cit(VFlo, crVec) {
    fe(f, crVec) fSum   += f;
    Flo fSumSq      = fSum * fSum;
    Flo fSumSqDivN  = fSumSq / n;
    Flo fSubSqSum   = fSqSum - fSumSqDivN;
    Flo fPreSqrt    = fSubSqSum / (n - 1);
    return sqrt(fPreSqrt);
}

2x faster than the versions before mentioned - mostly because transform() and inner_product() loops are joined.
Sorry about my shortcut/typedefs/macro: Flo = float. CR const ref. VFlo - vector. Tested in VS2010

#define fe(EL, CONTAINER)   for each (auto EL in CONTAINER)  //VS2010
Flo stdDev(VFlo CR crVec) {
    SZ  n = crVec.size();               if (n < 2) return 0.0f;
    Flo fSqSum = 0.0f, fSum = 0.0f;
    fe(f, crVec) fSqSum += f * f;       // EDIT: was Cit(VFlo, crVec) {
    fe(f, crVec) fSum   += f;
    Flo fSumSq      = fSum * fSum;
    Flo fSumSqDivN  = fSumSq / n;
    Flo fSubSqSum   = fSqSum - fSumSqDivN;
    Flo fPreSqrt    = fSubSqSum / (n - 1);
    return sqrt(fPreSqrt);
}

回复收藏 0 原文

长不大的小祸害 2024-12-14 18:44:37

为了以更好的精度计算样本均值，可以使用以下 r 步递归：

mean_k=1/k*[(kr)*mean_(kr) + sum_over_i_from_(n-r+1)_to_n(x_i)] ，

其中选择 r 是为了使求和分量彼此更接近。

回复收藏 0 原文

人│生佛魔见 2024-12-14 18:44:37

创建您自己的容器：

template <class T>
class statList : public std::list<T>
{
    public:
        statList() : std::list<T>::list() {}
        ~statList() {}
        T mean() {
           return accumulate(begin(),end(),0.0)/size();
        }
        T stddev() {
           T diff_sum = 0;
           T m = mean();
           for(iterator it= begin(); it != end(); ++it)
               diff_sum += ((*it - m)*(*it -m));
           return diff_sum/size();
        }
};

它确实有一些限制，但当您知道自己在做什么时，它会很好地工作。

Create your own container:

template <class T>
class statList : public std::list<T>
{
    public:
        statList() : std::list<T>::list() {}
        ~statList() {}
        T mean() {
           return accumulate(begin(),end(),0.0)/size();
        }
        T stddev() {
           T diff_sum = 0;
           T m = mean();
           for(iterator it= begin(); it != end(); ++it)
               diff_sum += ((*it - m)*(*it -m));
           return diff_sum/size();
        }
};

It does have some limitations, but it works beautifully when you know what you are doing.

回复收藏 0 原文

甩你一脸翔 2024-12-14 18:44:37

//c++ 中的意思是偏差

/偏差是观测值与感兴趣量（例如总体平均值）的真实值之间的差异，偏差是误差，偏差是观测值之间的差异真实值的估计（这样的估计可能是样本平均值）是残差。这些概念适用于测量间隔和比率级别的数据。/

#include <iostream>
#include <conio.h>
using namespace std;

/* run this program using the console pauser or add your own getch,     system("pause") or input loop */

int main(int argc, char** argv)
{
int i,cnt;
cout<<"please inter count:\t";
cin>>cnt;
float *num=new float [cnt];
float   *s=new float [cnt];
float sum=0,ave,M,M_D;

for(i=0;i<cnt;i++)
{
    cin>>num[i];
    sum+=num[i];    
}
ave=sum/cnt;
for(i=0;i<cnt;i++)
{
s[i]=ave-num[i];    
if(s[i]<0)
{
s[i]=s[i]*(-1); 
}
cout<<"\n|ave - number| = "<<s[i];  
M+=s[i];    
}
M_D=M/cnt;
cout<<"\n\n Average:             "<<ave;
cout<<"\n M.D(Mean Deviation): "<<M_D;
getch();
return 0;

}

//means deviation in c++

/A deviation that is a difference between an observed value and the true value of a quantity of interest (such as a population mean) is an error and a deviation that is the difference between the observed value and an estimate of the true value (such an estimate may be a sample mean) is a residual. These concepts are applicable for data at the interval and ratio levels of measurement./

#include <iostream>
#include <conio.h>
using namespace std;

/* run this program using the console pauser or add your own getch,     system("pause") or input loop */

int main(int argc, char** argv)
{
int i,cnt;
cout<<"please inter count:\t";
cin>>cnt;
float *num=new float [cnt];
float   *s=new float [cnt];
float sum=0,ave,M,M_D;

for(i=0;i<cnt;i++)
{
    cin>>num[i];
    sum+=num[i];    
}
ave=sum/cnt;
for(i=0;i<cnt;i++)
{
s[i]=ave-num[i];    
if(s[i]<0)
{
s[i]=s[i]*(-1); 
}
cout<<"\n|ave - number| = "<<s[i];  
M+=s[i];    
}
M_D=M/cnt;
cout<<"\n\n Average:             "<<ave;
cout<<"\n M.D(Mean Deviation): "<<M_D;
getch();
return 0;

}

回复收藏 0 原文

~没有更多了~