Calculate mean and standard deviation from a vector of samples in C++ using Boost
Is there a way to calculate mean and standard deviation for a vector containing samples using Boost?
Or do I have to create an accumulator and feed the vector into it?
I don't know if Boost has more specific functions, but you can do it with the standard library.

Given `std::vector<double> v`, the naive way is susceptible to overflow or underflow for huge or tiny values. A slightly better way to calculate the standard deviation is to subtract the mean from every element first, using `std::transform` to fill a temporary vector `diff`, and only then square and sum the differences.
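A minimal sketch of both approaches, assuming the naive way is the "mean of squares minus square of the mean" shortcut and the better way squares the differences from the mean. The function names and the `SubtractMean` functor are hypothetical; the functor stands in for `std::bind2nd(std::minus<double>(), mean)`, which no longer exists in recent C++ standards. Both functions return the population standard deviation (dividing by N).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical C++03-style functor: subtracts a fixed mean from its argument.
struct SubtractMean {
    double mean;
    explicit SubtractMean(double m) : mean(m) {}
    double operator()(double x) const { return x - mean; }
};

// Arithmetic mean of a non-empty vector.
double mean_of(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Naive version: sqrt(mean of squares - square of mean).
// The sum of squares can overflow, and the subtraction can lose precision.
double stdev_naive(const std::vector<double>& v) {
    const double mean = mean_of(v);
    const double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
    return std::sqrt(sq_sum / v.size() - mean * mean);
}

// Safer version: subtract the mean first, then square and sum the differences.
double stdev_two_pass(const std::vector<double>& v) {
    const double mean = mean_of(v);
    std::vector<double> diff(v.size());
    std::transform(v.begin(), v.end(), diff.begin(), SubtractMean(mean));
    const double sq_sum =
        std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
    return std::sqrt(sq_sum / v.size());
}
```

For well-scaled data the two functions agree; they start to differ when the values are very large or when the spread is tiny compared with the mean.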
UPDATE for C++11: The call to `std::transform` can be written using a lambda function instead of `std::minus` and `std::bind2nd` (now deprecated).
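A sketch of how the lambda form might look, assuming the same difference-based calculation with a temporary vector `diff`. The function name `stdev_lambda` is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Population standard deviation (divides by N).
// The lambda captures the mean by value and replaces the old binder.
double stdev_lambda(const std::vector<double>& v) {
    assert(!v.empty());
    const double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();

    std::vector<double> diff(v.size());
    std::transform(v.begin(), v.end(), diff.begin(),
                   [mean](double x) { return x - mean; });

    const double sq_sum =
        std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
    return std::sqrt(sq_sum / v.size());
}
```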
If performance is important to you, and your compiler supports lambdas, the stdev calculation can be made faster and simpler. In tests with VS 2012 I found a lambda-based version to be over 10x quicker than the Boost code given in the chosen answer; it is also 5x quicker than the safer version of the answer using standard libraries given by musiphil.

Note that I'm using the sample standard deviation, so this gives slightly different results (see "Why there is a Minus One in Standard Deviations").
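One way such a lambda-based version could look, as a sketch only: it assumes the sum of squared differences is accumulated in a single `std::for_each` pass, with no temporary vector. The name `sample_stdev` is hypothetical, and no timing claim is attached to this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sample standard deviation (divides by N - 1), so at least two values are needed.
double sample_stdev(const std::vector<double>& v) {
    assert(v.size() >= 2);
    const double m = std::accumulate(v.begin(), v.end(), 0.0) / v.size();

    // Accumulate squared differences from the mean in one loop.
    double accum = 0.0;
    std::for_each(v.begin(), v.end(), [&accum, m](double d) {
        accum += (d - m) * (d - m);
    });

    return std::sqrt(accum / (v.size() - 1));
}
```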
Using accumulators is the way to compute means and standard deviations in Boost.
Improving on the answer by musiphil, you can write a standard deviation function without the temporary vector `diff`, just using a single `inner_product` call with the C++11 lambda capabilities.

I suspect doing the subtraction multiple times is cheaper than using up additional intermediate storage, and I think it is more readable, but I haven't tested the performance yet.
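A sketch of the single-call idea, assuming the four-argument-plus-two-operations overload of `std::inner_product`: the first lambda adds terms, the second turns each pair of elements into a squared difference from the mean. The parameter name `func` is taken from the `func.size() - 1` expression in this answer; the function name is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sample standard deviation (divides by N - 1) without a temporary vector.
double stddev(const std::vector<double>& func) {
    assert(func.size() >= 2);
    const double mean =
        std::accumulate(func.begin(), func.end(), 0.0) / func.size();

    // Both input ranges are the same vector, so x == y for every pair and
    // (x - mean) * (y - mean) is the squared difference.
    const double sq_sum = std::inner_product(
        func.begin(), func.end(), func.begin(), 0.0,
        [](double x, double y) { return x + y; },
        [mean](double x, double y) { return (x - mean) * (y - mean); });

    // Parentheses matter: divide by (N - 1), not by N and then subtract 1.
    return std::sqrt(sq_sum / (func.size() - 1));
}
```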
As for an explanation of why to use N-1 (as in `func.size() - 1`), see these questions. Note how the question states that we have a "vector containing samples".
It seems the following elegant recursive solution has not been mentioned, although it has been around for a long time. It is described in Knuth's Art of Computer Programming and gives an estimate of the standard deviation for a list of `n>=2` values.

Hope this helps!
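The recurrence in question is presumably the running-mean and running-sum-of-squares update commonly attributed to Welford; the sketch below assumes that. It starts from mean = x_1 and a sum of squared differences of 0, and updates both for each later value. The function name `welford_stdev` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample standard deviation (divides by n - 1) computed in a single pass.
double welford_stdev(const std::vector<double>& x) {
    assert(x.size() >= 2);
    double mean = x[0];  // running mean after the first value
    double m2 = 0.0;     // running sum of squared differences from the mean

    for (std::size_t k = 2; k <= x.size(); ++k) {
        const double xk = x[k - 1];
        const double prev_mean = mean;
        mean = prev_mean + (xk - prev_mean) / static_cast<double>(k);
        m2 += (xk - prev_mean) * (xk - mean);
    }

    return std::sqrt(m2 / static_cast<double>(x.size() - 1));
}
```

Because each value is used once and then discarded, the same update also works for data that arrives as a stream.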
My answer is similar to Josh Greifer's but generalised to sample covariance. Sample variance is just sample covariance with the two inputs identical. This includes Bessel's correction.
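A sketch of a sample covariance function along those lines, assuming two equally sized `std::vector<double>` inputs. The name `sample_cov` is made up. Calling it with the same vector twice gives the sample variance.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sample covariance with Bessel's correction (divides by N - 1).
double sample_cov(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    assert(x.size() >= 2);

    const double mx = std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    const double my = std::accumulate(y.begin(), y.end(), 0.0) / y.size();

    // Sum of products of the deviations from each mean.
    double accum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        accum += (x[i] - mx) * (y[i] - my);
    }

    return accum / static_cast<double>(x.size() - 1);
}
```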
2x faster than the previously mentioned versions, mostly because the `transform()` and `inner_product()` loops are joined.

Sorry about my shortcuts/typedefs/macros: `Flo` = float, `CR` = const ref, `VFlo` = vector. Tested in VS2010.
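A sketch of what joining the two loops might mean, assuming the subtraction and the squaring happen in the same loop body. The `Flo` and `VFlo` typedefs follow the shorthand described above; the function name `stdDev` and the choice of the sample (N - 1) divisor are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef float Flo;               // Flo = float
typedef std::vector<Flo> VFlo;   // VFlo = vector of Flo

// Sample standard deviation over floats, taking the vector by const reference.
Flo stdDev(const VFlo& crVec) {
    const std::size_t n = crVec.size();
    assert(n >= 2);

    // First loop: the mean.
    Flo sum = 0.0f;
    for (Flo f : crVec) {
        sum += f;
    }
    const Flo mean = sum / static_cast<Flo>(n);

    // Second loop: subtract and square in one step, with no temporary vector.
    Flo sqSum = 0.0f;
    for (Flo f : crVec) {
        const Flo d = f - mean;
        sqSum += d * d;
    }

    return std::sqrt(sqSum / static_cast<Flo>(n - 1));
}
```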
In order to calculate the sample mean with better precision, the following r-step recursion can be used:
mean_k = 1/k * [(k-r)*mean_(k-r) + sum_over_i_from_(k-r+1)_to_k(x_i)],
where r is chosen to make the summation components closer to each other.
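The recursion can be sketched as a block-wise update, assuming the samples are processed `r` at a time and the last block may be shorter than `r`. The function name `blocked_mean` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Mean computed block by block:
// mean_k = ((k - r) * mean_(k-r) + sum of the next r values) / k.
double blocked_mean(const std::vector<double>& x, std::size_t r) {
    assert(!x.empty());
    assert(r > 0);

    double mean = 0.0;
    std::size_t done = 0;  // number of samples already folded into mean (k - r)

    while (done < x.size()) {
        // The final block may hold fewer than r samples.
        const std::size_t step = std::min(r, x.size() - done);

        double block_sum = 0.0;
        for (std::size_t i = done; i < done + step; ++i) {
            block_sum += x[i];
        }

        const std::size_t k = done + step;
        mean = (static_cast<double>(done) * mean + block_sum) /
               static_cast<double>(k);
        done = k;
    }

    return mean;
}
```

With `r = 1` this reduces to the usual running mean, and with `r = x.size()` it is the plain sum divided by N.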
Create your own container. It does have some limitations, but it works beautifully when you know what you are doing.
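A sketch of such a container, assuming a thin wrapper around `std::vector<double>` that adds `mean()` and `stddev()` members. The class name `StatList` and its interface are hypothetical, and `stddev()` here returns the population standard deviation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container that stores samples and reports simple statistics.
class StatList {
public:
    void push_back(double x) { data_.push_back(x); }
    std::size_t size() const { return data_.size(); }

    // Arithmetic mean; the container must not be empty.
    double mean() const {
        assert(!data_.empty());
        return std::accumulate(data_.begin(), data_.end(), 0.0) / data_.size();
    }

    // Population standard deviation (divides by N).
    double stddev() const {
        const double m = mean();
        double diff_sum = 0.0;
        for (double x : data_) {
            diff_sum += (x - m) * (x - m);
        }
        return std::sqrt(diff_sum / data_.size());
    }

private:
    std::vector<double> data_;
};
```

One limitation of this sketch is that both statistics are recomputed from scratch on every call.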
Mean deviation in C++.

A deviation that is the difference between an observed value and the true value of a quantity of interest (such as a population mean) is an error, and a deviation that is the difference between the observed value and an estimate of the true value (such as a sample mean) is a residual. These concepts are applicable for data at the interval and ratio levels of measurement.