Standard deviation of a generic list?

Posted 2024-09-07 07:36:12

I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is the relevant part of my code, without getting into too much detail:

namespace ValveTesterInterface
{
    public class ValveDataResults
    {
        private List<ValveData> m_ValveResults;

        public ValveDataResults()
        {
            if (m_ValveResults == null)
            {
                m_ValveResults = new List<ValveData>();
            }
        }

        public void AddValveData(ValveData valve)
        {
            m_ValveResults.Add(valve);
        }

Here is the function where the standard deviation needs to be calculated:

        public float LatchStdev()
        {

            float sumOfSqrs = 0;
            float meanValue = 0;
            foreach (ValveData value in m_ValveResults)
            {
                meanValue += value.LatchTime;
            }
            meanValue = (meanValue / m_ValveResults.Count) * 0.02f;

            for (int i = 0; i <= m_ValveResults.Count; i++) 
            {   
                sumOfSqrs += Math.Pow((m_ValveResults - meanValue), 2);  
            }
            return Math.Sqrt(sumOfSqrs /(m_ValveResults.Count - 1));

        }
    }
}

Ignore what's inside the LatchStdev() function, because I'm sure it's not right. It's just my poor attempt to calculate the standard deviation. I know how to do it for a list of doubles, but not for a generic list of custom data objects. If anyone has experience with this, please help.

深空失忆 2024-09-14 07:36:12

The example above is slightly incorrect and could divide by zero if your data set contains only one value. The following code is somewhat simpler and gives the "population standard deviation" result (http://en.wikipedia.org/wiki/Standard_deviation).

using System;
using System.Linq;
using System.Collections.Generic;

public static class Extend
{
    public static double StandardDeviation(this IEnumerable<double> values)
    {
        double avg = values.Average();
        return Math.Sqrt(values.Average(v=>Math.Pow(v-avg,2)));
    }
}
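
To make the difference concrete, here is a minimal sketch that puts the two formulas side by side. The class and method names (`StdDevComparison`, `PopulationStdDev`, `SampleStdDev`) are hypothetical and exist only for this illustration; the population version mirrors the extension method above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StdDevComparison
{
    // Population standard deviation: the squared deviations are averaged over N.
    public static double PopulationStdDev(IEnumerable<double> values)
    {
        double avg = values.Average();
        return Math.Sqrt(values.Average(v => Math.Pow(v - avg, 2)));
    }

    // Sample standard deviation: the squared deviations are divided by N - 1.
    // Hypothetical helper, included only for comparison.
    public static double SampleStdDev(IEnumerable<double> values)
    {
        double avg = values.Average();
        double sum = values.Sum(v => Math.Pow(v - avg, 2));
        return Math.Sqrt(sum / (values.Count() - 1));
    }
}
```

With a single value such as `new[] { 5.0 }`, `PopulationStdDev` returns 0, while `SampleStdDev` evaluates 0.0 / 0 and returns `double.NaN` rather than throwing, because floating-point division by zero does not raise an exception in C#.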

娇女薄笑 2024-09-14 07:36:12

This article should help you. It creates a function that computes the deviation of a sequence of double values. All you have to do is supply a sequence of appropriate data elements.

The resulting function is:

private double CalculateStandardDeviation(IEnumerable<double> values)
{   
  double standardDeviation = 0;

  if (values.Any()) 
  {      
     // Compute the average.     
     double avg = values.Average();

     // Perform the sum of (value - avg)^2.
     double sum = values.Sum(d => Math.Pow(d - avg, 2));

     // Put it all together.      
     standardDeviation = Math.Sqrt((sum) / (values.Count()-1));   
  }  

  return standardDeviation;
}

This is easy enough to adapt for any generic type, so long as we provide a selector for the value being computed. LINQ is great for that: the Select function lets you project, from your generic list of custom types, a sequence of numeric values for which to compute the standard deviation:

List<ValveData> list = ...
var result = CalculateStandardDeviation(list.Select(v => (double)v.SomeField));
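
The selector idea can also be folded into a generic extension method. The following is a sketch under stated assumptions: the `ValveData` class here is a hypothetical stand-in exposing only the `LatchTime` member seen in the question, and `StatisticsExtensions` is an illustrative name. It uses the N - 1 (sample) divisor and returns 0 when fewer than two values are present, which is a design choice rather than a standard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ValveData; only LatchTime is assumed.
public class ValveData
{
    public float LatchTime { get; set; }
}

public static class StatisticsExtensions
{
    // Sample standard deviation (N - 1 divisor) of the values picked by the selector.
    // Returns 0 when the source has fewer than two elements (an assumption of this sketch).
    public static double StandardDeviation<T>(this IEnumerable<T> source, Func<T, double> selector)
    {
        // Materialize once so the source itself is enumerated a single time.
        List<double> values = source.Select(selector).ToList();
        if (values.Count < 2)
            return 0.0;

        double avg = values.Average();
        double sum = values.Sum(v => (v - avg) * (v - avg));
        return Math.Sqrt(sum / (values.Count - 1));
    }
}
```

Inside a class like `ValveDataResults`, the body of `LatchStdev()` could then reduce to something like `return (float)m_ValveResults.StandardDeviation(v => v.LatchTime);`.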

雨落星ぅ辰 2024-09-14 07:36:12

Even though the accepted answer seems mathematically correct, it is wrong from a programming perspective: it enumerates the same sequence 4 times. This might be OK if the underlying object is a list or an array, but if the input is a filtered or aggregated LINQ expression, or if the data comes directly from a database or network stream, performance will be much worse.

I would highly recommend not reinventing the wheel and using one of the better open-source math libraries, Math.NET. We have been using that library at our company and are very happy with its performance.

PM> Install-Package MathNet.Numerics

using MathNet.Numerics.Statistics;

var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();

var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();

See http://numerics.mathdotnet.com/docs/DescriptiveStatistics.html for more information.

Lastly, for those who want the fastest possible result and are willing to sacrifice some precision, read about the "one-pass" algorithm: https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods
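
As a rough illustration of a one-pass approach, the sketch below uses Welford's running-mean update, a commonly used single-enumeration variant that is generally regarded as numerically well behaved. The class and method names are hypothetical, and returning 0 for fewer than two values is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class OnePassStatistics
{
    // Sample standard deviation (N - 1 divisor) computed in a single enumeration
    // using Welford's running mean and running sum of squared deviations.
    public static double StandardDeviationOnePass(this IEnumerable<double> values)
    {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the current mean

        foreach (double x in values)
        {
            count++;
            double delta = x - mean;   // deviation from the old mean
            mean += delta / count;     // update the mean
            m2 += delta * (x - mean);  // uses both the old and the new mean
        }

        // With fewer than two values the sample deviation is undefined; return 0 here.
        return count < 2 ? 0.0 : Math.Sqrt(m2 / (count - 1));
    }
}
```

Because the sequence is walked exactly once, this works on lazily evaluated LINQ queries or streamed data without buffering the values.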

我们的影子 2024-09-14 07:36:12

I see what you're doing, and I use something similar. It seems to me you're not going far enough. I tend to encapsulate all data processing in a single class; that way I can cache the calculated values until the list changes.
For instance:

using System.Collections.Generic;
using System.Linq;

public class StatProcessor
{
    private readonly List<double> _data = new List<double>(); // holds the current data
    private double _avg;    // the cached average
    private bool _avgValid; // flag: whether the cached average is still valid

    // Calculate the average of the list, cache it in _avg, and set _avgValid.
    private void _calcAvg()
    {
        _avg = _data.Count > 0 ? _data.Average() : 0.0; // 0 for an empty list is an assumption
        _avgValid = true;
    }

    public double average
    {
        get
        {
            if (!_avgValid) // the cache is stale, so recalculate and set the flag
                _calcAvg();
            return _avg;    // _avg is now guaranteed to be good, so return it
        }
    }

    // ...more stuff
    public void Add(double value)
    {
        _data.Add(value);   // add to the list here...
        _avgValid = false;  // ...and reset the flag
    }
}

You'll notice that with this approach only the first request for the average actually computes it. After that, as long as we don't add anything to the list (or remove or modify anything, though those operations aren't shown), we get the average for basically nothing.

Additionally, since the average is used in the algorithm for the standard deviation, computing the standard deviation first gives us the average for free, and computing the average first gives a small performance boost to the standard deviation calculation, assuming we remember to check the flag.

Furthermore, places like the average function, where you're already looping through every value, are a great opportunity to cache things like the minimum and maximum values. Of course, requests for this information need to first check whether the values have been cached, and that can be relatively slow compared with just finding the max from the list, since the pass does the extra work of setting up all the related caches, not just the one you're accessing.
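
One way the caching idea might extend to the standard deviation, minimum, and maximum is sketched below. `CachedStats` and its members are hypothetical names, a single validity flag covers all cached values, and the population (divide by N) formula is an arbitrary choice for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension of the caching idea: one refresh fills every cached statistic.
public class CachedStats
{
    private readonly List<double> _data = new List<double>();
    private bool _valid; // true while the cached values match the contents of _data
    private double _avg, _stdDev, _min, _max;

    public void Add(double value)
    {
        _data.Add(value);
        _valid = false; // any change invalidates all cached values
    }

    public double Average { get { EnsureValid(); return _avg; } }
    public double StandardDeviation { get { EnsureValid(); return _stdDev; } }
    public double Min { get { EnsureValid(); return _min; } }
    public double Max { get { EnsureValid(); return _max; } }

    private void EnsureValid()
    {
        if (_valid) return;
        if (_data.Count == 0)
            throw new InvalidOperationException("No data has been added.");

        // First pass: sum, minimum, and maximum together.
        double sum = 0.0;
        _min = double.PositiveInfinity;
        _max = double.NegativeInfinity;
        foreach (double x in _data)
        {
            sum += x;
            if (x < _min) _min = x;
            if (x > _max) _max = x;
        }
        _avg = sum / _data.Count;

        // Second pass: population standard deviation, reusing the fresh average.
        double sumSq = 0.0;
        foreach (double x in _data)
            sumSq += (x - _avg) * (x - _avg);
        _stdDev = Math.Sqrt(sumSq / _data.Count);

        _valid = true;
    }
}
```

The trade-off described above is visible here: the first property read after a change pays for every statistic, and later reads cost only a flag check.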
