How do I determine the standard deviation (stddev) of a set of values?

Posted on 2024-07-21 13:24:24

I need to know whether a number, compared to a set of numbers, is more than 1 stddev away from the mean, and so on.

猫烠⑼条掵仅有一顆心 2024-07-28 13:24:24

While the sum-of-squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers: rounding error can leave you with a negative variance.

Also, never, ever, ever compute a^2 as pow(a,2); a * a is almost certainly faster.

By far the best way of computing a standard deviation is Welford's method. My C# is very rusty, but it could look something like this:

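public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;  // running mean
    double S = 0.0;  // running sum of squared deviations from the mean
    int k = 1;       // 1-based index of the current value
    foreach (double value in valueList) 
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // k is now n + 1, so k - 2 is n - 1 (sample standard deviation)
    return Math.Sqrt(S / (k-2));
}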

If you have the whole population (as opposed to a sample), use return Math.Sqrt(S / (k-1)); instead.

EDIT: I've updated the code according to Jason's remarks...

EDIT: I've also updated the code according to Alex's remarks...
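
For reference, the loop in the code above follows the recurrence below. The notation is chosen here for illustration and is not from the answer: x_k is the k-th value, n is the number of values, and M and S match the variables in the code.

```latex
\begin{aligned}
M_k &= M_{k-1} + \frac{x_k - M_{k-1}}{k}, & M_0 &= 0,\\
S_k &= S_{k-1} + (x_k - M_{k-1})(x_k - M_k), & S_0 &= 0,\\
s &= \sqrt{\frac{S_n}{n-1}} \quad\text{(sample)}, & \sigma &= \sqrt{\frac{S_n}{n}} \quad\text{(population)}.
\end{aligned}
```

Because the code starts k at 1 and increments it after each value, k equals n + 1 when the loop ends, which is why it divides by k-2 for the sample case and by k-1 for the population case.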

离去的眼神 2024-07-28 13:24:24

This solution is 10 times faster than Jaime's, but be aware that, as Jaime pointed out:

"While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance"

If you think you are dealing with very large numbers or a very large quantity of numbers, calculate the result using both methods; if the results are equal, you know that you can use "my" method for your case.

    public static double StandardDeviation(double[] data)
    {
        double stdDev = 0;
        double sumAll = 0;
        double sumAllQ = 0;

        //Sum of x and sum of x²
        for (int i = 0; i < data.Length; i++)
        {
            double x = data[i];
            sumAll += x;
            sumAllQ += x * x;
        }

        //Mean (not used here)
        //double mean = 0;
        //mean = sumAll / (double)data.Length;

        //Standard deviation
        stdDev = System.Math.Sqrt(
            (sumAllQ -
            (sumAll * sumAll) / data.Length) *
            (1.0d / (data.Length - 1))
            );

        return stdDev;
    }
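
To make the quoted warning concrete, here is a small Python sketch. It is not part of the original answer, and the function names and sample data are illustrative assumptions. It compares the sum-of-squares formula with a two-pass computation on values that share a large offset.

```python
def variance_sum_of_squares(data):
    """Sample variance via the one-pass sum-of-squares formula."""
    n = len(data)
    total = 0.0
    total_sq = 0.0
    for x in data:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(data):
    """Sample variance: compute the mean first, then the squared deviations."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]
large = [x + 1e9 for x in small]  # same spread, shifted by a large offset

print(variance_sum_of_squares(small), variance_two_pass(small))
print(variance_sum_of_squares(large), variance_two_pass(large))
```

On the shifted data the two-pass result stays at 30.0, while the sum-of-squares result drifts away from it because the squared values no longer fit exactly in a double. The variance is compared rather than the standard deviation because a negative variance would make the square root fail.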

小镇女孩 2024-07-28 13:24:24

The accepted answer by Jaime is great, except that you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1", and k ends up at number_of_elements+1).
Better yet, start k at 0:

public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList) 
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k-1));
}

蓝眼泪 2024-07-28 13:24:24

The Math.NET library provides this for you out of the box.

PM> Install-Package MathNet.Numerics

using MathNet.Numerics.Statistics;

var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();

var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();

See PopulationStandardDeviation for more information.

甜心 2024-07-28 13:24:24

Code snippet:

public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); // requires System.Linq (.NET 3.5 or later)
    foreach (double value in valueList) 
    {
        sumOfSquares += Math.Pow((value - average), 2);
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}

完美的未来在梦里 2024-07-28 13:24:24

You can avoid making two passes over the data by accumulating the mean and mean-square

cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt

and forming

sigma = sqrt(meansqr - mean^2)

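Multiplying the variance (meansqr - mean^2) by a factor of cnt/(cnt-1) before taking the square root is often appropriate as well; that gives the sample rather than the population standard deviation.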

BTW: the first pass over the data in Demi's and McWafflestix's answers is hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceeds the size of the cache, or even the working set, it becomes a big deal.
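
The pseudocode above can be sketched in Python as follows. This is an illustration rather than the answerer's code: the function name, the sample flag, and the clamp against tiny negative variances are assumptions added here.

```python
import math


def single_pass_sigma(values, sample=False):
    """Standard deviation from the accumulated mean and mean-square."""
    cnt = 0
    total = 0.0
    total_sq = 0.0
    for value in values:
        cnt += 1
        total += value
        total_sq += value * value
    mean = total / cnt
    meansqr = total_sq / cnt
    variance = meansqr - mean * mean  # population variance
    if sample:
        variance *= cnt / (cnt - 1)   # the cnt/(cnt-1) factor
    # Rounding can push a tiny variance slightly below zero; clamp it.
    return math.sqrt(max(variance, 0.0))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(single_pass_sigma(data))               # population
print(single_pass_sigma(data, sample=True))  # sample
```

The clamp only hides small rounding errors; it does not protect against the large-number problem that the other answers warn about.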

江南月 2024-07-28 13:24:24

I found that Rob's helpful answer didn't quite match what I was seeing in Excel. To match Excel, I passed the average of valueList in to the StandardDeviation calculation.

Here are my two cents. Clearly you could calculate the moving average (ma) from valueList inside the function, but I happened to have it already before needing the standard deviation.

public double StandardDeviation(List<double> valueList, double ma)
{
    double xMinusMovAvg = 0.0;
    double Sigma = 0.0;
    int k = valueList.Count;

    foreach (double value in valueList)
    {
        xMinusMovAvg = value - ma;
        Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
    }
    return Math.Sqrt(Sigma / (k - 1));
}

始终不够 2024-07-28 13:24:24

Using extension methods:

using System;
using System.Collections.Generic;

namespace SampleApp
{
    internal class Program
    {
        private static void Main()
        {
            List<double> data = new List<double> {1, 2, 3, 4, 5, 6};

            double mean = data.Mean();
            double variance = data.Variance();
            double sd = data.StandardDeviation();

            Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }

    public static class MyListExtensions
    {
        public static double Mean(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }

        public static double Mean(this List<double> values, int start, int end)
        {
            double s = 0;

            for (int i = start; i < end; i++)
            {
                s += values[i];
            }

            return s / (end - start);
        }

        public static double Variance(this List<double> values)
        {
            return values.Variance(values.Mean(), 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean)
        {
            return values.Variance(mean, 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean, int start, int end)
        {
            double variance = 0;

            for (int i = start; i < end; i++)
            {
                variance += Math.Pow((values[i] - mean), 2);
            }

            // Note: divides by n when start == 0, but by n - 1 for any other start.
            int n = end - start;
            if (start > 0) n -= 1;

            return variance / (n);
        }

        public static double StandardDeviation(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }

        public static double StandardDeviation(this List<double> values, int start, int end)
        {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);

            return Math.Sqrt(variance);
        }
    }
}

夏夜暖风 2024-07-28 13:24:24

/// <summary>
/// Calculates standard deviation, same as MATLAB std(X,0) function
/// <seealso cref="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">enumerable data</param>
/// <returns>Standard deviation</returns>
public static double GetStandardDeviation(this IEnumerable<double> values)
{
    // validation
    if (values == null)
        throw new ArgumentNullException(nameof(values));

    int length = 0;
    double sum = 0.0, sum2 = 0.0;

    // single pass; avoids calling ElementAt(i), which walks the sequence each time
    foreach (double item in values)
    {
        sum += item;
        sum2 += item * item;
        length++;
    }

    // saves from division by 0
    if (length < 2)
        return 0;

    return Math.Sqrt((sum2 - sum * sum / length) / (length - 1));
}

硪扪都還晓 2024-07-28 13:24:24

The trouble with all the other answers is that they assume you have your data in a big array. If your data is coming in on the fly, this would be a better approach. This class works regardless of how, or whether, you store your data. It also gives you the choice of Welford's method (spelled "Waldorf" in the identifiers below) or the sum-of-squares method. Both methods work in a single pass.

public final class StatMeasure {
  private StatMeasure() {}

  public interface Stats1D {

    /** Add a value to the population */
    void addValue(double value);

    /** Get the mean of all the added values */
    double getMean();

    /** Get the standard deviation from a sample of the population. */
    double getStDevSample();

    /** Gets the standard deviation for the entire population. */
    double getStDevPopulation();
  }

  private static class WaldorfPopulation implements Stats1D {
    private double mean = 0.0;
    private double sSum = 0.0;
    private int count = 0;

    @Override
    public void addValue(double value) {
      double tmpMean = mean;
      double delta = value - tmpMean;
      mean += delta / ++count;
      sSum += delta * (value - mean);
    }

    @Override
    public double getMean() { return mean; }

    @Override
    public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }

    @Override
    public double getStDevPopulation() { return Math.sqrt(sSum / (count)); }
  }

  private static class StandardPopulation implements Stats1D {
    private double sum = 0.0;
    private double sumOfSquares = 0.0;
    private int count = 0;

    @Override
    public void addValue(double value) {
      sum += value;
      sumOfSquares += value * value;
      count++;
    }

    @Override
    public double getMean() { return sum / count; }

    @Override
    public double getStDevSample() {
      return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
    }

    @Override
    public double getStDevPopulation() {
      return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
    }
  }

  /**
   * Returns a way to measure a population of data using Welford's method.
   * This method is better if your population or values are so large that
   * the sum of x-squared may lose precision or overflow. It's also probably
   * faster if you need to recalculate the mean and standard deviation
   * continuously, for example, if you are continually updating a graphic
   * of the data as it flows in.
   *
   * @return A Stats1D object that uses Welford's method.
   */
  public static Stats1D getWaldorfStats() { return new WaldorfPopulation(); }

  /**
   * Return a way to measure the population of data using the sum-of-squares
   * method. This is probably faster than Welford's method, but runs the
   * risk of precision loss or overflow.
   *
   * @return A Stats1D object that uses the sum-of-squares method
   */
  public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}

梦开始←不甜 2024-07-28 13:24:24

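We can use the statistics module in Python. It has the stdev() and pstdev() functions, which compute the standard deviation of a sample and of a population, respectively.

For details, see: https://www.geeksforgeeks.org/python-statistics-stdev/

import statistics as st
print(st.pstdev(dataframe['column name']))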
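
Applied to the original question, a minimal sketch might look like this. The helper name deviations_from_mean and the sample data are assumptions for illustration; only statistics.mean and statistics.stdev come from the standard library.

```python
import statistics


def deviations_from_mean(value, data):
    """How many sample standard deviations `value` lies from the mean of `data`."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return (value - mean) / stdev


data = [2, 4, 4, 4, 5, 5, 7, 9]
for candidate in (6, 8):
    z = deviations_from_mean(candidate, data)
    print(candidate, "is more than 1 stdev from the mean:", abs(z) > 1)
```

Swap in statistics.pstdev if the data is the whole population, and note that the division fails if every value in data is identical, since the standard deviation is then zero.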

伴梦长久 2024-07-28 13:24:24

This is the population standard deviation:

private double calculateStdDev(List<double> values)
{
    double average = values.Average();
    return Math.Sqrt((values.Select(val => (val - average) * (val - average)).Sum()) / values.Count);
}

For the sample standard deviation, just change values.Count to values.Count - 1 in the code above.

Make sure your set contains more than 1 data point; otherwise the sample formula divides by zero.
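
The same distinction, and the single-data-point caveat, can be seen with Python's standard statistics module. This is a sketch with made-up data, not part of the original answer.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(data))  # population: divides by n
print(statistics.stdev(data))   # sample: divides by n - 1

try:
    statistics.stdev([42.0])    # a single point has no sample deviation
except statistics.StatisticsError as err:
    print("single data point:", err)
```

Here the library raises an error for a single data point instead of dividing by zero, which is one reasonable way to handle the case the answer warns about.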

月亮是我掰弯的 2024-07-28 13:24:24

None of the existing answers considers that you might need to include delta degrees of freedom, for example, when porting code from NumPy or similar Python / R libraries. In that case:

public static double StandardDeviation(double[] data, int deltaDegreesOfFreedom = 1)
{
    double avg = data.Average();
    double accu = data.Select(x => x - avg).Select(z => z * z / (data.Length - deltaDegreesOfFreedom)).Sum();
    return Math.Sqrt(accu);
}
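
For comparison when porting, here is a pure-Python sketch of the same idea. The function name and the guard are assumptions added here. Note that numpy.std itself defaults to ddof=0, whereas the C# code above defaults to 1.

```python
import math


def standard_deviation(data, ddof=1):
    """Standard deviation dividing by (n - ddof), as numpy.std(ddof=...) does."""
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, ddof=0))  # population
print(standard_deviation(data))          # sample (ddof=1)
```

With ddof=0 this gives the population standard deviation and with ddof=1 the sample standard deviation, so one function covers both conventions.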
