Standard deviation in LINQ

Posted on 2024-08-21 01:51:53

Does LINQ model the aggregate SQL function STDDEV() (standard deviation)?

If not, what is the simplest / best-practices way to calculate it?

Example:

  SELECT test_id, AVG(result) avg, STDDEV(result) std 
    FROM tests
GROUP BY test_id
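The query above can be sketched in memory with LINQ to Objects. This assumes a hypothetical `TestRow` record standing in for the `tests` table and uses the population standard deviation. SQL dialects differ on this point: PostgreSQL's `stddev` is the sample version, while MySQL's `STDDEV` is the population version. The sketch makes no claim that any query provider translates it to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the "tests" table.
public record TestRow(int TestId, double Result);

public static class GroupedStats
{
    // Population standard deviation (divide by n), computed in two passes.
    public static double StdDev(this IEnumerable<double> values)
    {
        var list = values.ToList();          // enumerate the source only once
        if (list.Count == 0) return 0;
        double avg = list.Average();
        double sumSq = list.Sum(d => (d - avg) * (d - avg));
        return Math.Sqrt(sumSq / list.Count);
    }

    // In-memory counterpart of:
    //   SELECT test_id, AVG(result) avg, STDDEV(result) std FROM tests GROUP BY test_id
    public static List<(int TestId, double Avg, double Std)> Summarize(IEnumerable<TestRow> tests) =>
        tests.GroupBy(t => t.TestId)
             .Select(g => (TestId: g.Key,
                           Avg: g.Average(t => t.Result),
                           Std: g.Select(t => t.Result).StdDev()))
             .ToList();
}
```

`GroupBy` yields groups in order of the first appearance of each key, so the output follows the input order rather than any `ORDER BY`.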


Comments (8)

德意的啸 2024-08-28 01:51:54

You can write your own extension method to calculate it:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    public static double StdDev(this IEnumerable<double> values)
    {
       double ret = 0;
       int count = values.Count();
       if (count > 1)
       {
          //Compute the Average
          double avg = values.Average();

          //Perform the Sum of (value-avg)^2
          double sum = values.Sum(d => (d - avg) * (d - avg));

          //Put it all together (population SD: divide by count)
          ret = Math.Sqrt(sum / count);
       }
       return ret;
    }
}

If you have a sample of the population rather than the whole population, then you should use ret = Math.Sqrt(sum / (count - 1));.
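A sketch that makes this choice explicit through a hypothetical `isSample` parameter instead of editing the divisor by hand. For the data set {2, 4, 4, 4, 5, 5, 7, 9}, the population SD is 2 and the sample SD is √(32/7) ≈ 2.138.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeviationExtensions
{
    // isSample = false: population SD, divides by n.
    // isSample = true:  sample SD (Bessel's correction), divides by n - 1.
    public static double StdDev(this IEnumerable<double> values, bool isSample)
    {
        var list = values.ToList();                        // single enumeration
        int divisor = isSample ? list.Count - 1 : list.Count;
        if (divisor <= 0) return 0;                        // too few values, return 0 like the answer
        double avg = list.Average();
        double sum = list.Sum(d => (d - avg) * (d - avg));
        return Math.Sqrt(sum / divisor);
    }
}
```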

Adapted into an extension method from "Adding Standard Deviation to LINQ" by Chris Bennett.

只是一片海 2024-08-28 01:51:54

Dynami's answer works, but it makes multiple passes through the data to get a result. This is a single-pass method that calculates the sample standard deviation:

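using System;
using System.Collections.Generic;

public static class Extensions
{
    public static double StdDev(this IEnumerable<double> values)
    {
        double mean = 0.0;
        double sum = 0.0;      // running sum of squared deviations from the mean
        double stdDev = 0.0;
        int n = 0;
        foreach (double val in values)
        {
            // Welford's update of the running mean and squared deviations
            n++;
            double delta = val - mean;
            mean += delta / n;
            sum += delta * (val - mean);
        }
        if (1 < n)
            stdDev = Math.Sqrt(sum / (n - 1));

        return stdDev;
    }
}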

This is the sample standard deviation since it divides by n - 1. For the population standard deviation, divide by n instead.

This uses Welford's method, which has higher numerical accuracy than the Average(x^2)-Average(x)^2 method.
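To see why the accuracy matters, here is a small comparison on data with a large offset: 1e9 + {4, 7, 13, 16}. The deviations from the mean are −6, −3, 3, 6, so the sample variance is 90 / 3 = 30. In the shortcut, x² is about 1e18, where adjacent doubles are 128 apart. That rounding error is far larger than the true population variance of 22.5. `Naive` is a hypothetical helper written for this demonstration only, and it clamps negative results to zero so the square root stays defined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccuracyDemo
{
    // Welford's single-pass recurrence (as in the answer above), sample SD.
    public static double Welford(IEnumerable<double> values)
    {
        double mean = 0, m2 = 0;
        int n = 0;
        foreach (double x in values)
        {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return n > 1 ? Math.Sqrt(m2 / (n - 1)) : 0;
    }

    // Shortcut Average(x^2) - Average(x)^2, rescaled to the sample SD.
    // Subtracting two nearly equal numbers near 1e18 discards most significant digits.
    public static double Naive(IReadOnlyCollection<double> values)
    {
        int n = values.Count;
        if (n < 2) return 0;
        double meanSq = values.Average(x => x * x);
        double mean = values.Average();
        double popVar = meanSq - mean * mean;
        return Math.Sqrt(Math.Max(popVar, 0) * n / (n - 1));
    }
}
```

Running both on `new[] { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 }` lets you compare the shortcut's output against Welford's √30 ≈ 5.477 on your own runtime.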

森罗 2024-08-28 01:51:54

This converts David Clarke's answer into an extension that follows the same form as the other aggregate LINQ functions like Average.

Usage would be: var stdev = data.StdDev(o => o.number);

using System;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    public static double StdDev<T>(this IEnumerable<T> list, Func<T, double> values)
    {
        // ref: https://stackoverflow.com/questions/2253874/linq-equivalent-for-standard-deviation
        // ref: http://warrenseen.com/blog/2006/03/13/how-to-calculate-standard-deviation/ 
        var mean = 0.0;
        var sum = 0.0;
        var stdDev = 0.0;
        var n = 0;
        foreach (var value in list.Select(values))
        {
            n++;
            var delta = value - mean;
            mean += delta / n;
            sum += delta * (value - mean);
        }
        if (1 < n)
            stdDev = Math.Sqrt(sum / (n - 1));

        return stdDev;
    }
}
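Built-in aggregates such as `Enumerable.Average` come with an overload for each numeric selector type. A sketch of the same idea, assuming a `double` core routine (Welford, sample SD) plus hypothetical `int` and `double` selector overloads that project and forward:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectorStdDev
{
    // Core routine: Welford's method, sample SD (divides by n - 1).
    public static double StdDev(this IEnumerable<double> values)
    {
        double mean = 0, sum = 0;
        int n = 0;
        foreach (double value in values)
        {
            n++;
            double delta = value - mean;
            mean += delta / n;
            sum += delta * (value - mean);
        }
        return n > 1 ? Math.Sqrt(sum / (n - 1)) : 0;
    }

    // Selector overloads: project each element, then reuse the core routine.
    public static double StdDev<T>(this IEnumerable<T> source, Func<T, double> selector) =>
        source.Select(selector).StdDev();

    public static double StdDev<T>(this IEnumerable<T> source, Func<T, int> selector) =>
        source.Select(x => (double)selector(x)).StdDev();
}
```

With both overloads present, a lambda returning `int` binds to the `Func<T, int>` version, because its return type matches exactly.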
人心善变 2024-08-28 01:51:54
var stddev = Math.Sqrt(data.Average(z=>z*z)-Math.Pow(data.Average(),2));
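This one-liner computes the population SD as √(E[x²] − E[x]²). Rounding can make the difference slightly negative, and `Math.Sqrt` then returns `NaN`. A sketch of a guarded variant, assuming that clamping to zero is an acceptable fix. It also uses `mean * mean` instead of `Math.Pow`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortcutStdDev
{
    // Population SD via E[x^2] - E[x]^2, as in the one-liner above.
    // Math.Max clamps a tiny negative difference caused by rounding to zero,
    // which would otherwise make Math.Sqrt return NaN.
    public static double StdDevPop(IReadOnlyCollection<double> data)
    {
        if (data.Count == 0) return 0;     // Average() would throw on an empty sequence
        double meanOfSquares = data.Average(z => z * z);
        double mean = data.Average();
        return Math.Sqrt(Math.Max(0, meanOfSquares - mean * mean));
    }
}
```

The clamp only hides the symptom. For data with a large offset relative to its spread, the cancellation error remains, and Welford's method (see the second answer) avoids it.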
ㄖ落Θ余辉 2024-08-28 01:51:54

Straight to the point (C# 6.0 or later), Dynami's answer becomes this:

    public static double StdDev(this IEnumerable<double> values)
    {
        var count = values?.Count() ?? 0;
        if (count <= 1) return 0;

        var avg = values.Average();
        var sum = values.Sum(d => Math.Pow(d - avg, 2));

        return Math.Sqrt(sum / count);
    }
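The method above calls `Count()`, `Average()` and `Sum()`, so a lazy `IEnumerable` is enumerated three times. A sketch that keeps the `IEnumerable<double>` signature but enumerates the source only once. It reuses the input if it already implements `IReadOnlyList<double>` and otherwise copies it into an array. This materialization strategy is an assumption, not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializedStdDev
{
    // Population SD; the source is enumerated at most once.
    public static double StdDev(this IEnumerable<double> values)
    {
        if (values == null) return 0;                     // mirrors values?.Count() ?? 0
        IReadOnlyList<double> data = values as IReadOnlyList<double> ?? values.ToArray();
        int count = data.Count;
        if (count <= 1) return 0;

        double sum = 0;
        for (int i = 0; i < count; i++) sum += data[i];
        double avg = sum / count;

        double sumSq = 0;
        for (int i = 0; i < count; i++)
        {
            double diff = data[i] - avg;                  // diff * diff instead of Math.Pow(diff, 2)
            sumSq += diff * diff;
        }
        return Math.Sqrt(sumSq / count);
    }
}
```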

Edit 2020-08-27:

I took @David Clarke's comments and ran some performance tests;
these are the results:

using System;
using System.Collections.Generic;

public static class Extensions
{
    public static (double stdDev, double avg) StdDevFast(this List<double> values)
    {
        var count = values?.Count ?? 0;
        if (count <= 1) return (0, 0);

        var avg = GetAverage(values);
        var sum = GetSumOfSquareDiff(values, avg);

        return (Math.Sqrt(sum / count), avg);
    }

    private static double GetAverage(List<double> values)
    {
        double sum = 0.0;
        for (int i = 0; i < values.Count; i++) 
            sum += values[i];
        
        return sum / values.Count;
    }
    private static double GetSumOfSquareDiff(List<double> values, double avg)
    {
        double sum = 0.0;
        for (int i = 0; i < values.Count; i++)
        {
            var diff = values[i] - avg;
            sum += diff * diff;
        }
        return sum;
    }
}

I tested this with a list of one million random doubles.
The original implementation had a runtime of ~48 ms,
and the performance-optimized implementation takes 2-3 ms,
so this is a significant improvement.

Some interesting details:
Getting rid of Math.Pow brings a boost of 33 ms!
List instead of IEnumerable: 6 ms
Manual average calculation: 4 ms
for loops instead of foreach loops: 2 ms
Array instead of List brings an improvement of only ~2%, so I skipped it
Using float (Single) instead of double brings nothing

Lowering the code further and using goto (yes, GOTO... I haven't used it since 90s assembler...) instead of for loops
does not pay off, thank goodness!

I also tested parallel calculation; it makes sense for lists of more than 200,000 items.
It seems that the hardware and software need a lot of initialization, which is counterproductive for small lists.
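The answer does not show its parallel code. One possible sketch uses PLINQ (`AsParallel()` with `ParallelEnumerable.Sum`) for both passes of the population formula. PLINQ is an assumption here, and the 200,000-item break-even point is the author's measurement, which will vary by machine. Because floating-point addition is not associative, the result can differ from the sequential version in the last bits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelStdDev
{
    // Population SD with both passes (mean, squared deviations) run through PLINQ.
    public static double StdDevParallel(this IReadOnlyList<double> values)
    {
        int count = values.Count;
        if (count <= 1) return 0;
        double avg = values.AsParallel().Sum() / count;
        double sumSq = values.AsParallel().Sum(x => (x - avg) * (x - avg));
        return Math.Sqrt(sumSq / count);
    }
}
```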

All tests were executed twice in a row to eliminate warm-up time.
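A minimal sketch of this "run twice, time the second run" approach using `System.Diagnostics.Stopwatch`. `TimeAfterWarmup` is a hypothetical helper. For more rigorous numbers, a dedicated tool such as BenchmarkDotNet handles warm-up and statistics automatically.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Runs the action once to warm up (JIT compilation, caches), then times a second run.
    public static TimeSpan TimeAfterWarmup(Action action)
    {
        action();                         // warm-up run, not measured
        var sw = Stopwatch.StartNew();
        action();                         // measured run
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Usage would look like `BenchmarkSketch.TimeAfterWarmup(() => values.StdDevFast())`. The timing will depend on the machine and runtime.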

傲影 2024-08-28 01:51:54

Four simple lines. I used a List of doubles, but one could use IEnumerable<int> values.

public static double GetStandardDeviation(List<double> values)
{
    double avg = values.Count > 0 ? values.Average() : 0; // Average() throws on an empty list
    double sum = values.Sum(v => (v - avg) * (v - avg));
    double denominator = values.Count - 1;
    return denominator > 0.0 ? Math.Sqrt(sum / denominator) : -1;
}
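A sketch of the `IEnumerable<int>` variant mentioned above. `IEnumerable<int>` has no `Count` property, so this version materializes the input once with `ToList()`. It keeps the answer's `-1` result for fewer than two values. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntDeviation
{
    // Sample SD for integers; -1 signals "fewer than two values", as in the answer above.
    public static double GetStandardDeviation(IEnumerable<int> values)
    {
        var list = values.ToList();                 // enumerate once; gives us a Count
        double denominator = list.Count - 1;
        if (denominator <= 0) return -1;
        double avg = list.Average();                // Average over ints returns double
        double sum = list.Sum(v => (v - avg) * (v - avg));
        return Math.Sqrt(sum / denominator);
    }
}
```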
说谎友 2024-08-28 01:51:54

In the general case we want to compute StdDev in one pass: what if values is a file or an RDBMS cursor
whose contents can change between computing the average and the sum? We would get inconsistent results. The
code below uses just one pass:

// Population StdDev
public static double StdDev(this IEnumerable<double> values) {
  if (null == values)
    throw new ArgumentNullException(nameof(values));

  double N = 0;
  double Sx = 0.0;
  double Sxx = 0.0;

  foreach (double x in values) {
    N += 1;
    Sx += x;
    Sxx += x * x;
  }

  return N == 0
    ? double.NaN // or throw exception
    : Math.Sqrt((Sxx - Sx * Sx / N) / N);
}

The same idea for the sample StdDev:

// Sample StdDev
public static double StdDev(this IEnumerable<double> values) {
  if (null == values)
    throw new ArgumentNullException(nameof(values));

  double N = 0;
  double Sx = 0.0;
  double Sxx = 0.0;

  foreach (double x in values) {
    N += 1;
    Sx += x;
    Sxx += x * x;
  }

  return N <= 1
    ? double.NaN // or throw exception
    : Math.Sqrt((Sxx - Sx * Sx / N) / (N - 1));
}
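Since both versions accumulate the same three quantities (N, Sx, Sxx), one pass can yield both results. It can also accept values one at a time, for example while reading a cursor row by row. A sketch with a hypothetical `MomentAccumulator` class. The `Math.Max(0, …)` clamp is an addition that is not in the answer, and the same cancellation caveat as with Average(x^2)-Average(x)^2 applies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects N, Sx and Sxx in a single pass.
public sealed class MomentAccumulator
{
    public long N { get; private set; }
    private double sx, sxx;

    // Feed one value, e.g. from each row of a cursor.
    public void Add(double x)
    {
        N++;
        sx += x;
        sxx += x * x;
    }

    public double PopulationStdDev =>
        N == 0 ? double.NaN : Math.Sqrt(Math.Max(0, (sxx - sx * sx / N) / N));

    public double SampleStdDev =>
        N <= 1 ? double.NaN : Math.Sqrt(Math.Max(0, (sxx - sx * sx / N) / (N - 1)));

    public static MomentAccumulator From(IEnumerable<double> values)
    {
        var acc = new MomentAccumulator();
        foreach (double x in values) acc.Add(x);   // the only enumeration
        return acc;
    }
}
```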
梦纸 2024-08-28 01:51:54
using System;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    public static double StdDev(this IEnumerable<int> values, bool as_sample = false)
    {
        var count = values.Count();
        var denominator = count - (as_sample ? 1 : 0);
        if (denominator <= 0) // check for divide by zero
            return 0;

        // Get the mean (convert to double to avoid integer division).
        double mean = values.Sum(v => (double)v) / count;

        // Get the sum of the squares of the differences
        // between the values and the mean.
        var squares_query =
            from int value in values
            select (value - mean) * (value - mean);
        double sum_of_squares = squares_query.Sum();
        return Math.Sqrt(sum_of_squares / denominator);
    }
}
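Computing the mean as `values.Sum() / count` with two `int` operands performs integer division. The fraction is discarded before the result is widened to `double`. A small demonstration with a hypothetical helper, assuming a non-empty array:

```csharp
using System;
using System.Linq;

public static class IntegerDivisionDemo
{
    // Assumes a non-empty array (int division by zero would throw).
    public static (double Truncated, double Exact) MeanOf(int[] values)
    {
        // int / int: the fractional part is lost before conversion to double.
        double truncated = values.Sum() / values.Length;
        // Converting one operand to double first keeps the fraction.
        double exact = (double)values.Sum() / values.Length;
        return (truncated, exact);
    }
}
```

For `{ 1, 2 }` this gives 1 for the truncated mean and 1.5 for the exact one.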