Enumerable.Average and OverflowException

Posted 2024-11-02 11:43:39

Perhaps a useless question:

public static double Average<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, int> selector
)

One of the exceptions thrown by the above method is OverflowException: the sum of the elements in the sequence is larger than Int64.MaxValue.

I assume the reason for this exception is that the sum of the values being averaged is computed in a variable S of type long. But since the return value is of type double, why didn't the designers make S a double as well?

Thank you

ゃ人海孤独症 2024-11-09 11:43:39

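Because this particular overload knows that you are starting out with int values, it knows you are not using fractional values. Converting each value to a double and then adding the doubles together would probably be less efficient, and it would definitely open you up to floating point imprecision if your collection of values were large enough.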
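The approach described above can be sketched as follows. This is a minimal illustration, not the framework's actual source: `AverageSketch` and `LongAverage` are hypothetical names, and the exception messages are assumptions. The point it shows is that the sum is kept in a checked long, so every addition is exact, and the only floating point step is the final division.

```csharp
using System;
using System.Collections.Generic;

static class AverageSketch
{
    // Hypothetical helper (not the framework source): averages ints with a long accumulator.
    public static double LongAverage(this IEnumerable<int> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        long sum = 0;
        long count = 0;
        checked
        {
            foreach (int item in source)
            {
                sum += item; // exact integer addition; throws OverflowException if the sum leaves the long range
                ++count;
            }
        }

        if (count == 0) throw new InvalidOperationException("Sequence contains no elements");

        // The only conversion to double and the only rounding happen here.
        return (double)sum / count;
    }
}
```

Calling `new[] { 1, 2, 3 }.LongAverage()` would return 2, with no intermediate rounding regardless of how long the sequence is.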

Update

I just ran a quick benchmark, and averaging doubles takes ~~roughly 50% longer~~ more than twice as long as averaging ints.

梦里寻她 2024-11-09 11:43:39

First off, I note that the exception does not arise until you have exceeded the bounds of a long. How are you going to do that? Each int can be at most about two billion, and the top of a long is about nine billion billion, so you would have to be averaging more than four billion ints at a minimum in order to trigger the exception. Is that the sort of problem you regularly have to solve?
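As a rough check of that estimate, here is a small sketch that computes how many maximal ints fit into a long before a checked addition overflows. The class and method names (`OverflowBound`, `MaxTermsThatFit`, `NextAddOverflows`) are hypothetical and exist only for this illustration.

```csharp
using System;

static class OverflowBound
{
    // Largest number of int.MaxValue terms whose sum still fits in a long.
    public static long MaxTermsThatFit()
    {
        return long.MaxValue / int.MaxValue; // 4294967298
    }

    // Returns true if adding one more int.MaxValue to a sum of `terms` such values overflows.
    // Assumes the caller keeps `terms` no larger than MaxTermsThatFit().
    public static bool NextAddOverflows(long terms)
    {
        long sum = terms * int.MaxValue;
        try
        {
            checked { sum += int.MaxValue; }
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

`NextAddOverflows(OverflowBound.MaxTermsThatFit())` returns true, so even in the worst case, where every element is int.MaxValue, the sequence needs 4,294,967,299 elements before the sum overflows.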

Suppose for the sake of argument it is. Doing the math in doubles loses precision because double arithmetic is rounded off to about fifteen significant decimal digits. Watch:

using System;
using System.Collections.Generic;
static class Extensions
{
    public static double DoubleAverage(this IEnumerable<int> sequence)
    {
        double sum = 0.0;
        long count = 0;
        foreach(int item in sequence) 
        {
            ++count;
            sum += item;
        }
        return sum / count;
    }
    public static IEnumerable<T> Concat<T>(this IEnumerable<T> seq1, IEnumerable<T> seq2)
    {
        foreach(T item in seq1) yield return item;
        foreach(T item in seq2) yield return item;
    }
}


class P
{
    public static IEnumerable<int> Repeat(int x, long count)
    {
        for (long i = 0; i < count; ++i) yield return x;
    }

    public static void Main()
    {
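        // Billions first: the running sum reaches 1e16, which is above 2^53, so each 1 added afterwards is rounded away.
        System.Console.WriteLine(Repeat(1000000000, 10000000).Concat(Repeat(1, 90000000)).DoubleAverage());
        // Ones first: every partial sum is exactly representable, so no addition is rounded.
        System.Console.WriteLine(Repeat(1, 90000000).Concat(Repeat(1000000000, 10000000)).DoubleAverage());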
    }
}

Here we average two series with double arithmetic: one that is {a billion, a billion, a billion ... ten million times ... a billion, one, one, one ... ninety million times}, and one that is the same sequence with the ones first and the billions last. If you run the code, you get different results. Not hugely different, but different, and the difference grows as the sequences get longer. Long arithmetic is exact; double arithmetic potentially rounds off on every addition, which means that substantial error can accrue over time.

It seems very unexpected for an operation performed solely on ints to accumulate floating point rounding error. That's the sort of thing one expects when operating on floats, but not when operating on ints.
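The same effect can be reproduced at a much smaller scale. The sketch below is only an illustration under stated assumptions: `RoundingDemo`, `SumAsDouble`, and `SumAsLong` are hypothetical names, and the behavior described assumes the runtime performs IEEE 754 double arithmetic. It adds one large value and a run of ones in either order, once with a double accumulator and once with a checked long.

```csharp
using System;

static class RoundingDemo
{
    // Adds `big` once and 1.0 `ones` times to a double accumulator, in the requested order.
    public static double SumAsDouble(double big, int ones, bool bigFirst)
    {
        double sum = 0.0;
        if (bigFirst) sum += big;
        for (int i = 0; i < ones; ++i) sum += 1.0; // may be rounded away once sum exceeds 2^53
        if (!bigFirst) sum += big;
        return sum;
    }

    // Same computation with a checked long accumulator: exact, or an OverflowException.
    public static long SumAsLong(long big, int ones, bool bigFirst)
    {
        long sum = 0;
        checked
        {
            if (bigFirst) sum += big;
            for (int i = 0; i < ones; ++i) sum += 1;
            if (!bigFirst) sum += big;
        }
        return sum;
    }
}
```

With `big = 1e16` and `ones = 1000`, the double version returns 1e16 when the big value comes first, because each added 1 is rounded away, and 1e16 + 1000 when the ones come first. The long version returns 10000000000001000 in both orders.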
