Computing the mean of a generator in Python

Posted 2024-10-16 15:15:43, 660 words, 8 views, 0 comments

I'm doing some statistics work and have a (large) collection of random numbers to compute the mean of. I'd like to work with generators: since I only need the mean, I don't need to store the numbers.

The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this?

It would be nice if I could say "sum(values)/len(values)", but len doesn't work for generators, and sum would already have consumed the values.

Here's an example:

import numpy 

def my_mean(values):
    n = 0
    Sum = 0.0
    try:
        while True:
            Sum += next(values)
            n += 1
    except StopIteration: pass
    return float(Sum)/n

X = [k for k in range(1,7)]
Y = (k for k in range(1,7))

print(numpy.mean(X))
print(my_mean(Y))

These both give the same, correct answer, but my_mean doesn't work for lists, and numpy.mean doesn't work for generators.

I really like the idea of working with generators, but details like this seem to spoil things.
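For reference, here is a minimal sketch (Python 3, standard library only) of why the next-based version fails on a list: next() requires an iterator, and a list is only an iterable until iter() is called on it. The variable names are illustrative and not part of the original code.

```python
values = [1, 2, 3]

# A list is iterable but is not itself an iterator, so next() rejects it.
try:
    next(values)
    failed = False
except TypeError:
    failed = True

# Wrapping the list with iter() yields an iterator that next() accepts.
iterator = iter(values)
first = next(iterator)

print(failed, first)
```

A for loop calls iter() implicitly, which is why a loop-based version handles lists and generators alike.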

沫雨熙 2024-10-23 15:15:43

In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.

The simplest of these (that I know of) is usually credited to Knuth, and also calculates variance. The link contains a Python implementation, but just the mean portion is copied here for completeness.

def mean(data):
    n = 0
    mean = 0.0
 
    for x in data:
        n += 1
        mean += (x - mean)/n

    if n < 1:
        return float('nan')
    else:
        return mean

I know this question is super old, but it's still the first hit on Google, so it seemed appropriate to post. I'm still sad that the Python standard library doesn't contain this simple piece of code.
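Since the answer mentions that the same algorithm also yields the variance, here is a hedged sketch of how that extension might look. It follows the commonly described Welford-style update; the function name running_mean_variance and the choice of sample variance (dividing by n - 1) are assumptions for illustration, not taken from the linked implementation.

```python
def running_mean_variance(data):
    """Single-pass mean and sample variance (Welford-style update)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean

    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        # Uses the old deviation (delta) and the new one (x - mean).
        m2 += delta * (x - mean)

    if n < 1:
        return float('nan'), float('nan')
    # Sample variance is undefined for a single data point.
    variance = m2 / (n - 1) if n > 1 else float('nan')
    return mean, variance


mean, variance = running_mean_variance(k for k in range(1, 7))
print(mean, variance)
```

Only three scalars are kept between iterations, so memory use does not grow with the length of the stream.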

沙沙粒小 2024-10-23 15:15:43

Just one simple change to your code would let you use both. Generators are meant to be used interchangeably with lists in a for loop.

def my_mean(values):
    n = 0
    Sum = 0.0
    for v in values:
        Sum += v
        n += 1
    return Sum / n

沒落の蓅哖 2024-10-23 15:15:43

def my_mean(values):
    total = 0
    for n, v in enumerate(values, 1):
        total += v
    return total / n

print(my_mean(X))
print(my_mean(Y))

Python 3.4 added statistics.mean(), but it calls list() on the input when given an iterator:

def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    return _sum(data)/n

where _sum() returns an accurate sum (a math.fsum()-like function that supports Fraction and Decimal in addition to float).
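A short usage sketch, assuming Python 3.4 or later: statistics.mean() accepts a generator directly, so no wrapper is needed on the caller's side. Whether the values are buffered internally depends on the Python version; the excerpt above shows the 3.4 behaviour.

```python
import statistics

# A generator expression; nothing is materialised by the caller.
values = (k for k in range(1, 7))

result = statistics.mean(values)
print(result)

# An empty input raises StatisticsError rather than dividing by zero.
try:
    statistics.mean(iter([]))
    raised = False
except statistics.StatisticsError:
    raised = True
```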

晚雾 2024-10-23 15:15:43

The old-fashioned way to do it:

def my_mean(values):
    # "total" avoids shadowing the built-in sum()
    total, n = 0, 0
    for x in values:
        total += x
        n += 1
    return float(total)/n

冬天旳寂寞 2024-10-23 15:15:43

One way would be

numpy.fromiter(Y, int).mean()

but this actually temporarily stores the numbers.
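If storing everything at once is the concern, one possible middle ground is to buffer only a fixed-size chunk at a time. The sketch below uses only the standard library; chunked_mean and chunk_size are hypothetical names chosen for illustration, and the same idea could be applied to NumPy arrays built per chunk.

```python
from itertools import islice


def chunked_mean(values, chunk_size=1024):
    """Mean of an iterable, holding at most chunk_size items in memory."""
    iterator = iter(values)
    total = 0.0
    n = 0
    while True:
        # Pull the next chunk_size items (fewer at the end of the stream).
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        total += sum(chunk)
        n += len(chunk)
    return total / n if n else float('nan')


result = chunked_mean((k for k in range(1, 7)), chunk_size=4)
print(result)
```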

断爱 2024-10-23 15:15:43

Your approach is a good one, but you should use the for x in y idiom instead of repeatedly calling next until you get a StopIteration. This works for both lists and generators:

def my_mean(values):
    n = 0
    Sum = 0.0

    for value in values:
        Sum += value
        n += 1
    return float(Sum)/n

心房敞 2024-10-23 15:15:43

You can use reduce without knowing the size of the array:

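from functools import reduce  # reduce is no longer a builtin in Python 3
from itertools import count
# c is the running mean; i is a (value, 1-based index) pair
reduce(lambda c, i: (c*(i[1]-1) + float(i[0]))/i[1], zip(values, count(1)), 0)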
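An alternative reduce formulation, shown here only as a sketch, carries a (total, count) pair through the fold and divides once at the end. That avoids the repeated multiply and divide on every step. The name reduce_mean is hypothetical.

```python
from functools import reduce


def reduce_mean(values):
    """Fold the iterable into (total, count), then divide once."""
    total, n = reduce(
        lambda acc, v: (acc[0] + v, acc[1] + 1),
        values,
        (0.0, 0),  # initial accumulator: running total and item count
    )
    return total / n if n else float('nan')


result = reduce_mean(k for k in range(1, 7))
print(result)
```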

离线来电— 2024-10-23 15:15:43

def my_mean(values):
    n = 0
    sum = 0
    for v in values:
        sum += v
        n += 1
    return sum/n

The above is very similar to your code, except that by using for to iterate over values, it works whether you get a list or an iterator.
Python's built-in sum is, however, highly optimized, so unless the list is really, really long, you might be happier temporarily storing the data.

(Also note that since you are using Python 3, you don't need float(sum)/n.)
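It is also possible to keep using the built-in sum without storing the data, by counting items on the side as they stream past. The following is a sketch of that idea, not something from the answer above; mean_with_sum is a hypothetical name. Note that the order of arguments to zip matters: values must come first so the counter is not advanced once more when the input runs out.

```python
from itertools import count


def mean_with_sum(values):
    """Single pass: built-in sum() does the adding, count() does the counting."""
    counter = count()
    # zip pulls from values first; when values is exhausted, counter is untouched.
    total = sum(v for v, _ in zip(values, counter))
    n = next(counter)  # number of items that were consumed
    return total / n if n else float('nan')


result = mean_with_sum(k for k in range(1, 7))
print(result)
```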

溺孤伤于心 2024-10-23 15:15:43

If you know the length of the generator in advance and you want to avoid storing the full list in memory, you can use:

reduce(np.add, generator)/length
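The one-liner above assumes numpy has been imported as np and, on Python 3, that reduce has been imported from functools. For plain floats, a standard-library variant of the same idea could use math.fsum, which consumes any iterable item by item. This is a sketch; mean_known_length is a hypothetical helper name.

```python
import math


def mean_known_length(values, length):
    """Mean of a stream whose length is known in advance."""
    if length < 1:
        return float('nan')
    # math.fsum tracks partial sums to limit floating-point rounding error.
    return math.fsum(values) / length


length = 6
generator = (float(k) for k in range(1, length + 1))
result = mean_known_length(generator, length)
print(result)
```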

寂寞花火° 2024-10-23 15:15:43

Try:

import itertools

def mean(i):
    (i1, i2) = itertools.tee(i, 2)
    return sum(i1) / sum(1 for _ in i2)

print(mean([1,2,3,4,5]))

tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting.

(Note that 'tee' will still use intermediate storage).
