Computing the mean of a generator in Python
I'm doing some statistics work: I have a (large) collection of random numbers whose mean I need to compute. I'd like to work with generators, because I only need the mean, so I don't need to store the numbers.
The problem is that `numpy.mean` breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering whether there's a proper, built-in way to do this.
It would be nice if I could say `sum(values)/len(values)`, but `len` doesn't work for generators, and `sum` has already consumed the values.
Here's an example:
```python
import numpy

def my_mean(values):
    # Works only on iterators: next() raises TypeError for a plain list.
    n = 0
    Sum = 0.0
    try:
        while True:
            Sum += next(values)
            n += 1
    except StopIteration:
        pass
    return float(Sum)/n

X = [k for k in range(1,7)]
Y = (k for k in range(1,7))
print(numpy.mean(X))
print(my_mean(Y))
```
These both give the same correct answer, but `my_mean` doesn't work for lists, and `numpy.mean` doesn't work for generators.
I really like the idea of working with generators, but details like this seem to spoil things.
In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.
The simplest of these (that I know of) is usually credited to Knuth, and it also calculates the variance. The link contains a Python implementation, but just the mean portion is copied here for completeness.
I know this question is super old, but it's still the first hit on Google, so it seemed appropriate to post. I'm still sad that the Python standard library doesn't contain this simple piece of code.
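The linked implementation is not reproduced in this copy of the answer. As a minimal sketch of the mean-only part, assuming the standard incremental update `mean += (x - mean) / n` (the function name `running_mean` is hypothetical), it could look like this:

```python
def running_mean(values):
    """Streaming mean of any iterable without storing the values.

    Uses the incremental update mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n,
    which avoids accumulating one large running total.
    """
    mean = 0.0
    n = 0
    for x in values:
        n += 1
        mean += (x - mean) / n
    if n == 0:
        raise ValueError("running_mean() needs at least one value")
    return mean


print(running_mean(k for k in range(1, 7)))  # 3.5
```

Only one running value and one counter are kept, so memory use stays constant no matter how many items the generator yields.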
Just one simple change to your code would let you use both: iterate with a `for` loop instead of calling `next()` yourself. Generators were meant to be used interchangeably with lists in a `for` loop.
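A sketch of that change applied to the question's `my_mean`: the `while True`/`next()`/`StopIteration` block becomes a plain `for` loop, and nothing else needs to change.

```python
def my_mean(values):
    """Mean of any iterable: list, tuple, range, generator, ..."""
    n = 0
    total = 0.0
    # The for loop replaces the explicit next()/StopIteration handling.
    for x in values:
        total += x
        n += 1
    return total / n  # ZeroDivisionError on empty input, as in the original


X = [k for k in range(1, 7)]
Y = (k for k in range(1, 7))
print(my_mean(X), my_mean(Y))  # 3.5 3.5
```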
There is `statistics.mean()` in Python 3.4, but it calls `list()` on the input, so the values are still stored temporarily. It computes the total with the private helper `_sum()`, which returns an accurate sum (a `math.fsum()`-like function that, in addition to `float`, also supports `Fraction` and `Decimal`).
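A short sketch of calling the standard library directly on a generator. `statistics.fmean()` is not mentioned in the original answer; it was added in Python 3.8, always returns a `float`, and is included here on the assumption that you are running 3.8 or later.

```python
import statistics

Y = (k for k in range(1, 7))
print(statistics.mean(Y))   # 3.5; the generator is exhausted afterwards

# Python 3.8+ only: float-based mean that also accepts any iterable.
Z = (k for k in range(1, 7))
print(statistics.fmean(Z))  # 3.5
```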
The old-fashioned way to do it:
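The code for this answer did not survive. Below is a hypothetical reconstruction of a plain, old-style loop (the name `old_fashioned_mean` is made up for illustration) that counts with `enumerate` instead of a separate counter variable:

```python
def old_fashioned_mean(values):
    total = 0.0
    n = 0  # stays 0 if `values` is empty
    # enumerate(..., start=1) pairs each value with its 1-based position,
    # so after the loop n holds the number of items seen.
    for n, x in enumerate(values, start=1):
        total += x
    if n == 0:
        raise ValueError("mean of empty input")
    return total / n


print(old_fashioned_mean(k for k in range(1, 7)))  # 3.5
```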
One way would be to collect the values into a list first, but this actually stores the numbers temporarily.
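The exact snippet is missing here. A sketch assuming the values are gathered with `list()` first (the name `stored_mean` is hypothetical); `numpy.mean(list(values))` would store them in the same way:

```python
def stored_mean(values):
    # list() pulls every value into memory, which is the temporary storage
    # this answer warns about; after that, len() works again.
    data = list(values)
    return sum(data) / len(data)


print(stored_mean(k for k in range(1, 7)))  # 3.5
```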
Your approach is a good one, but you should use the `for x in y` idiom instead of repeatedly calling `next` until you get a `StopIteration`. This works for both lists and generators.
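Why the `for` idiom accepts both: a `for` loop first calls `iter()` on its operand and then calls `next()` until `StopIteration`. A sketch that makes this explicit in the question's original loop (shown only to explain the mechanism; the plain `for` loop is the idiomatic form, and `my_mean_iter` is a hypothetical name):

```python
def my_mean_iter(values):
    # iter() turns a list into a list iterator and returns a generator unchanged,
    # which is exactly what a for loop does behind the scenes.
    it = iter(values)
    n = 0
    total = 0.0
    try:
        while True:
            total += next(it)
            n += 1
    except StopIteration:
        pass
    return total / n


print(my_mean_iter([1, 2, 3, 4, 5, 6]))     # 3.5
print(my_mean_iter(k for k in range(1, 7)))  # 3.5
```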
You can use reduce without knowing the size of the array:
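A minimal sketch with `functools.reduce` (a built-in in Python 2, moved to `functools` in Python 3), folding each value into a `(total, count)` pair; the name `reduce_mean` is hypothetical:

```python
from functools import reduce


def reduce_mean(values):
    # The accumulator is a (running_total, count) tuple that starts at (0, 0),
    # so the length never has to be known in advance.
    total, n = reduce(lambda acc, x: (acc[0] + x, acc[1] + 1), values, (0, 0))
    return total / n


print(reduce_mean(k for k in range(1, 7)))  # 3.5
```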
A version very similar to your code, but using `for` to iterate over `values`, works no matter whether you get a list or an iterator. The built-in `sum` function is very optimized, however, so unless the list is really, really long, you might be happier storing the data temporarily. (Also note that in Python 3, `/` performs true division, so you don't need `float(sum)/n`.)
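A quick check of the division point, assuming Python 3:

```python
# In Python 3, / on two ints is true division and returns a float,
# so wrapping the numerator in float() is unnecessary.
total = sum(k for k in range(1, 7))  # 21
n = 6
print(total / n)   # 3.5
print(total // n)  # 3, floor division (what Python 2's / did for two ints)
```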
If you know the length of the generator in advance and you want to avoid storing the full list in memory, you can use:
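A minimal sketch, assuming the length `n` is known exactly before iterating (`mean_known_length` is a hypothetical name); `sum()` consumes the generator one item at a time, so nothing is stored:

```python
def mean_known_length(values, n):
    # n must equal the number of items `values` will yield; it is not checked.
    return sum(values) / n


print(mean_known_length((k for k in range(1, 7)), 6))  # 3.5
```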
Try `itertools.tee`: it will duplicate your iterator for any iterable `i` (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting. (Note that `tee` will still use intermediate storage.)
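A sketch of that approach (`tee_mean` is a hypothetical name). Because the first copy is exhausted before the second is read, `tee` has to buffer every value internally, so memory use ends up comparable to calling `list()`:

```python
from itertools import tee


def tee_mean(i):
    a, b = tee(i)              # two independent iterators over the same values
    total = sum(a)             # consuming `a` fully makes tee buffer everything for `b`
    count = sum(1 for _ in b)  # counting drains the buffer
    return total / count


print(tee_mean(k for k in range(1, 7)))  # 3.5
```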