NumPy: calculating the mean with NaNs removed
How can I calculate mean values along an axis of a matrix while excluding NaN values from the calculation? (For R people, think na.rm = TRUE.)
Here is my [non-]working example:
import numpy as np
dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])
print(dat)
print(dat.mean(1)) # [ 2. nan nan nan]
With NaNs removed, my expected output would be:
array([ 2., 4.5, 6., nan])
I think what you want is a masked array:
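A minimal sketch of the masked-array approach, using the dat array from the question. The names mdat, row_means and result are illustrative and are not taken from the original answer.

```python
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# Mask every NaN entry so the reduction ignores it.
mdat = np.ma.masked_array(dat, mask=np.isnan(dat))

# Row means over the unmasked entries; a fully masked row stays masked.
row_means = mdat.mean(axis=1)

# Convert back to a plain ndarray, writing NaN where the result is masked.
result = row_means.filled(np.nan)
print(result)  # row means: 2.0, 4.5, 6.0, nan
```

The final filled(np.nan) call is only needed if a plain ndarray with NaN for the all-NaN row is wanted, as in the expected output of the question.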
Edit: Combining all of the timing data
Returns:
If performance matters, you should use bottleneck.nanmean() instead: http://pypi.python.org/pypi/Bottleneck
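A short sketch of how that call might be used, assuming the optional Bottleneck package is installed. The fallback to numpy.nanmean is an assumption added here so that the snippet still runs without Bottleneck; it is not part of the original answer.

```python
import warnings
import numpy as np

try:
    import bottleneck as bn  # optional third-party package
    nanmean = bn.nanmean
except ImportError:
    # Assumption for this sketch: fall back to NumPy's own implementation.
    nanmean = np.nanmean

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

with warnings.catch_warnings():
    # The all-NaN row may trigger a "Mean of empty slice" RuntimeWarning.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = nanmean(dat, axis=1)

print(row_means)  # row means: 2.0, 4.5, 6.0, nan
```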
From NumPy 1.8 (released 2013-10-30) onwards, numpy.nanmean does precisely what you need.
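For illustration, numpy.nanmean applied to the array from the question might look like the sketch below. The warning filter is an addition for tidiness: the all-NaN row makes NumPy emit a RuntimeWarning while still returning nan for that row.

```python
import warnings
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

with warnings.catch_warnings():
    # Silence "Mean of empty slice" for the row that contains only NaNs.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = np.nanmean(dat, axis=1)

print(row_means)  # row means: 2.0, 4.5, 6.0, nan
```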
Assuming you've also got SciPy installed:
http://www.scipy.org/doc/api_docs/SciPy.stats.stats.html#nanmean
A masked array with the nans filtered out can also be created on the fly:
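One way to do this in a single expression is numpy.ma.masked_invalid, sketched below. Note that masked_invalid masks infinities as well as NaNs, which may or may not be wanted; the choice of this helper is an assumption, not necessarily what the original answer used.

```python
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# Build the masked array inline, reduce along rows, then
# replace the masked (all-NaN) result with NaN.
row_means = np.ma.masked_invalid(dat).mean(axis=1).filled(np.nan)

print(row_means)  # row means: 2.0, 4.5, 6.0, nan
```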
You can always find a workaround in something like:
NumPy 2.0's numpy.mean was expected to have a skipna option to take care of that; released versions of numpy.mean have no such parameter, and numpy.nanmean is what skips NaNs.
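A workaround of the kind mentioned above could be sketched as follows: sum the non-NaN entries and divide by how many there are. This is one possible formulation, not necessarily the code from the original answer, and the names totals and counts are illustrative.

```python
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# Sum of each row, treating NaN as zero.
totals = np.nansum(dat, axis=1)

# Number of non-NaN entries in each row.
counts = (~np.isnan(dat)).sum(axis=1)

# A row with no valid entries divides by zero; let that become NaN quietly.
with np.errstate(invalid="ignore", divide="ignore"):
    row_means = totals / counts

print(row_means)  # row means: 2.0, 4.5, 6.0, nan
```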
This is built upon the solution suggested by JoshAdel.
Define the following function:
Example use:
Will print out:
How about using Pandas to do this:
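A small sketch of the pandas route, assuming pandas is installed. DataFrame.mean skips NaN by default (skipna=True), so no extra argument is needed; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# axis=1 reduces across columns, giving one mean per row.
row_means = pd.DataFrame(dat).mean(axis=1)

# Back to a plain NumPy array if that is what the rest of the code expects.
result = row_means.to_numpy()
print(result)  # row means: 2.0, 4.5, 6.0, nan
```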
Gives:
Or you can use laxarray, freshly uploaded, which is, among other things, a wrapper for masked arrays.
Following JoshAdel's protocol, I get:
So laxarray is marginally slower (I would need to check why; it may be fixable), but it is much easier to use and allows labelling dimensions with strings.
Check out: https://github.com/perrette/laxarray
EDIT: I have checked with another module, "la" (larry), which beats all tests:
Impressive!
One more speed check for all proposed approaches:
So the best is 'bottleneck.nanmean(dat, axis=1)'.
'scipy.stats.nanmean(dat)' is not faster than numpy.nanmean(dat, axis=1).
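For anyone who wants to repeat such a comparison, a rough timing harness might look like the sketch below. The array size, NaN fraction, repeat count and function names are all arbitrary choices made for this sketch. Only NumPy-based approaches are included, and no timings are quoted here because they depend on the machine and library versions.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
dat = rng.random((1000, 100))
dat[rng.random(dat.shape) < 0.1] = np.nan  # make roughly 10% of entries NaN


def with_nanmean(a):
    return np.nanmean(a, axis=1)


def with_masked_array(a):
    return np.ma.masked_invalid(a).mean(axis=1).filled(np.nan)


def with_nansum(a):
    return np.nansum(a, axis=1) / (~np.isnan(a)).sum(axis=1)


candidates = {
    "nanmean": with_nanmean,
    "masked_array": with_masked_array,
    "nansum/count": with_nansum,
}

# Check that the approaches agree before comparing their speed.
results = {name: fn(dat) for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(lambda: fn(dat), number=20)
    print(f"{name:>14}: {seconds / 20 * 1e3:.3f} ms per call")
```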