How can I compute a histogram of an array with masked values in NumPy?

Posted 2024-09-16 21:01:36

In NumPy 1.4.1, what is the simplest or most efficient way to compute the histogram of a masked array? By default, numpy.histogram and pyplot.hist count the masked elements!

The only simple solution I can think of right now involves creating a new array containing only the non-masked values:

histogram(m_arr[~m_arr.mask])

This is not very efficient, though, as this unnecessarily creates a new array. I'd be happy to read about better ideas!
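A minimal sketch of the behavior described above, using a small made-up array (the values and the bin count are arbitrary). numpy.histogram treats the masked array as plain data, so masked entries are counted, while boolean indexing with the inverted mask, as in the question, counts only the unmasked entries:

```python
import numpy as np

# Made-up data: values above 10 are masked out
m_arr = np.ma.masked_greater(np.array([1.0, 2.0, 2.5, 3.0, 100.0, 200.0]), 10.0)

# The mask is ignored: 100.0 and 200.0 are counted and stretch the bin range
counts_all, edges_all = np.histogram(m_arr, bins=4)

# Workaround from the question: index with the inverted mask (allocates a new array)
counts_unmasked, edges_unmasked = np.histogram(m_arr[~m_arr.mask], bins=4)

print(counts_all.sum())       # 6: every element, masked or not
print(counts_unmasked.sum())  # 4: only the unmasked elements
```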

段念尘 2024-09-23 21:01:36

(Undeleting this based on the discussion above...)

I'm not sure whether the numpy developers would consider this a bug or intended behavior. I've asked on the mailing list, so we'll see what they say.

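Either way, it's a simple fix. Patching numpy/lib/function_base.py to call numpy.asanyarray instead of numpy.asarray on the function's inputs would let it work correctly with masked arrays (or any other subclass of ndarray) without creating a copy.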
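The difference this answer relies on can be seen directly. A minimal illustration with a made-up three-element array: numpy.asarray returns a plain ndarray and so loses the mask, while numpy.asanyarray passes the MaskedArray subclass through unchanged. This only demonstrates the conversion step; it does not by itself show that the rest of histogram would then handle the mask correctly.

```python
import numpy as np

m_arr = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

plain = np.asarray(m_arr)    # plain ndarray: the mask is discarded
kept = np.asanyarray(m_arr)  # subclass passes through: still a MaskedArray

print(type(plain).__name__)  # ndarray
print(type(kept).__name__)   # MaskedArray
print(plain.tolist())        # [1.0, 2.0, 3.0], the masked 2.0 is visible again
```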

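Edit: This appears to be intended behavior, as discussed here:

> If you want to ignore masked data, it's just one extra function call:
>
> histogram(m_arr.compressed())
>
> I don't think the fact that this makes an extra copy will be relevant, because I guess full masked-array handling inside histogram would be a lot more expensive.
>
> Using asanyarray would also let in matrices and other subtypes that might not be handled correctly by the histogram calculations.
>
> For anything besides dropping masked observations, it would be necessary to figure out what the masked-array definition of a histogram is, as Bruce pointed out.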
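A short sketch of the compressed() approach on a made-up 2-D array: compressed() returns a 1-D ndarray containing only the unmasked values, so numpy.histogram never sees the masked entries. Like the boolean indexing in the question, it still allocates a new array.

```python
import numpy as np

# Made-up 2-D data: values below 3 are masked out
m_arr = np.ma.masked_less(np.arange(12, dtype=float).reshape(3, 4), 3.0)

values = m_arr.compressed()  # 1-D copy of the unmasked values
counts, edges = np.histogram(values, bins=3)

print(values.shape)     # (9,)
print(counts.tolist())  # [3, 3, 3]
```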

堇年纸鸢 2024-09-23 21:01:36

Try hist(m_arr.compressed()).

眼藏柔 2024-09-23 21:01:36

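This is a very old question, but nowadays I just use:

numpy.histogram(m_arr, bins=.., range=.., density=False, weights=m_arr_mask)

where m_arr_mask is an array with the same shape as m_arr, containing 0 for the elements of m_arr to exclude from the histogram and 1 for the elements to include.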
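A minimal sketch of this weights approach, assuming a made-up 1-D array and an arbitrary bin count. Two details are worth spelling out: m_arr_mask is the inverse of m_arr.mask (which is True for masked-out elements), and range should be given explicitly, because when it is omitted numpy.histogram derives the bin range from all underlying values, including the masked ones. numpy.ma.getmaskarray is used so the weights are a full-shape array even if nothing is masked.

```python
import numpy as np

# Made-up data: 50.0 is masked out as an outlier
m_arr = np.ma.masked_greater(np.array([0.5, 1.2, 1.7, 2.2, 3.5, 50.0]), 10.0)

# Weight 1 for elements to include, 0 for masked ones (inverse of m_arr.mask).
# getmaskarray returns a full boolean array even when no mask is set.
m_arr_mask = (~np.ma.getmaskarray(m_arr)).astype(float)

# MaskedArray.min()/max() skip masked values; assumes at least one is unmasked
lo, hi = float(m_arr.min()), float(m_arr.max())
counts, edges = np.histogram(m_arr, bins=3, range=(lo, hi), density=False,
                             weights=m_arr_mask)

print(counts.tolist())  # [2.0, 2.0, 1.0]
```

With float weights the counts come back as floats. Without range=(lo, hi), the last bin edge here would be 50.0, the masked value, even though it contributes zero weight.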

对你的占有欲 2024-09-23 21:01:36

After running into casting issues by trying Erik's solution (see https://github.com/numpy/numpy/issues/16616), I decided to write a numba function to achieve this behavior.

Some of the code was inspired by https://numba.pydata.org/numba-examples/examples/density_estimation/histogram/results.html. I added the mask bit.

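import numpy
import numba


@numba.jit(nopython=True)
def compute_bin(x, bin_edges):
    # Assumes uniformly spaced bin edges (e.g. from numpy.linspace)
    n = bin_edges.shape[0] - 1
    a_min = bin_edges[0]
    a_max = bin_edges[-1]

    # Reject values outside [a_min, a_max], and NaN (both comparisons are
    # False for NaN). Without this check, int() truncates toward zero, so
    # values slightly below a_min would wrongly land in bin 0.
    if not (x >= a_min and x <= a_max):
        return None

    # Special case to mirror NumPy behavior: a_max always goes in the last bin
    if x == a_max:
        return n - 1

    bin_idx = int(n * (x - a_min) / (a_max - a_min))
    # Clamp in case floating-point rounding yields n for x just below a_max
    return min(bin_idx, n - 1)


@numba.jit(nopython=True)
def masked_histogram(img, bin_edges, mask):
    # mask is True for elements to COUNT, the inverse of numpy.ma's
    # convention (where a True mask entry means "masked out")
    hist = numpy.zeros(len(bin_edges) - 1, dtype=numpy.intp)

    for i, value in enumerate(img.flat):
        if mask.flat[i]:
            bin_idx = compute_bin(value, bin_edges)
            if bin_idx is not None:
                hist[int(bin_idx)] += 1
    return hist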
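masked_histogram expects plain arrays, uniformly spaced edges, and a mask whose sense is the opposite of numpy.ma's. Below is a sketch of a hypothetical helper, prepare_masked_histogram_inputs (not part of the answer), that builds those arguments from a masked array with pure NumPy, assuming at least one element is unmasked. np.histogram on compressed() is used only as a reference result that masked_histogram should reproduce; the numba call itself is left commented out so the snippet runs without numba.

```python
import numpy as np


def prepare_masked_histogram_inputs(m_arr, nbins):
    # Hypothetical helper: turns a numpy.ma array into the
    # (img, bin_edges, mask) arguments that masked_histogram expects.
    img = np.ma.getdata(m_arr)            # underlying data (a view, no copy)
    include = ~np.ma.getmaskarray(m_arr)  # True = count this element
    # MaskedArray.min()/max() skip masked values; assumes one is unmasked
    lo, hi = float(m_arr.min()), float(m_arr.max())
    if lo == hi:
        # Same fallback numpy.histogram uses for a degenerate range; it also
        # avoids a division by zero in compute_bin
        lo, hi = lo - 0.5, hi + 0.5
    bin_edges = np.linspace(lo, hi, nbins + 1)  # uniform, as compute_bin assumes
    return img, bin_edges, include


m_arr = np.ma.masked_less(np.arange(12, dtype=float).reshape(3, 4), 3.0)
img, bin_edges, include = prepare_masked_histogram_inputs(m_arr, 3)
# hist = masked_histogram(img, bin_edges, include)  # needs the numba code above
reference, _ = np.histogram(m_arr.compressed(), bins=bin_edges)
print(reference.tolist())  # [3, 3, 3]
```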

The speedup is significant. On a (1000, 1000) image: