How do I create a histogram of an array with masked values in NumPy?

Posted 2024-09-16 21:01:36 · 258 words · 8 views · 0 comments

In NumPy 1.4.1, what is the simplest or most efficient way of calculating the histogram of a masked array? By default, numpy.histogram and pyplot.hist both count the masked elements!

The only simple solution I can think of right now involves creating a new array with the non-masked values:

histogram(m_arr[~m_arr.mask])

This is not very efficient, though, as this unnecessarily creates a new array. I'd be happy to read about better ideas!
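
To make the problem concrete, here is a minimal sketch of the behavior described in the question. The sample data, bin count, and range are made up for illustration, and it assumes a NumPy version in which numpy.histogram discards the mask of its input.

```python
import numpy as np

# Hypothetical sample data: the last value (100) is masked out.
m_arr = np.ma.masked_array([1, 2, 3, 100], mask=[False, False, False, True])

# Passing the masked array directly: the masked 100 is still counted.
counts_all, _ = np.histogram(m_arr, bins=2, range=(0, 200))

# Boolean indexing with the inverted mask keeps only the unmasked values.
counts_valid, _ = np.histogram(m_arr[~m_arr.mask], bins=2, range=(0, 200))

print(counts_all.sum(), counts_valid.sum())
```

The first call counts four elements and the second only three, which is the discrepancy the question is about.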

Comments (4)

段念尘 2024-09-23 21:01:36

(Undeleting this as per discussion above...)

I'm not sure whether or not the numpy developers would consider this a bug or expected behavior. I asked on the mailing list, so I guess we'll see what they say.

Either way, it's an easy fix. Patching numpy/lib/function_base.py to use numpy.asanyarray rather than numpy.asarray on the inputs to the function will allow it to properly use masked arrays (or any other subclass of an ndarray) without creating a copy.
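
The difference between the two conversion functions can be seen directly. This is only an illustrative sketch with made-up data, not the patch itself:

```python
import numpy as np

# Hypothetical sample data with the middle element masked.
m_arr = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

plain = np.asarray(m_arr)    # base ndarray: the mask is dropped
kept = np.asanyarray(m_arr)  # ndarray subclasses pass through unchanged

print(type(plain).__name__, type(kept).__name__)
```

After asarray the masked value 2.0 is an ordinary element again, which is why a function that converts its input this way ends up counting it.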

Edit: It seems like it is expected behavior. As discussed here:

If you want to ignore masked data it's just one extra function call

histogram(m_arr.compressed())

I don't think the fact that this makes an extra copy will be relevant, because I guess full masked array handling inside histogram will be a lot more expensive.

Using asanyarray would also allow in matrices and other subtypes that might not be handled correctly by the histogram calculations.

For anything else besides dropping masked observations, it would be necessary to figure out what the masked array definition of a histogram is, as Bruce pointed out.
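
A short sketch of the compressed() approach from the quote, using made-up 2-D data. It also checks the point about the extra copy: with a mask present, the result is a new 1-D array rather than a view.

```python
import numpy as np

# Hypothetical 2-D sample data: the bottom-right value is masked out.
m_arr = np.ma.masked_array(
    [[1, 2], [3, 100]],
    mask=[[False, False], [False, True]],
)

# compressed() returns the unmasked values as a flat, plain ndarray.
valid = m_arr.compressed()

counts, edges = np.histogram(valid, bins=2, range=(0, 200))

print(valid, counts, np.shares_memory(valid, m_arr.data))
```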

堇年纸鸢 2024-09-23 21:01:36

Try hist(m_arr.compressed()).

眼藏柔 2024-09-23 21:01:36

This is a super old question, but these days I just use:

numpy.histogram(m_arr, bins=.., range=.., density=False, weights=m_arr_mask)

Where m_arr_mask is an array with the same shape as m_arr, consisting of 0 values for elements of m_arr to be excluded from the histogram and 1 values for elements that are to be included.
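
As a sketch of how such a weights array might be built from an existing masked array (the sample data, bins, and range are placeholders, and using float weights is an assumption made here to sidestep dtype surprises):

```python
import numpy as np

# Hypothetical sample data: the last value (100) is masked out.
m_arr = np.ma.masked_array([1, 2, 3, 100], mask=[False, False, False, True])

# 1.0 for elements to include, 0.0 for masked elements to exclude.
# getmaskarray always returns a full boolean array, even if nothing is masked.
m_arr_mask = (~np.ma.getmaskarray(m_arr)).astype(float)

counts, edges = np.histogram(
    m_arr, bins=2, range=(0, 200), density=False, weights=m_arr_mask
)

print(counts)
```

Note that the weights only zero out the contribution of the masked elements. If range is left out, the masked values still take part in choosing the bin edges, so passing an explicit range is the safer choice with this approach.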

对你的占有欲 2024-09-23 21:01:36

After running into casting issues by trying Erik's solution (see https://github.com/numpy/numpy/issues/16616), I decided to write a numba function to achieve this behavior.

Some of the code was inspired by https://numba.pydata.org/numba-examples/examples/density_estimation/histogram/results.html. I added the mask bit.

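import numpy
import numba


@numba.jit(nopython=True)
def compute_bin(x, bin_edges):
    # Map x to a bin index, or return None if x lies outside the edges.
    # Assumes uniformly spaced bins for now.
    n = bin_edges.shape[0] - 1
    a_min = bin_edges[0]
    a_max = bin_edges[-1]

    # int() truncates toward zero, so values just below a_min would
    # otherwise land in bin 0; reject out-of-range values up front.
    if x < a_min or x > a_max:
        return None

    # Special case to mirror NumPy behavior for the last bin
    if x == a_max:
        return n - 1  # a_max always falls in the last bin

    bin_index = int(n * (x - a_min) / (a_max - a_min))

    if bin_index < 0 or bin_index >= n:
        return None
    else:
        return bin_index


@numba.jit(nopython=True)
def masked_histogram(img, bin_edges, mask):
    # mask marks the elements to INCLUDE (True/nonzero = counted),
    # which is the opposite of the numpy.ma mask convention.
    hist = numpy.zeros(len(bin_edges) - 1, dtype=numpy.intp)

    for i, value in enumerate(img.flat):
        if mask.flat[i]:
            bin_index = compute_bin(value, bin_edges)
            if bin_index is not None:
                hist[int(bin_index)] += 1
    return hist  # , bin_edges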

The speedup is significant. On a (1000, 1000) image:

[Image not preserved]
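
One way to sanity-check a function like masked_histogram is to compare its output against a plain NumPy reference. The helper below is hypothetical (its name, the random test image, and the inclusion mask are all invented for this sketch) and it deliberately avoids numba so that it runs on its own:

```python
import numpy as np


def masked_histogram_reference(img, bin_edges, include):
    """Histogram of img over bin_edges, counting only where include is True."""
    # Select the included values (this makes a copy), then bin them.
    values = img.ravel()[include.ravel().astype(bool)]
    counts, _ = np.histogram(values, bins=bin_edges)
    return counts


# Hypothetical test data: a small random image and a random inclusion mask.
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(50, 40))
include = rng.random(img.shape) > 0.3
edges = np.linspace(0.0, 1.0, 11)

reference_counts = masked_histogram_reference(img, edges, include)
print(reference_counts.sum(), include.sum())
```

With numba installed, the result of masked_histogram(img, edges, include) should match reference_counts for uniformly spaced edges. The reference is the slower, copy-making path that the numba version is meant to avoid.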
