numpy.digitize returns values out of range?

Posted on 2024-10-05 15:36:40

I am using the following code to digitize an array into 16 bins:

numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])

I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned array is 17. How can this be explained?

Comments (4)

少女的英雄梦 2024-10-12 15:36:40

This is actually documented behaviour of numpy.digitize():

Each index i returned is such that bins[i-1] <= x < bins[i] if
bins is monotonically increasing, or bins[i-1] > x >= bins[i] if
bins is monotonically decreasing. If values in x are beyond the
bounds of bins, 0 or len(bins) is returned as appropriate.

So in your case, 0 and 17 are also valid return values (note that the bin array returned by numpy.histogram() has length 17). The bins returned by numpy.histogram() cover the range array.min() to array.max(). The condition given in the docs shows that array.min() belongs to the first bin, while array.max() lies outside the last bin -- that's why 0 is not in the output, while 17 is.
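The behaviour described above can be reproduced with a small sketch. The sample data here is an assumption for illustration (the question does not show its array); any array with a unique maximum should behave the same way.

```python
import numpy as np

# Hypothetical sample data; the original question does not show its array.
array = np.arange(100, dtype=float)

edges = np.histogram(array, bins=16)[1]   # 17 edges for 16 bins
indices = np.digitize(array, bins=edges)

print(len(edges))                     # 17
print(indices.min(), indices.max())   # 1 17
print(array[indices == 17])           # [99.] (only the maximum)
```

Only the element equal to array.max() lands in index 17, because the default intervals are closed on the left and open on the right.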

东北女汉子 2024-10-12 15:36:40

numpy.histogram() produces an array of the bin edges, of which there are (number of bins)+1.

才能让你更想念 2024-10-12 15:36:40

In NumPy version 1.8, you can choose whether numpy.digitize treats the right edge of each interval as closed or open, using the right argument.
The following is an example (copied from http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html):

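import numpy as np

x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])

# right=True makes each interval (lo, hi], so 20.0 falls in the last bin
np.digitize(x, bins, right=True)
# array([1, 2, 3, 4, 4])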
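For comparison, here is a short sketch that runs the same data through both modes. With the default right=False the intervals are closed on the left, so 10.0 moves up a bin and 20.0 falls past the last edge.

```python
import numpy as np

x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])

left_closed = np.digitize(x, bins)               # bins[i-1] <= x < bins[i]
right_closed = np.digitize(x, bins, right=True)  # bins[i-1] < x <= bins[i]

print(left_closed)    # [1 3 3 4 5]
print(right_closed)   # [1 2 3 4 4]
```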

转角预定愛 2024-10-12 15:36:40

OK, I found a recipe for discretizing an array with NumPy.
The problem is that np.histogram_bin_edges (and therefore np.histogram) and np.digitize are not consistent in how they use bin edges: the first two always return one more edge than there are bins, so whichever right mode you use in np.digitize, you are always left with one extra "outlier" category.
What one has to do is (assuming the edges are in ascending order):

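import numpy as np

bin_edges = np.histogram_bin_edges(arr, bins=4)  # or any other source of ascending edges
if bin_edges[0] <= arr.min():
    # Drop the first edge; intervals are (lo, hi] and labels start at 0
    categorized_arr = np.digitize(arr, bins=bin_edges[1:], right=True)
elif bin_edges[-1] >= arr.max():
    # Drop the last edge; intervals are [lo, hi), values below the first edge get 0
    categorized_arr = np.digitize(arr, bins=bin_edges[:-1], right=False)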
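A possible alternative, which does not come from the answers above, is to pass only the interior edges to np.digitize. Under the assumption that the edges come from np.histogram on the same data, this should give 0-based labels that match the histogram's own binning, with the maximum included in the last bin. The sample data and the names n_bins and labels are hypothetical.

```python
import numpy as np

arr = np.arange(100, dtype=float)  # hypothetical sample data
n_bins = 16

counts, edges = np.histogram(arr, bins=n_bins)

# Use only the interior edges: values below edges[1] get 0, and values
# at or above edges[-2] (including the maximum) get n_bins - 1.
labels = np.digitize(arr, bins=edges[1:-1])

print(labels.min(), labels.max())  # 0 15
print(np.array_equal(np.bincount(labels, minlength=n_bins), counts))  # True
```

Comparing np.bincount of the labels against the histogram counts is a quick way to check that both functions agree on which bin each value belongs to.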