numpy.digitize returns a value out of range?
I am using the following code to digitize an array into 16 bins:
numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])
I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned array is 17. How can this be explained?
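A minimal reproduction of the behaviour. It assumes a random normal sample as a stand-in for `array`; the name `data` and the seed are illustrative choices and do not come from the question:

```python
import numpy as np

# Illustrative data; any array with distinct min and max behaves the same way.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# np.histogram returns (counts, edges); for 16 bins there are 17 edges.
counts, edges = np.histogram(data, bins=16)
idx = np.digitize(data, bins=edges)

print(len(edges))            # 17
print(idx.min(), idx.max())  # 1 17
# Only the element(s) equal to data.max() land in the extra index 17.
print(np.array_equal(idx == 17, data == data.max()))  # True
```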
Comments (4)
This is actually documented behaviour of numpy.digitize(): for monotonically increasing bins and the default right=False, the docs state that each returned index i satisfies bins[i-1] <= x < bins[i], and that values beyond the bounds of bins get 0 or len(bins) as appropriate.
So in your case, 0 and 17 are also valid return values (note that the bin array returned by numpy.histogram() has length 17). The bins returned by numpy.histogram() cover the range array.min() to array.max(). The condition given in the docs shows that array.min() belongs to the first bin, while array.max() lies outside the last bin; that's why 0 is not in the output, while 17 is.

numpy.histogram() produces an array of the bin edges, of which there are (number of bins)+1.

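The documented rule can be checked directly. The sketch below compares `np.digitize` with `np.searchsorted(..., side="right")`, which applies the same `bins[i-1] <= x < bins[i]` test for increasing bins under the default `right=False`; the small arrays are made up for illustration:

```python
import numpy as np

bins = np.array([0.0, 1.0, 2.0, 3.0])            # 4 edges -> 3 bins
x = np.array([-0.5, 0.0, 0.5, 1.0, 2.9, 3.0, 4.0])

idx = np.digitize(x, bins)
print(idx)  # [0 1 1 2 3 4 4]

# Same answer from a binary search: the number of edges <= each value.
print(np.array_equal(idx, np.searchsorted(bins, x, side="right")))  # True

# Values below bins[0] map to 0; values >= bins[-1] map to len(bins).
print(idx[x < bins[0]], idx[x >= bins[-1]])  # [0] [4 4]
```

Note that the value sitting exactly on the rightmost edge (3.0) gets `len(bins)`, which is the same thing that happens to `array.max()` with histogram edges.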
In numpy version 1.8, you have the option to choose whether numpy.digitize treats each interval as including its right edge (right=True) or its left edge (the default, right=False).
Following is an example (copied from http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html):
import numpy as np
x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])
np.digitize(x, bins, right=True)
# -> array([1, 2, 3, 4, 4])
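Applied to the question's setup, `right=True` only moves the problem: `array.max()` now falls in bin 16, but `array.min()`, which sits exactly on the first edge, gets index 0. A sketch using the docs example plus made-up data (`data` is a stand-in for the question's `array`):

```python
import numpy as np

x = np.array([1.2, 10.0, 12.4, 15.5, 20.0])
bins = np.array([0, 5, 10, 15, 20])
print(np.digitize(x, bins, right=False))  # [1 3 3 4 5]
print(np.digitize(x, bins, right=True))   # [1 2 3 4 4]

# Histogram edges span exactly [data.min(), data.max()].
data = np.random.default_rng(0).normal(size=1000)
edges = np.histogram(data, bins=16)[1]
left = np.digitize(data, edges)               # right=False (default)
right = np.digitize(data, edges, right=True)
print(left.min(), left.max())    # 1 17
print(right.min(), right.max())  # 0 16
```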
Ok, I found a recipe to discretize an array with numpy.
The problem is that np.histogram_bin_edges (and, therefore, np.histogram) and np.digitize are not consistent in how they use bin edges: the first two always return an extra edge, so whatever right mode you use in np.digitize, you are always left with one "outlier" category.
What one has to do is (assuming edges appear in ascending order)
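One way to carry out this fix, sketched under the same assumption of ascending edges and not necessarily the poster's exact code, is to pass only the interior edges to `np.digitize`; clipping the full-edge result is an equivalent alternative. `digitize_like_histogram` is a hypothetical helper name:

```python
import numpy as np


def digitize_like_histogram(a, bins=16):
    """Hypothetical helper: 1-based bin indices matching np.histogram.

    np.histogram uses half-open bins [e_i, e_{i+1}) except the last,
    which also includes the right edge. Dropping the outer edges makes
    np.digitize return 0..n-1 with that same convention.
    """
    a = np.asarray(a)
    edges = np.histogram_bin_edges(a, bins=bins)
    return np.digitize(a, edges[1:-1]) + 1, edges


data = np.random.default_rng(0).normal(size=1000)
idx, edges = digitize_like_histogram(data, bins=16)
print(idx.min(), idx.max())  # 1 16

# Counts per bin agree with np.histogram on the same edges.
counts = np.histogram(data, bins=edges)[0]
print(np.array_equal(np.bincount(idx, minlength=17)[1:], counts))  # True

# Alternative: clip the out-of-range index produced by the rightmost edge.
alt = np.clip(np.digitize(data, edges), 1, len(edges) - 1)
print(np.array_equal(alt, idx))  # True
```

Dropping the outer edges keeps the index range fixed at 1..n for every in-range value, so no separate "outlier" category appears.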