Bin size in Matplotlib (Histogram)

Posted 2024-11-28 22:20:24

I'm using matplotlib to make a histogram.

Is there any way to manually set the size of the bins as opposed to the number of bins?

Comments (9)

安静被遗忘 2024-12-05 22:20:24

Actually, it's quite easy: instead of the number of bins you can give a list with the bin boundaries. They can be unequally distributed, too:

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])

If you just want them equally distributed, you can simply use range:

plt.hist(data, bins=range(min(data), max(data) + binwidth, binwidth))

Added to original answer

The above line works for data filled with integers only. As macrocosme points out, for floats you can use:

import numpy as np
import matplotlib.pyplot as plt

# data: a sequence of floats; binwidth: the desired bin width
plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))
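
To check what these edge lists do without drawing anything, the sketch below feeds them to NumPy's `np.histogram`, which, as far as I know, is what `plt.hist` uses to compute the bar heights. The data values and bin widths are made up for illustration.

```python
import numpy as np

# Made-up example data; plt.hist(data, bins=...) would draw bars from the same counts
int_data = [3, 7, 12, 18, 25, 41, 77]
float_data = [0.12, 0.31, 0.58, 0.74, 1.05]
binwidth_int = 10
binwidth_float = 0.25

# 1) Explicit, unevenly spaced edges: the last bin [50, 100] is five times wider
uneven_counts, uneven_edges = np.histogram(int_data, bins=[0, 10, 20, 30, 40, 50, 100])

# 2) Equal-width integer edges with range(); the stop value is exclusive,
#    so adding one binwidth makes sure max(data) is still covered
int_edges = list(range(min(int_data), max(int_data) + binwidth_int, binwidth_int))

# 3) Equal-width float edges with np.arange (range() only accepts integers)
float_edges = np.arange(min(float_data), max(float_data) + binwidth_float, binwidth_float)
float_counts, _ = np.histogram(float_data, bins=float_edges)
```

Any of `[0, 10, ..., 100]`, `int_edges` or `float_edges` can be passed straight to `plt.hist(data, bins=...)`.
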
葮薆情 2024-12-05 22:20:24

For N bins, the bin edges are specified by a list of N+1 values, where the first N give the lower bin edges and the extra one gives the upper edge of the last bin.

Code:

import numpy as np

bin_size = 0.1; min_edge = 0; max_edge = 2.5
N = int(round((max_edge - min_edge) / bin_size)); Nplus1 = N + 1  # linspace needs an integer count
bin_list = np.linspace(min_edge, max_edge, Nplus1)

Note that linspace produces an array from min_edge to max_edge made of N+1 values, i.e. N bins.
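
The same recipe can be wrapped in a small function. `edges_from_bin_size` is a hypothetical helper name, and it assumes the range is (close to) a whole multiple of the bin size; the `round()` call absorbs floating-point error such as 2.5 / 0.1 evaluating to slightly more than 25.

```python
import numpy as np

def edges_from_bin_size(min_edge, max_edge, bin_size):
    """Hypothetical helper: N+1 edges for N equal bins of width bin_size.

    Assumes (max_edge - min_edge) is (close to) a whole multiple of bin_size.
    """
    n_bins = int(round((max_edge - min_edge) / bin_size))
    # linspace includes both endpoints, so N bins need N + 1 points
    return np.linspace(min_edge, max_edge, n_bins + 1)

bin_list = edges_from_bin_size(0, 2.5, 0.1)
```

The result is then used as `plt.hist(data, bins=bin_list)`.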

待天淡蓝洁白时 2024-12-05 22:20:24

I use quantiles to make the bins fit the sample, so that each bin holds roughly the same number of observations (uniform in count rather than in width):

bins=df['Generosity'].quantile([0,.05,0.1,0.15,0.20,0.25,0.3,0.35,0.40,0.45,0.5,0.55,0.6,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1]).to_list()

plt.hist(df['Generosity'], bins=bins, density=True, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none')
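
A minimal sketch of the same idea with plain NumPy, assuming a made-up normal sample in place of `df['Generosity']`: `np.linspace(0, 1, 21)` generates the 21 probabilities 0, 0.05, ..., 1 listed above, and `np.quantile` turns them into 21 edges (20 bins). The counts come out almost equal while the bin widths differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up sample standing in for df['Generosity']
sample = rng.normal(size=1000)

# 21 probabilities 0, 0.05, ..., 1 -> 21 edges -> 20 bins
probs = np.linspace(0, 1, 21)
edges = np.quantile(sample, probs)

# Same binning that plt.hist(sample, bins=edges) would use
counts, _ = np.histogram(sample, bins=edges)
widths = np.diff(edges)
```

Because the widths are unequal, `density=True` matters here: it scales each bar so that its area, not its height, is proportional to its count, so the plot still resembles the underlying distribution.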

寄离 2024-12-05 22:20:24

I guess the easiest way would be to compute the minimum and maximum of your data, then L = max - min. Then divide L by the desired bin width (I'm assuming this is what you mean by bin size) and use the ceiling of that value as the number of bins. The bins are then slightly narrower than requested (L / ceil(L / width)) unless L is an exact multiple of the width.
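
A sketch of that calculation; `bin_count_for_width` is a hypothetical helper and the data are made up. It also returns the width that `plt.hist(data, bins=n_bins)` would actually use, since an integer `bins` splits the range [min, max] into equal parts.

```python
import math

def bin_count_for_width(data, width):
    """Hypothetical helper: number of bins so that each bin is at most `width` wide."""
    span = max(data) - min(data)  # L = max - min
    n_bins = max(1, math.ceil(span / width))
    actual_width = span / n_bins  # width plt.hist(data, bins=n_bins) would use
    return n_bins, actual_width

n, w = bin_count_for_width([2.0, 3.5, 9.0, 12.5], 3.0)
```

Here L = 10.5, so a requested width of 3.0 gives 4 bins that are each 2.625 wide.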

白芷 2024-12-05 22:20:24

I had the same issue as OP (I think!), but I couldn't get it to work in the way that Lastalda specified. I don't know if I have interpreted the question properly, but I have found another solution (it probably is a really bad way of doing it though).

This was the way that I did it:

plt.hist([1,11,21,31,41], bins=[0,10,20,30,40,50], weights=[10,1,40,33,6]);

Which creates this:

[Image: histogram created in matplotlib]

So the first argument basically 'initialises' each bin: I deliberately pick one number that falls inside each range set by the bins argument.

To demonstrate this, look at the array in the first parameter ([1,11,21,31,41]) and the 'bins' array in the second parameter ([0,10,20,30,40,50]):

  • The number 1 (from the first array) falls between 0 and 10 (in the 'bins' array)
  • The number 11 (from the first array) falls between 10 and 20 (in the 'bins' array)
  • The number 21 (from the first array) falls between 20 and 30 (in the 'bins' array), etc.

Then I use the 'weights' parameter to set the height of each bin. This is the array used for the weights parameter: [10,1,40,33,6].

So the 0 to 10 bin is given the value 10, the 10 to 20 bin the value 1, the 20 to 30 bin the value 40, etc.
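
The effect of `weights` can be checked with `np.histogram`, which takes the same `bins` and `weights` arguments: each value adds its weight, instead of 1, to the bin it lands in. The numbers are the ones from this answer.

```python
import numpy as np

# One "marker" value per bin, each lying inside its bin
values = [1, 11, 21, 31, 41]
edges = [0, 10, 20, 30, 40, 50]
heights = [10, 1, 40, 33, 6]

# Each value contributes its weight instead of 1 to the bin it falls in
counts, _ = np.histogram(values, bins=edges, weights=heights)
```

The resulting bar heights equal the weights, so this is effectively a bar chart of pre-computed heights over the chosen bin edges.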

指尖微凉心微凉 2024-12-05 22:20:24

I like things to happen automatically and for bins to fall on "nice" values. The following seems to work quite well.

import numpy as np
import matplotlib.pyplot as plt

def compute_histogram_bins(data, desired_bin_size):
    min_val = np.min(data)
    max_val = np.max(data)
    # Round the minimum down to a multiple of the bin size
    min_boundary = min_val - min_val % desired_bin_size
    # Round the maximum up past the next multiple of the bin size
    max_boundary = max_val - max_val % desired_bin_size + desired_bin_size
    # Number of edges (one more than the number of bins); round() guards against float error
    n_edges = int(round((max_boundary - min_boundary) / desired_bin_size)) + 1
    bins = np.linspace(min_boundary, max_boundary, n_edges)
    return bins

if __name__ == '__main__':
    data = np.random.random_sample(100) * 123.34 - 67.23
    bins = compute_histogram_bins(data, 10.0)
    print(bins)
    plt.hist(data, bins=bins)
    plt.xlabel('Value')
    plt.ylabel('Counts')
    plt.title('Compute Bins Example')
    plt.grid(True)
    plt.show()

The resulting bin edges fall on nice multiples of the bin size:

[-70. -60. -50. -40. -30. -20. -10.   0.  10.  20.  30.  40.  50.  60.]

[Image: histogram with the computed bins]
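
A more compact variant of the same idea, as a sketch: `nice_bin_edges` is a hypothetical name, and it uses `math.floor`/`math.ceil` instead of the modulo arithmetic. One behavioural difference: when the maximum is already a multiple of the bin size, this version stops at that value rather than adding one more bin (the last bin of a histogram is closed, so the maximum is still counted).

```python
import math
import numpy as np

def nice_bin_edges(data, bin_size):
    """Hypothetical compact variant: edges on multiples of bin_size covering the data."""
    lo = math.floor(min(data) / bin_size) * bin_size
    hi = math.ceil(max(data) / bin_size) * bin_size
    if hi == lo:
        # All data on a single multiple: use one bin of width bin_size
        hi = lo + bin_size
    n_bins = int(round((hi - lo) / bin_size))
    return np.linspace(lo, hi, n_bins + 1)

edges = nice_bin_edges([-67.23, 12.5, 56.11], 10.0)
```

With a minimum of -67.23 and a maximum of 56.11 this gives the same edges as above, -70 to 60 in steps of 10.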

柳絮泡泡 2024-12-05 22:20:24

If you also care about the visual side, you can add edgecolor='white', linewidth=2 to separate the bins visually:

date_binned = new_df[(new_df['k']>0)&(new_df['k']<360)]['k']
plt.hist(date_binned, bins=range(min(date_binned), max(date_binned) + binwidth, binwidth), edgecolor='white', linewidth=2)

春花秋月 2024-12-05 22:20:24

This answer supports @macrocosme's suggestion.

I am drawing a heat map with hist2d. Additionally, I use cmin=0.5 so that cells with no counts are left blank, and cmap to choose the color map; the _r suffix reverses the given colormap.

[Image: descriptive statistics of the data]

import numpy as np
import matplotlib.pyplot as plt

# fuel_econ: a pandas DataFrame with 'displ' and 'comb' columns
# general pattern: np.arange(data.min(), data.max() + binwidth, binwidth)
bin_x = np.arange(0.6, 7 + 0.3, 0.3)
bin_y = np.arange(12, 58 + 3, 3)
plt.hist2d(data=fuel_econ, x='displ', y='comb', cmin=0.5, cmap='viridis_r', bins=[bin_x, bin_y]);
plt.xlabel('Displacement (L)');
plt.ylabel('Combined fuel efficiency (mpg)');

plt.colorbar();
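
To see what the two edge arrays and `cmin` do to the counts, here is a sketch with made-up uniform data standing in for `fuel_econ['displ']` and `fuel_econ['comb']`. As far as I know, `plt.hist2d` computes its grid with `np.histogram2d` and then hides cells whose count is below `cmin`; the `np.where` line imitates that masking.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up stand-ins for fuel_econ['displ'] and fuel_econ['comb']
displ = rng.uniform(1.0, 6.5, size=200)
comb = rng.uniform(15, 55, size=200)

bin_x = np.arange(0.6, 7 + 0.3, 0.3)
bin_y = np.arange(12, 58 + 3, 3)

# Counts per (x-bin, y-bin) cell; shape is (len(bin_x) - 1, len(bin_y) - 1)
counts, _, _ = np.histogram2d(displ, comb, bins=[bin_x, bin_y])

# Roughly what cmin=0.5 does: cells below cmin become NaN and are left blank
masked = np.where(counts < 0.5, np.nan, counts)
```

With integer counts, any `cmin` between 0 and 1 simply blanks out the empty cells.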

叹沉浮 2024-12-05 22:20:24

For a histogram with integer x-values I ended up using

plt.hist(data, np.arange(min(data) - 0.5, max(data) + 1.5))
plt.xticks(range(min(data), max(data) + 1))

The offset of 0.5 centers the bins on the x-axis values. The plt.xticks call adds a tick for every integer.
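
A quick check of the edge and tick arithmetic with made-up integer data: `np.arange` and `range` both exclude their stop value, so the stops must sit one step past the last edge (max + 0.5) and the last tick (max) respectively.

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3, 5]  # made-up integer data

lo, hi = min(data), max(data)
# Edges lo-0.5, lo+0.5, ..., hi+0.5: one unit-wide bin centred on each integer
edges = np.arange(lo - 0.5, hi + 1.5)
counts, _ = np.histogram(data, bins=edges)
ticks = list(range(lo, hi + 1))  # one tick per integer, including hi
```

Passing `edges` to `plt.hist` and `ticks` to `plt.xticks` gives one bar per integer from 1 to 5, with an empty bar at 4.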
