
Bin size in Matplotlib (Histogram)

Posted 2024-11-28 22:20:24 | Length: 64 | Views: 1 | Comments: 0

I'm using matplotlib to make a histogram.

Is there any way to manually set the size of the bins, as opposed to the number of bins?


安静被遗忘 2024-12-05 22:20:24

Actually, it's quite easy: instead of the number of bins, you can pass a list of bin boundaries. They can be unequally spaced, too:

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])

If you just want them equally spaced, you can simply use range, where binwidth is the desired width of each bin:

plt.hist(data, bins=range(min(data), max(data) + binwidth, binwidth))

Added to original answer

The line above works only for integer data. As macrocosme points out, for floats you can use:

import numpy as np
plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))
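To make the integer/float distinction concrete, here is a minimal sketch. The sample values and the binwidth of 0.5 are made up for illustration, and np.histogram stands in for plt.hist, which accepts the same list of edges.

```python
import numpy as np

data = [0.3, 1.1, 2.6, 2.9, 4.0]   # made-up sample values
binwidth = 0.5                      # desired bin width (assumed)

# range() only accepts integers, so it raises TypeError for float data.
try:
    range(min(data), max(data) + binwidth, binwidth)
    range_ok = True
except TypeError:
    range_ok = False

# np.arange accepts floats. Stopping at max + binwidth guarantees
# that the last edge is not below the largest value.
edges = np.arange(min(data), max(data) + binwidth, binwidth)
counts, _ = np.histogram(data, bins=edges)
```

Every value ends up in some bin, and all bins share the requested width; the same edges can be passed as bins=edges to plt.hist.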


葮薆情 2024-12-05 22:20:24

For N bins, the bin edges are specified by a list of N+1 values, where the first N give the lower bin edges and the last one gives the upper edge of the last bin.

Code:

import numpy as np

bin_size = 0.1; min_edge = 0; max_edge = 2.5
# linspace needs an integer count, so round the ratio before converting it
N = int(round((max_edge - min_edge) / bin_size)); Nplus1 = N + 1
bin_list = np.linspace(min_edge, max_edge, Nplus1)

Note that linspace produces an array from min_edge to max_edge containing N+1 values, which delimit N bins.
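The N+1 edges versus N bins relationship can be checked with a short sketch. The uniform random data is an assumption made only for this illustration, and np.histogram is used in place of plt.hist because it takes the same edges and returns the counts directly.

```python
import numpy as np

min_edge, max_edge, bin_size = 0.0, 2.5, 0.1
n_bins = int(round((max_edge - min_edge) / bin_size))
edges = np.linspace(min_edge, max_edge, n_bins + 1)   # N + 1 edges

# Made-up data that lies inside [min_edge, max_edge).
rng = np.random.default_rng(0)
data = rng.uniform(min_edge, max_edge, size=200)

# One count per bin: N + 1 edges give N counts.
counts, returned_edges = np.histogram(data, bins=edges)
```

Here len(edges) is 26 and len(counts) is 25. Each bin includes its lower edge and excludes its upper edge, except the last bin, which also includes its upper edge.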


待天淡蓝洁白时 2024-12-05 22:20:24

I use quantiles so that the bins hold roughly equal numbers of samples and adapt to the data:

import numpy as np

# 21 evenly spaced quantile levels (0, 0.05, ..., 1) give 20 bins
bins = df['Generosity'].quantile(np.linspace(0, 1, 21)).to_list()

# density=True normalizes the area to 1 (older Matplotlib versions called this normed=True)
plt.hist(df['Generosity'], bins=bins, density=True, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none')
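As a rough sketch of what quantile-based edges do, the snippet below uses made-up exponential data instead of the df['Generosity'] column, and np.quantile instead of the pandas method; both are assumptions for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=1000)   # made-up skewed data

# 21 quantile levels from 0 to 1 give 20 bins.
edges = np.quantile(sample, np.linspace(0, 1, 21))
counts, _ = np.histogram(sample, bins=edges)

# The bin widths differ, but the counts are nearly equal.
widths = np.diff(edges)
```

The bins are narrow where the data is dense and wide in the tail, which is why density=True is useful when plotting them: bar heights then reflect density rather than raw counts.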


寄离 2024-12-05 22:20:24

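I guess the easiest way would be to calculate the minimum and maximum of your data, then calculate L = max - min. Then divide L by the desired bin width (I'm assuming this is what you mean by bin size) and use the ceiling of that value as the number of bins.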
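A minimal sketch of this calculation follows, with made-up data and an assumed desired width of 0.5. Note that when the number of bins is passed, the bins span exactly from min to max, so the actual width is L / n_bins, which is at most the desired width rather than exactly equal to it.

```python
import math
import numpy as np

data = np.array([0.3, 1.1, 2.6, 2.9, 4.0])   # made-up sample values
desired_width = 0.5                           # assumed target bin width

L = data.max() - data.min()
n_bins = math.ceil(L / desired_width)         # ceiling of L / width

# Passing an integer makes NumPy (and plt.hist) split [min, max] evenly.
counts, edges = np.histogram(data, bins=n_bins)
actual_width = edges[1] - edges[0]
```

If the width has to match exactly, passing explicit edges instead of a bin count is the safer option.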


白芷 2024-12-05 22:20:24

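I had the same issue as the OP (I think!), but I couldn't get it to work the way Lastalda described. I don't know if I have interpreted the question properly, but I have found another solution (it is probably a really bad way of doing it, though).

This is how I did it:

plt.hist([1,11,21,31,41], bins=[0,10,20,30,40,50], weights=[10,1,40,33,6])

This creates:

[figure: the resulting histogram]

So the first parameter basically "initialises" the bins: I'm specifically creating one number that falls inside each of the ranges set in the bins parameter.

To demonstrate this, look at the array in the first parameter ([1,11,21,31,41]) and the 'bins' array in the second parameter ([0,10,20,30,40,50]):

The number 1 (from the first array) falls between 0 and 10 (in the 'bins' array)
The number 11 (from the first array) falls between 10 and 20 (in the 'bins' array)
The number 21 (from the first array) falls between 20 and 30 (in the 'bins' array), and so on.

Then I use the 'weights' parameter to define the height of each bin. This is the array used for the weights parameter: [10,1,40,33,6].

So the 0 to 10 bin is given the value 10, the 10 to 20 bin the value 1, the 20 to 30 bin the value 40, and so on.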
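The weights trick can be checked without plotting. In this sketch np.histogram stands in for plt.hist, and the bin midpoints are used as the representative values; that choice is an assumption of this example, since any value strictly inside each bin behaves the same way.

```python
import numpy as np

edges = [0, 10, 20, 30, 40, 50]
values = [10, 1, 40, 33, 6]   # pre-computed bar heights from the answer

# One representative x per bin: here the midpoint of each bin.
centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]

# Each representative contributes its weight to the bin it falls in.
heights, _ = np.histogram(centers, bins=edges, weights=values)
```

The resulting heights are exactly the weights, which shows that the histogram is simply being used to draw bars for data that has already been counted.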


指尖微凉心微凉 2024-12-05 22:20:24

I like things to happen automatically and for bins to fall on "nice" values. The following seems to work quite well.

import numpy as np
import matplotlib.pyplot as plt

def compute_histogram_bins(data, desired_bin_size):
    min_val = np.min(data)
    max_val = np.max(data)
    # Round the lower boundary down to a multiple of the bin size
    min_boundary = min_val - min_val % desired_bin_size
    # Round the upper boundary up to the next multiple of the bin size
    max_boundary = max_val - max_val % desired_bin_size + desired_bin_size
    # Number of edges, which is one more than the number of bins
    n_edges = int(round((max_boundary - min_boundary) / desired_bin_size)) + 1
    bins = np.linspace(min_boundary, max_boundary, n_edges)
    return bins

if __name__ == '__main__':
    data = np.random.random_sample(100) * 123.34 - 67.23
    bins = compute_histogram_bins(data, 10.0)
    print(bins)
    plt.hist(data, bins=bins)
    plt.xlabel('Value')
    plt.ylabel('Counts')
    plt.title('Compute Bins Example')
    plt.grid(True)
    plt.show()

The resulting bin edges fall on nice multiples of the bin size.

[-70. -60. -50. -40. -30. -20. -10.   0.  10.  20.  30.  40.  50.  60.]

[figure: histogram drawn with the computed bins]
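A variant of the same idea can be written with floor and ceil. This is only a sketch, not the code from the answer: the helper name nice_edges and the three sample values are made up for illustration.

```python
import numpy as np

def nice_edges(data, bin_size):
    """Return edges on multiples of bin_size that cover all of data."""
    lo = np.floor(np.min(data) / bin_size) * bin_size   # round down
    hi = np.ceil(np.max(data) / bin_size) * bin_size    # round up
    if hi == lo:
        hi = lo + bin_size       # keep at least one bin for constant data
    n_bins = int(round((hi - lo) / bin_size))
    return np.linspace(lo, hi, n_bins + 1)

edges = nice_edges([-67.2, 3.0, 56.1], 10.0)   # made-up values
```

For these values the edges run from -70 to 60 in steps of 10. Unlike the modulo version, this variant does not add an extra bin when the maximum already lies on a multiple of the bin size.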


柳絮泡泡 2024-12-05 22:20:24

If you also care about the visual side, you can add edgecolor='white', linewidth=2 so that the bars are visually separated:

date_binned = new_df[(new_df['k'] > 0) & (new_df['k'] < 360)]['k']
plt.hist(date_binned, bins=range(min(date_binned), max(date_binned) + binwidth, binwidth), edgecolor='white', linewidth=2)


春花秋月 2024-12-05 22:20:24

This answer supports @macrocosme's suggestion.

I am using a heat map drawn with hist2d. Additionally, I use cmin=0.5 so that cells with no counts are left blank, and cmap to choose the colors; the _r suffix selects the reversed version of the given colormap.

[figure: descriptive statistics of the data]

# np.arange(data.min(), data.max()+binwidth, binwidth)
bin_x = np.arange(0.6, 7 + 0.3, 0.3)
bin_y = np.arange(12, 58 + 3, 3)
plt.hist2d(data=fuel_econ, x='displ', y='comb', cmin=0.5, cmap='viridis_r', bins=[bin_x, bin_y])
plt.xlabel('Displacement (l)')
plt.ylabel('Combined fuel efficiency (mpg)')

plt.colorbar()
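The two-dimensional case can be sketched without the fuel_econ data. The uniform random x and y below are stand-ins for the displ and comb columns (an assumption of this example), and np.histogram2d is used in place of plt.hist2d, which takes the same pair of edge arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.6, 7.0, size=500)    # stand-in for the displ column
y = rng.uniform(12, 58, size=500)      # stand-in for the comb column

# Fixed bin widths: 0.3 along x and 3 along y.
bin_x = np.arange(0.6, 7 + 0.3, 0.3)
bin_y = np.arange(12, 58 + 3, 3)

counts, xedges, yedges = np.histogram2d(x, y, bins=[bin_x, bin_y])

# Cells with a count below 0.5 (that is, empty cells) are the ones
# that cmin=0.5 would leave blank in the plot.
n_empty = int((counts < 0.5).sum())
```

The counts array has one row per x bin and one column per y bin, so its shape is (len(bin_x) - 1, len(bin_y) - 1).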


叹沉浮 2024-12-05 22:20:24

For a histogram with integer x-values I ended up using

plt.hist(data, np.arange(min(data)-0.5, max(data)+1.5))
plt.xticks(range(min(data), max(data)+1))

The offset of 0.5 centers the bins on the integer x-axis values, and both upper limits are extended by one so that the largest value gets its own bin and tick. The plt.xticks call adds a tick for every integer.
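To see the centering at work, here is a small sketch with made-up integer data. It places one edge on each side of every integer, from min - 0.5 up to and including max + 0.5, and uses np.histogram in place of plt.hist.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 5])   # made-up integer data
lo, hi = data.min(), data.max()

# np.arange excludes its stop value, so hi + 1.5 is needed
# for the last edge to be hi + 0.5.
edges = np.arange(lo - 0.5, hi + 1.5)
counts, _ = np.histogram(data, bins=edges)

# Each bin is centered on an integer.
centers = (edges[:-1] + edges[1:]) / 2
```

The centers come out as 1, 2, 3, 4, 5 with counts 1, 2, 3, 0, 1, so every integer between the minimum and the maximum has its own bin, including the empty one at 4.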


About the author

花落人断肠
