Plot a histogram such that bar heights sum to 1 (probability)

Posted on 2024-09-25 21:18:21 · 212 characters · 9 views · 0 comments

I'd like to plot a normalized histogram from a vector using matplotlib. I tried the following:

plt.hist(myarray, normed=True)

as well as:

plt.hist(myarray, normed=1)

but neither option produces a y-axis from [0, 1] such that the bar heights of the histogram sum to 1.

Comments (6)

夏至、离别 2024-10-02 21:18:21

If you want the sum of all bars to equal unity, weight each sample by 1 over the total number of values:

weights = np.ones_like(myarray) / len(myarray)
plt.hist(myarray, weights=weights)

Note for Python 2.x: cast one of the operands of the division to float(), as otherwise you would end up with zeros due to integer division.
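
One way to check this weighting without drawing anything is to pass the same weights to np.histogram and sum the resulting heights. This is only a minimal sketch: the sample data, the fixed seed, and the choice of 10 bins are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, illustrative only
myarray = rng.normal(size=1000)     # stand-in for the real data

# Each sample contributes 1/N, so every bar height is a fraction of the data.
weights = np.ones_like(myarray) / len(myarray)

heights, edges = np.histogram(myarray, bins=10, weights=weights)
total = heights.sum()
print(total)  # approximately 1, up to floating-point rounding

# The same arguments can be handed to the plotting call:
# plt.hist(myarray, bins=10, weights=weights)
```

Because the weights are applied per sample, the result does not depend on the bin widths, so this should also hold for unequal bins.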

长伴 2024-10-02 21:18:21

It would be more helpful if you posed a more complete working (or in this case non-working) example.

I tried the following:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)

fig = plt.figure()
ax = fig.add_subplot(111)
n, bins, rectangles = ax.hist(x, 50, density=True)
fig.canvas.draw()
plt.show()

This will indeed produce a bar-chart histogram with a y-axis that goes from [0,1].

Further, as per the hist documentation (i.e. ax.hist? from ipython), I think the sum is fine too:

*normed*:
If *True*, the first element of the return tuple will
be the counts normalized to form a probability density, i.e.,
``n/(len(x)*dbin)``.  In a probability density, the integral of
the histogram should be 1; you can verify that with a
trapezoidal integration of the probability density function::

    pdf, bins, patches = ax.hist(...)
    print np.sum(pdf * np.diff(bins))

Giving this a try after the commands above:

np.sum(n * np.diff(bins))

I get a return value of 1.0 as expected. Remember that normed=True (density=True in the code above) doesn't mean that the sum of the values at each bar will be unity, but rather that the integral over the bars is unity. In my case np.sum(n) returned approx 7.2767.
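
The difference between "area is 1" and "heights sum to 1" can be sketched with np.histogram alone. The data and seed below are assumptions for illustration, so the printed numbers will not match the 7.2767 quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
x = rng.normal(size=1000)

# density=True returns counts / (N * bin_width), i.e. a probability density.
n, bins = np.histogram(x, bins=50, density=True)
widths = np.diff(bins)

area = np.sum(n * widths)        # integral over the bars
height_sum = np.sum(n)           # plain sum of the bar heights
probabilities = n * widths       # per-bin probabilities

print(area)                      # approximately 1
print(height_sum)                # about 1 / bin_width for equal-width bins
print(probabilities.sum())       # approximately 1
```

Multiplying each density value by its bin width converts it into the probability of that bin, which is the quantity the question asks for.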

落花浅忆 2024-10-02 21:18:21

I know this answer is too late considering the question is dated 2010 but I came across this question as I was facing a similar problem myself. As already stated in the answer, normed=True means that the total area under the histogram is equal to 1 but the sum of heights is not equal to 1. However, I wanted to, for convenience of physical interpretation of a histogram, make one with sum of heights equal to 1.

I found a hint in the following question - Python: Histogram with area normalized to something other than 1

But I was not able to find a way of making the bars mimic the histtype="step" feature of hist(). This diverted me to: Matplotlib - Stepped histogram with already binned data

If the community finds it acceptable I should like to put forth a solution which synthesises ideas from both the above posts.

import numpy as np
import matplotlib.pyplot as plt

# Let X be the array whose histogram needs to be plotted.
nx, xbins, ptchs = plt.hist(X, bins=20)
plt.clf() # Get rid of this histogram since not the one we want.

nx_frac = nx / float(len(X)) # Each bin count divided by the total number of objects.
width = xbins[1] - xbins[0] # Width of each bin.
# list() is needed on Python 3, where zip() returns an iterator.
x = np.ravel(list(zip(xbins[:-1], xbins[:-1] + width)))
y = np.ravel(list(zip(nx_frac, nx_frac)))

plt.plot(x, y, linestyle="dashed", label="MyLabel")
#... Further formatting.

This has worked wonderfully for me, though in some cases I have noticed that the leftmost or rightmost "bar" of the histogram does not close down by touching the lowest point of the Y-axis. In such a case, adding an element 0 at the beginning or the end of y (with the matching bin edge added to x so both arrays keep the same length) achieved the necessary result.

Just thought I'd share my experience. Thank you.
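
A possibly simpler route to the same picture is to combine the per-sample weights from the earlier answer with histtype="step" in a single hist() call. The sketch below assumes stand-in data and a non-interactive backend; it is an alternative to the manual plotting above, not a reproduction of it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
X = rng.normal(size=500)         # stand-in for the array to be plotted

fig, ax = plt.subplots()
weights = np.ones_like(X) / len(X)   # each sample counts as 1/N
heights, edges, patches = ax.hist(
    X, bins=20, weights=weights,
    histtype="step", linestyle="dashed", label="MyLabel",
)
ax.legend()

print(heights.sum())  # approximately 1
```

With histtype="step" the outline is drawn as a single polygon that returns to the baseline at both ends, so no extra zero elements should be needed.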

以歌曲疗慰 2024-10-02 21:18:21

Here is another simple solution using np.histogram() method.

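import numpy as np
import matplotlib.pyplot as plt

myarray = np.random.random(100)
results, edges = np.histogram(myarray, density=True)  # density replaces the old normed argument
binWidth = edges[1] - edges[0]
# align="edge" puts each bar's left side on its bin's left edge.
plt.bar(edges[:-1], results*binWidth, binWidth, align="edge")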

You can indeed check that the total sums to 1 (up to floating-point rounding) with:

>>> print(sum(results * binWidth))
1.0

野却迷人 2024-10-02 21:18:21

Imports and Data

import seaborn as sns
import matplotlib.pyplot as plt

# load data
df = sns.load_dataset('penguins')

sns.histplot

# create figure and axes
fig, ax = plt.subplots(figsize=(6, 5))

p = sns.histplot(data=df, x='flipper_length_mm', stat='probability', ax=ax)

sns.displot

p = sns.displot(data=df, x='flipper_length_mm', stat='probability', height=4, aspect=1.5)

孤独岁月 2024-10-02 21:18:21

As of matplotlib 3.0.2, normed=True is deprecated in favour of density=True, which normalizes the area rather than the sum of the heights. To get the desired output I had to do:

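import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(1000)
bins = np.linspace(-3.0, 3.0, 51)  # 51 edges, i.e. 50 equal-width bins
counts, _ = np.histogram(data, bins=bins)
density = True  # set to False to plot the raw counts
if density: # bar heights sum to 1
    counts_weighter = counts.sum()
else: # raw counts
    counts_weighter = 1.0
# Feed one value per bin (its left edge) and weight it by the scaled count.
plt.hist(bins[:-1], bins=bins, weights=counts / counts_weighter)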

Trying to specify weights and density simultaneously as arguments to plt.hist() did not work for me. If anyone knows of a way to get that working without having access to the normed keyword argument, then please let me know in the comments and I will delete/modify this answer.

If you want bin centres then don't use bins[:-1] which are the bin edges - you need to choose a suitable scheme for how to calculate the centres (which may or may not be trivially derived).
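
For equal-width or otherwise linear bins, one common scheme is to take the midpoint of each pair of neighbouring edges. The sketch below assumes that scheme and illustrative data; logarithmic bins, for example, may call for a geometric mean instead.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
data = rng.normal(size=1000)

edges = np.linspace(-3.0, 3.0, 51)          # 51 edges, i.e. 50 bins
counts, _ = np.histogram(data, bins=edges)

# Midpoint of each bin: average of its left and right edge.
centres = 0.5 * (edges[:-1] + edges[1:])
fractions = counts / counts.sum()           # bar heights that sum to 1

print(centres[:3])       # first few bin centres
print(fractions.sum())   # approximately 1

# plt.bar(centres, fractions, width=np.diff(edges)) would draw these bars.
```

Note that samples falling outside the range of the edges are not counted, so the fractions here are relative to the samples that landed in a bin.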
