Histogram bar heights summing to 1 [duplicate]

Posted on 2025-02-13 17:41:45


I have looked through other users' questions, but I don't think I have found an answer.

I am trying to plot a histogram from some data stored in a Pandas DataFrame, and I want the y-axis value of each bin to equal the probability of that bin's event occurring. The density=True argument of matplotlib.pyplot.hist divides the count in each bin by the total count and by the bin width, so for bins whose width is not 1, the y-axis value of the histogram does not equal the probability of the event falling in that bin. Instead, it equals that probability per unit of bin width. I want my bins to be 10 units wide, which is what led to my issue.
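
The relationship described above can be checked numerically. The sketch below is a stand-in, not the question's exact data: it rebuilds similar values with NumPy's random generator (so the random part differs from the `random.randint` values), bins them with `numpy.histogram`, which normalizes the same way when `density=True`, and multiplies each density by its bin width to recover the per-bin probability.

```python
import numpy as np

# Hypothetical stand-in for data['Col1']: 49,500 zeros plus 500 integers in [1, 500].
# NumPy's generator is used here, so the random values differ from random.randint's.
rng = np.random.default_rng(1)
values = np.concatenate([np.zeros(49500), rng.integers(1, 501, size=500)])

# Same normalization as density=True: count / (total count * bin width)
density, edges = np.histogram(values, bins=50, range=(0.0, 500.0), density=True)
widths = np.diff(edges)  # every bin is 10 units wide here

# Multiplying each density by its bin width gives the fraction of samples per bin
prob_mass = density * widths

print(density[0])       # 0-10 bar as drawn with density=True (about 0.099)
print(prob_mass[0])     # fraction of samples in the 0-10 bin (about 0.99)
print(prob_mass.sum())  # sums to 1 up to floating-point rounding
```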

My code to generate a Pandas dataframe with data similar to what I'm working with:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from random import seed, randint

# 49,500 zeros followed by 500 random integers between 1 and 500 (inclusive).
# Building the column in one step is much faster than appending rows with .loc
# and gives the column an integer dtype instead of object.
seed(1)
values = [0] * 49500 + [randint(1, 500) for _ in range(500)]
data = pd.DataFrame({'Col1': values})
```

My code to plot the histogram:

```python
# plt.subplots() creates the figure itself; a separate plt.figure(2) call
# would only leave an extra empty figure behind.
fig2, ax2 = plt.subplots()
# density=True divides each bin's count by (total count * bin width)
ax2.hist(data['Col1'], range=(0.0, 500.0), bins=50, label='50000 numbers\n in 10 unit bins', density=True)
plt.title('Probability Density of Some Numbers from 0 to 500', wrap=True)
plt.legend(loc='upper right')
plt.yscale('log')
plt.minorticks_on()
plt.ylabel('Probability')
plt.xlabel('Number')
plt.savefig('randnum.png')
```

In my histogram, the 0-10 bin reaches a value of only about 0.1, even though it contains roughly 99% of the data (0.99 divided by the bin width of 10 gives about 0.099).

I do realize that if the y-axis values are no longer divided by the bin width, the integral of the histogram no longer equals 1 (it will equal 10 in my case), but this is precisely what I am looking for.
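
Writing $c_i$ for the count in bin $i$, $N$ for the total number of samples (assumed to all fall inside `range`), $\Delta = 10$ for the bin width, $h_i$ for the height drawn with `density=True` and $p_i$ for the desired probability mass (notation introduced here for illustration), the two normalizations relate as follows:

$$
h_i = \frac{c_i}{N\,\Delta}, \qquad \sum_i h_i\,\Delta = 1, \qquad
p_i = \frac{c_i}{N} = \Delta\,h_i, \qquad \sum_i p_i = 1, \qquad
\sum_i p_i\,\Delta = \Delta = 10
$$

For the 0-10 bin of this data, $p_0 \approx 0.99$ and $h_0 \approx 0.099$, which matches the bar at about 0.1.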

Is there a way to either 1) change the value the histogram is normalized to, or 2) directly multiply the y-values of the histogram by a value of my choosing?
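
Both options can be sketched with calls that exist in NumPy and Matplotlib. Option 1 passes a constant per-sample weight to `hist`, so the bars add up to any chosen total. Option 2 bins with `numpy.histogram`, scales the counts, and redraws them with the `hist(edges[:-1], bins=edges, weights=counts)` pattern that the Matplotlib `hist` documentation gives for pre-binned data. The generated data, `target_total` and `scale` are hypothetical placeholders; choosing a total of 1 in either option gives probability mass.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch; omit in an interactive session
import matplotlib.pyplot as plt

# Hypothetical stand-in for data['Col1'] (values differ from the question's randint ones)
rng = np.random.default_rng(1)
values = np.concatenate([np.zeros(49500), rng.integers(1, 501, size=500)])

# Option 1: choose the total the bar heights should add up to (1.0 -> probability mass).
target_total = 1.0  # hypothetical choice
weights = np.full(values.shape, target_total / values.size)
fig1, ax1 = plt.subplots()
heights, edges, _ = ax1.hist(values, bins=50, range=(0.0, 500.0), weights=weights)

# Option 2: bin first, multiply the counts by any factor, then draw the pre-binned
# counts by placing one weighted point at each bin's left edge.
counts, edges2 = np.histogram(values, bins=50, range=(0.0, 500.0))
scale = 1.0 / counts.sum()  # hypothetical factor; any number works
fig2, ax2 = plt.subplots()
scaled, _, _ = ax2.hist(edges2[:-1], bins=edges2, weights=counts * scale)
```

Option 1 keeps the single `hist` call from the question; option 2 is handy when the counts already exist or when the factor depends on the counts themselves.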


沒落の蓅哖 2025-02-20 17:41:45

I was able to accomplish this in pyplot with help from @JohanC's reference to Seaborn. The term I was looking for is 'probability mass' (the histogram bar heights sum to 1). Using [this answer][2], I was able to plot my histogram correctly: instead of density=True, I give every sample a weight of 1/N through the weights argument. Below is my updated code:

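```python
fig2, ax2 = plt.subplots()
# Give every sample a weight of 1/N so each bar's height is the fraction of
# samples in that bin (probability mass); the bar heights then sum to 1.
# Samples outside `range` would be dropped from the bars but still counted in N,
# which is not an issue here since every value lies in [0, 500].
weights = np.ones(len(data['Col1'])) / len(data['Col1'])
ax2.hist(data['Col1'], range=(0.0, 500.0), weights=weights, bins=50, label='50000 numbers\n in 10 unit bins')
plt.title('Probability Mass of Some Numbers from 0 to 500', wrap=True)
plt.legend(loc='upper right')
plt.yscale('log')
plt.minorticks_on()
plt.ylabel('Probability')
plt.xlabel('Number')
plt.savefig('randnum.png')
```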
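
One caveat with dividing by the full length: `range=(0.0, 500.0)` drops any sample outside that interval from the bars while it still counts toward N, so the heights would then sum to less than 1. That does not happen with this data, but a helper along these lines normalizes by the in-range count instead. `in_range_weights` is a hypothetical name for illustration, not part of NumPy or Matplotlib.

```python
import numpy as np


def in_range_weights(values, value_range):
    """Hypothetical helper: per-sample weights that make the bars of a
    histogram restricted to value_range sum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = value_range
    # np.histogram keeps samples with lo <= x <= hi (the last bin is closed)
    inside = (values >= lo) & (values <= hi)
    n_inside = inside.sum()
    if n_inside == 0:
        raise ValueError("no samples fall inside value_range")
    # Out-of-range samples get weight 0; in-range samples share a total of 1
    return np.where(inside, 1.0 / n_inside, 0.0)


# Usage: two of these five samples lie outside [0, 500]
samples = np.array([0.0, 5.0, 250.0, -3.0, 700.0])
w = in_range_weights(samples, (0.0, 500.0))
heights, edges = np.histogram(samples, bins=50, range=(0.0, 500.0), weights=w)
print(heights.sum())  # sums to 1 up to floating-point rounding
```

The same array can be passed as `weights=` to `ax2.hist` in place of the 1/N weights.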