Getting histogram bar heights to sum to 1 [duplicate]
I tried looking this up on other users' questions, but I don't think I have found an answer.
I am attempting to plot a histogram from some data stored in a Pandas dataframe, and I want the y-axis value of each bin to equal the probability of that bin's event occurring. Since the density=True argument of matplotlib.pyplot.hist divides the counts in a bin by the total count and by the bin width, for bins whose width is not 1 the y-axis value of the histogram doesn't equal the probability of an event falling in that bin. It instead equals the probability in that bin per unit of bin width. I want my bins to be 10 units wide, which has led to my issue.
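To make the difference between density and per-bin probability concrete, here is a small numeric sketch using `numpy.histogram`, which applies the same normalization. The data is a stand-in built with NumPy's generator (an assumption for illustration, not the exact values from the question's code):

```python
import numpy as np

# Stand-in data shaped like the question's: 49,500 zeros plus 500 integers in 1..500.
# The generator and seed are assumptions for illustration only.
rng = np.random.default_rng(1)
values = np.concatenate([np.zeros(49500), rng.integers(1, 501, size=500)])

# Raw counts and density-normalized heights over the same 50 bins
counts, edges = np.histogram(values, bins=50, range=(0.0, 500.0))
density, _ = np.histogram(values, bins=50, range=(0.0, 500.0), density=True)

widths = np.diff(edges)               # every bin is 10 units wide here
probability = counts / counts.sum()   # fraction of all samples in each bin

# density is probability per unit of x, so it is smaller by the bin width
assert np.allclose(density, probability / widths)

print("first bin density:    ", density[0])
print("first bin probability:", probability[0])
print("sum of densities:     ", density.sum())
print("sum of probabilities: ", probability.sum())
```

With 10-unit bins the densities integrate to 1 (sum times width), while the probabilities themselves sum to 1, which is the quantity being asked for.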
My code to generate a Pandas dataframe with data similar to what I'm working with:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from random import seed
from random import randint
data = pd.DataFrame(columns=['Col1'])
i = 0
while i < 49500:
    data.loc[len(data.index)] = [0]
    i += 1
seed(1)
j = 0
while j < 500:
    data.loc[len(data.index)] = [randint(1,500)]
    j += 1
My code to plot the histogram:
plt.figure(2)
fig2, ax2 = plt.subplots()
ax2.hist(data['Col1'], range=(0.0, 500.0), bins=50, label='50000 numbers\n in 10 unit bins', density=True)
plt.title('Probability Density of Some Numbers from 0 to 500', wrap=True)
plt.legend(loc='upper right')
plt.yscale('log')
plt.xticks()
plt.minorticks_on()
plt.ylabel('Probability')
plt.xlabel('Number')
plt.savefig('randnum.png')
My histogram (note that the 0-10 bin, while containing roughly 99% of the data, only reaches a height of about 0.1):
I do realize that by making the y-axis probability independent of bin size, the integral of the histogram no longer equals 1 (it will equal 10 in my case), but this is precisely what I am seeking.
Is there a way to either 1) change the value the histogram is normalized to or 2) directly multiply y-values of the histogram by a value of my choosing?
Comments (1)
I was able to accomplish this in pyplot with help from @JohanC's reference to Seaborn. The terminology I was looking for is 'probability mass' (the histogram bar heights sum to 1). Using [this answer][2], I was able to properly plot my histogram. Below is my code and my new histogram:
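The answer's code and image are not included in this copy of the page. As a rough sketch of one common way to get a probability-mass histogram in pyplot, the example below passes per-sample `weights` of 1/N to `hist` instead of `density=True`. Whether the linked answer uses exactly this approach is an assumption, and the data is a NumPy-generated stand-in rather than the question's dataframe:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt

# Stand-in data (assumption): 49,500 zeros plus 500 integers in 1..500
rng = np.random.default_rng(1)
values = np.concatenate([np.zeros(49500), rng.integers(1, 501, size=500)])

# Give every sample a weight of 1/N so each bar height is the
# fraction of samples in that bin, regardless of bin width
weights = np.ones_like(values) / len(values)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(
    values,
    range=(0.0, 500.0),
    bins=50,
    weights=weights,
    label='50000 numbers\n in 10 unit bins',
)
ax.set_title('Probability Mass of Some Numbers from 0 to 500', wrap=True)
ax.legend(loc='upper right')
ax.set_yscale('log')
ax.minorticks_on()
ax.set_ylabel('Probability')
ax.set_xlabel('Number')
fig.savefig('randnum_probability.png')  # hypothetical output file name

print("sum of bar heights:", heights.sum())
```

The same weights can be scaled by any constant to normalize the bars to a value other than 1. Seaborn offers an equivalent through `sns.histplot(..., stat='probability')`.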