Histogram bar heights sum to 1 [duplicate]

Posted on 2025-02-13 17:41:45

I tried looking this up on other users' questions, but I don't think I have found an answer.

I am attempting to plot a histogram from some data I have stored in a Pandas dataframe, and I want the y-axis value of each bin to equal the probability of an event falling in that bin. Because the density=True argument of matplotlib.pyplot.hist divides the counts in each bin by the total count and by the bin width, for bins whose width is not 1 the y-axis value does not equal the probability of an event falling in that bin. It instead equals the probability per unit of bin width. I wish to make my bins 10 units wide, which has led to my issue.
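The difference between the two normalizations can be checked numerically. The following is a minimal sketch that uses numpy.histogram rather than pyplot, on synthetic data shaped like the data in this question. The variable names and the NumPy random generator are assumptions made for illustration, so the individual random values differ from those produced by the code in this post.

```python
import numpy as np

# Synthetic stand-in for the question's data: 49500 zeros plus
# 500 random integers from 1 to 500 (inclusive).
rng = np.random.default_rng(1)
values = np.concatenate([np.zeros(49500), rng.integers(1, 501, size=500)])

# density=True divides each count by (total count * bin width).
density, edges = np.histogram(values, bins=50, range=(0.0, 500.0), density=True)
widths = np.diff(edges)  # every bin is 10 units wide here

# Per-bin probability: count divided by total count only.
counts, _ = np.histogram(values, bins=50, range=(0.0, 500.0))
probability = counts / counts.sum()

print(density[0])                 # roughly 0.099: probability per unit of bin width
print(probability[0])             # roughly 0.99: probability of landing in the first bin
print((density * widths).sum())   # the density integrates to 1 (up to rounding)
```

Multiplying each density value by its bin width recovers the per-bin probability, which is why the first bar sits near 0.1 instead of near 0.99 when the bins are 10 units wide.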

My code to generate a Pandas dataframe with data similar to what I'm working with:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from random import seed
from random import randint

data = pd.DataFrame(columns=['Col1'])

i = 0
while i < 49500:
    data.loc[len(data.index)] = [0]
    i += 1

seed(1)
j = 0
while j < 500:
    data.loc[len(data.index)] = [randint(1,500)]
    j += 1

My code to plot the histogram:

# plt.subplots() already creates a new figure, so a separate plt.figure(2) call is not needed
fig2, ax2 = plt.subplots()
ax2.hist(data['Col1'], range=(0.0, 500.0), bins=50, label='50000 numbers\n in 10 unit bins', density=True)
plt.title('Probability Density of Some Numbers from 0 to 500', wrap=True)
plt.legend(loc='upper right')
plt.yscale('log')
plt.xticks()
plt.minorticks_on()
plt.ylabel('Probability')
plt.xlabel('Number')
plt.savefig('randnum.png')

My histogram (note the 0-10 bin, while composing roughly 99% of the data, is only at a probability of 0.1):

Histogram with plt

I do realize that by making the y-axis probability not inversely proportional to bin size, the integral of the histogram no longer equals 1 (it will equal 10 in my case), but this is precisely what I am seeking.

Is there a way to either 1) change the value the histogram is normalized to or 2) directly multiply y-values of the histogram by a value of my choosing?
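One possible reading of option 2 is to compute the bar heights outside of hist, scale them, and draw the bars separately. The sketch below assumes a hypothetical helper named scaled_histogram (it is not part of matplotlib or NumPy) that multiplies the density by a chosen factor, defaulting to the bin width.

```python
import numpy as np

def scaled_histogram(values, bins, value_range, scale=None):
    """Hypothetical helper: density histogram multiplied by `scale`.

    If `scale` is None, each bar is multiplied by its own bin width,
    which turns the density into a per-bin probability.
    """
    density, edges = np.histogram(values, bins=bins, range=value_range, density=True)
    if scale is None:
        scale = np.diff(edges)
    return density * scale, edges

# Synthetic data shaped like the data in the question (an assumption for illustration).
rng = np.random.default_rng(1)
values = np.concatenate([np.zeros(49500), rng.integers(1, 501, size=500)])

heights, edges = scaled_histogram(values, bins=50, value_range=(0.0, 500.0))
print(heights.sum())  # bar heights now sum to 1 (up to rounding)

# An explicit factor works too; 10.0 matches the 10-unit bins used here.
heights_10, _ = scaled_histogram(values, bins=50, value_range=(0.0, 500.0), scale=10.0)
```

The returned heights could then be drawn with ax2.bar(edges[:-1], heights, width=np.diff(edges), align='edge') in place of the ax2.hist call.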

Comments (1)

沒落の蓅哖 2025-02-20 17:41:45

I was able to accomplish this in pyplot with help from @JohanC's reference to Seaborn. The term I was looking for is 'probability mass' (the histogram bar heights sum to 1). Using this answer, I was able to plot my histogram correctly by passing per-sample weights of 1/N to hist instead of density=True. Below is my code and my new histogram:

# plt.subplots() already creates a new figure, so a separate plt.figure(2) call is not needed
fig2, ax2 = plt.subplots()
weights = np.ones_like(data['Col1']) / len(data['Col1'])
ax2.hist(data['Col1'], range=(0.0, 500.0), weights=weights, bins=50, label='50000 numbers\n in 10 unit bins')
plt.title('Probability Density of Some Numbers from 0 to 500', wrap=True)
plt.legend(loc='upper right')
plt.yscale('log')
plt.xticks()
plt.minorticks_on()
plt.ylabel('Probability')
plt.xlabel('Number')
plt.savefig('randnum.png')
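As a quick check of the weights approach, the bar heights returned by hist can be summed directly. This is a minimal sketch on synthetic data shaped like the data in the question. The variable names, the NumPy random generator, and the non-interactive Agg backend are assumptions made so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for data['Col1']: 49500 zeros plus 500 random integers in 1..500.
rng = np.random.default_rng(1)
values = np.concatenate([np.zeros(49500), rng.integers(1, 501, size=500)])

# Each sample contributes 1/N, so a bar's height is the fraction of samples in its bin.
weights = np.full(values.shape, 1.0 / values.size)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(values, bins=50, range=(0.0, 500.0), weights=weights)
ax.set_yscale("log")
ax.set_ylabel("Probability")
plt.close(fig)

print(heights.sum())  # close to 1 because every sample lies inside `range`
print(heights[0])     # roughly 0.99 for the 0-10 bin
```

Samples that fall outside the range argument are not counted, so in that case the heights would sum to less than 1 unless the weights are computed from the in-range samples only.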
