Matplotlib histogram with log Laplacian PDF
(Be sure to check out the EDIT at the end of the post before reading too deeply into the source.)
I'm plotting a histogram of a population that seems to follow a log Laplacian distribution:
I'm trying to draw a line of best fit for it to verify my hypothesis, but I'm having trouble getting meaningful results.
I'm using the Laplacian PDF definition from Wikipedia and taking 10 to the power of the PDF (to "reverse" the effects of the log histogram).
What am I doing wrong?
Here is my code. I pipe the data through standard input (cat pop.txt | python hist.py); here's a sample population.
    from pylab import *
    import numpy

    def laplace(x, mu, b):
        # 10 raised to the power of the Laplace PDF (the step in question)
        return 10**(1.0/(2*b) * numpy.exp(-abs(x - mu)/b))

    def main():
        import sys
        # Read whitespace-separated integers from standard input
        num = list(map(int, sys.stdin.read().split()))
        nbins = max(num) - min(num)
        # Unnormalized histogram drawn with a logarithmic y-axis
        n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=True, align='left')
        loc, scale = 0., 1.
        x = numpy.arange(bins[0], bins[-1], 1.)
        pdf = laplace(x, loc, scale)
        plot(x, pdf)
        width = max(-min(num), max(num))
        xlim((-width, width))
        ylim((1.0, 10**7))
        show()

    if __name__ == '__main__':
        main()
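One possible reading of the problem, offered as a sketch rather than a definitive diagnosis: with log=True only the y-axis is drawn on a log scale, while the bar heights are still raw counts. Under that assumption, the curve to overlay would be the plain Laplace PDF multiplied by the sample size and the bin width, not 10 raised to the PDF. The helper name expected_counts and the synthetic sample below are hypothetical and stand in for the piped population.

```python
import numpy as np

def laplace_pdf(x, mu, b):
    """Laplace density as given in the Wikipedia article."""
    return 1.0 / (2.0 * b) * np.exp(-np.abs(x - mu) / b)

def expected_counts(x, mu, b, n_samples, bin_width):
    """Scale the PDF to the units of an unnormalized histogram (counts per bin)."""
    return n_samples * bin_width * laplace_pdf(x, mu, b)

# Synthetic stand-in for the real data (assumption: integer-valued, Laplace-like).
rng = np.random.default_rng(0)
sample = np.rint(rng.laplace(loc=0.0, scale=3.0, size=100_000)).astype(int)

# One bin per integer value, centred on the integers.
edges = np.arange(sample.min() - 0.5, sample.max() + 1.5, 1.0)
counts, _ = np.histogram(sample, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

# Curve in the same units as the bar heights; plot it against `centers`
# on the same (log-scaled) axes as the histogram.
curve = expected_counts(centers, mu=0.0, b=3.0,
                        n_samples=sample.size, bin_width=1.0)
```

Because the curve and the counts share units, they can be compared directly on either a linear or a logarithmic y-axis.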
EDIT
OK, here is an attempt to match it to a regular Laplacian distribution (as opposed to a log Laplacian). Differences from the attempt above:
- the histogram is normalized
- the histogram is linear (not log)
- the laplace function is defined exactly as specified in the Wikipedia article
Output:
As you can see, it isn't the best match, but the figures (the histogram and the Laplace PDF) are at least now in the same ballpark. I think the log Laplace will match a lot better. My approach (source above) didn't work. Can anybody suggest a way to do this that works?
Source:
    from pylab import *
    import numpy

    def laplace(x, mu, b):
        # Laplace PDF as defined in the Wikipedia article
        return 1.0/(2*b) * numpy.exp(-abs(x - mu)/b)

    def main():
        import sys
        # Read whitespace-separated integers from standard input
        num = list(map(int, sys.stdin.read().split()))
        nbins = max(num) - min(num)
        # Normalized histogram on a linear y-axis (density=True is the current spelling of normed=True)
        n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=False, align='left', density=True)
        loc, scale = 0., 0.54
        x = numpy.arange(bins[0], bins[-1], 1.)
        pdf = laplace(x, loc, scale)
        plot(x, pdf)
        width = max(-min(num), max(num))
        xlim((-width, width))
        show()

    if __name__ == '__main__':
        main()
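A short derivation may help relate the two attempts. It assumes the data are plainly Laplace distributed and that only the y-axis is logarithmic, which the post does not confirm. Taking the base-10 logarithm of the Wikipedia PDF gives a function that is linear in the distance from mu:

```latex
f(x \mid \mu, b) = \frac{1}{2b} \exp\!\left(-\frac{|x - \mu|}{b}\right)
\quad\Longrightarrow\quad
\log_{10} f(x \mid \mu, b) = -\log_{10}(2b) - \frac{|x - \mu|}{b \ln 10}
```

Under that assumption the histogram appears as a symmetric "tent" on a log-scaled y-axis, and the slope of either side, $\mp 1/(b \ln 10)$ decades per unit of $x$, gives a quick visual estimate of $b$.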
Comments (2)
Your laplace() function does not seem to be a Laplace distribution. Besides, numpy.log() is the natural logarithm (base e), not the decimal one.
Your histogram does not seem to be normalized, while the distribution is.
EDIT:
Don't use blanket imports (from pylab import *); they'll bite you.
If you're checking consistency with a Laplace distribution (or its log), use the fact that the latter is symmetric around mu: fix mu at the maximum of your histogram, and you have a single-parameter problem. You can also use just one half of the histogram.
Use numpy's histogram function. That way you get the histogram itself, which you can then fit with a Laplace distribution (and/or its log). The chi-square will tell you how good (or bad) the consistency is. For fitting you can use, e.g., the scipy.optimize.leastsq routine (http://www.scipy.org/Cookbook/FittingData).
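The fitting advice above can be sketched roughly as follows. This is one illustrative reading, not the commenter's code: the synthetic sample, the bin count, the starting value, and the cutoff of 5 expected counts per bin are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import leastsq

def laplace_pdf(x, mu, b):
    """Laplace density as given in the Wikipedia article."""
    return 1.0 / (2.0 * b) * np.exp(-np.abs(x - mu) / b)

# Synthetic stand-in for the real population (assumption).
rng = np.random.default_rng(1)
sample = rng.laplace(loc=0.0, scale=2.0, size=50_000)

# Normalized histogram (integrates to 1, like the PDF) plus raw counts.
density, edges = np.histogram(sample, bins=101, density=True)
counts, _ = np.histogram(sample, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

# Fix mu at the histogram maximum, leaving b as the only free parameter.
mu_fixed = centers[np.argmax(density)]

def residuals(params):
    b = abs(params[0])  # keep the scale positive
    return density - laplace_pdf(centers, mu_fixed, b)

params_fit, ier = leastsq(residuals, x0=[1.0])
b_fit = abs(params_fit[0])

# Pearson chi-square on counts, using only bins with enough expected entries.
width = edges[1] - edges[0]
expected = sample.size * width * laplace_pdf(centers, mu_fixed, b_fit)
mask = expected >= 5
chi2 = np.sum((counts[mask] - expected[mask]) ** 2 / expected[mask])
dof = mask.sum() - 1  # rough count: bins used minus the one fitted parameter
chi2_per_dof = chi2 / dof
```

A chi2_per_dof value far above 1 would suggest that the Laplace shape, or the choice of mu, does not describe the histogram well.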
I've found a workaround for the problem I was having. Instead of using matplotlib.hist, I use numpy.histogram in combination with matplotlib.bar to calculate the histogram and draw it in two separate steps.
I'm not sure if there's a way to do this using matplotlib.hist; it would definitely be more convenient, though.
You can see that it's a much better match.
My problem now is that I need to estimate the scale parameter of the PDF.
Source:
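On the open question of estimating scale, one standard option (not taken from the post, whose own source listing is missing here) is the maximum-likelihood estimate for a Laplace distribution: mu is the sample median and b is the mean absolute deviation from it. The sketch below assumes continuous, Laplace-distributed data; the function name fit_laplace_mle and the synthetic sample are hypothetical.

```python
import numpy as np

def laplace_pdf(x, mu, b):
    """Laplace density as given in the Wikipedia article."""
    return 1.0 / (2.0 * b) * np.exp(-np.abs(x - mu) / b)

def fit_laplace_mle(sample):
    """Maximum-likelihood estimates of the Laplace location and scale."""
    sample = np.asarray(sample, dtype=float)
    mu = np.median(sample)                # MLE of the location
    b = np.mean(np.abs(sample - mu))      # MLE of the scale
    return mu, b

# Synthetic stand-in for the real population (assumption).
rng = np.random.default_rng(2)
sample = rng.laplace(loc=1.0, scale=0.5, size=20_000)
mu_hat, b_hat = fit_laplace_mle(sample)

# Histogram computed separately from drawing, as in the two-step approach.
density, edges = np.histogram(sample, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
model = laplace_pdf(centers, mu_hat, b_hat)
# `density` can be drawn with matplotlib's bar() and `model` with plot().
```

For integer-valued data like the sample in the question, the estimate is only approximate because the values have been rounded.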