Matplotlib histogram with a log Laplacian PDF

Posted 2024-11-08 18:49:13

(be sure to check out the EDIT at the end of the post before reading too deeply into the source)

I'm plotting a histogram of a population that seems to follow a log Laplacian distribution:
[Figure: histogram of the population]

I'm trying to draw a line of best fit for it to verify my hypothesis, but I'm having problems getting meaningful results.

I'm using the Laplacian PDF definition from Wikipedia and taking 10 to the power of the PDF (to "reverse" the effects of the log histogram).

What am I doing wrong?

Here is my code. I pipe the data through standard input (cat pop.txt | python hist.py); a sample population was linked from the original post.

from pylab import *
import numpy    
def laplace(x, mu, b):
    return 10**(1.0/(2*b) * numpy.exp(-abs(x - mu)/b))    
def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=True, align='left')
    loc, scale = 0., 1.
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, 0., 1.)
    plot(x, pdf)
    width = max(-min(num), max(num))
    xlim((-width, width))
    ylim((1.0, 10**7))
    show()
if __name__ == '__main__':
    main()

EDIT

OK, here is the attempt to match it to a regular Laplacian distribution (as opposed to a log Laplacian). Differences from above attempt:

  • histogram is normed
  • histogram is linear (not log)
  • laplace function defined exactly as specified in the Wikipedia article

Output:
[Figure: normed, linear histogram with the Laplace PDF overlaid]

As you can see, it isn't the best match, but the histogram and the Laplace PDF are at least now in the same ballpark. I think the log Laplace will match a lot better, but my approach (first source listing above) didn't work. Can anybody suggest a way to do this that works?

Source:

from pylab import *
import numpy   
def laplace(x, mu, b):
    return 1.0/(2*b) * numpy.exp(-abs(x - mu)/b)
def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=False, align='left', normed=True)
    loc, scale = 0., 0.54
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, loc, scale)
    plot(x, pdf)
    width = max(-min(num), max(num))
    xlim((-width, width))
    show()
if __name__ == '__main__':
    main()


Comments (2)

多情癖 2024-11-15 18:49:13
  1. Your laplace() function does not seem to be a Laplace distribution. Besides, numpy.log() is a natural logarithm (base e), not decimal.

  2. Your histogram does not seem to be normalized, while the distribution is.
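
To illustrate point 2: with log=True the bars still hold raw counts and only the y-axis is drawn logarithmically, so no 10** is needed. The curve just has to be put on the same scale as the counts. A minimal sketch, assuming unit-width bins and a synthetic sample standing in for pop.txt (the helper expected_counts and the values mu=0, b=4 are made up for illustration):

```python
import numpy as np


def laplace_pdf(x, mu, b):
    """Laplace density, as defined in the Wikipedia article."""
    return 1.0 / (2 * b) * np.exp(-np.abs(x - mu) / b)


def expected_counts(edges, n_samples, mu, b):
    """Approximate expected count per bin: N * bin_width * pdf(bin_center)."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    widths = np.diff(edges)
    return centers, n_samples * widths * laplace_pdf(centers, mu, b)


# Synthetic stand-in for the real population (parameters are assumptions).
rng = np.random.default_rng(0)
sample = rng.laplace(loc=0.0, scale=4.0, size=100_000)

edges = np.arange(-40.5, 41.5, 1.0)  # unit-width bins centered on integers
counts, _ = np.histogram(sample, bins=edges)
centers, expected = expected_counts(edges, len(sample), mu=0.0, b=4.0)

# Raw counts and the scaled PDF are now directly comparable.
print(counts.max(), expected.max())
```

Drawing plot(centers, expected) over hist(..., log=True) should then put the curve and the bars on the same scale. Going the other way (normalizing the histogram) works too.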

EDIT:

  1. Don't use blanket imports such as from pylab import *; they will bite you.

  2. If you're checking consistency with a Laplace distribution (or its log), use the fact that the distribution is symmetric around mu: fix mu at the maximum of your histogram, and you have a single-parameter problem. You can also fit just one half of the histogram.

  3. Use numpy's histogram function -- this way you can get the histogram itself, which you can then fit with a Laplace distribution (and/or its log). The chi-square will tell you how good (or bad) the consistency is. For fitting you can use, e.g. scipy.optimize.leastsq routine (http://www.scipy.org/Cookbook/FittingData)
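
The fitting advice in this list could look roughly like the following. It is only a sketch: fit_scale and pearson_chi2 are hypothetical helpers, the sample is synthetic, and the initial guess b0 = 1/(2 * peak density) is my own assumption. mu is fixed at the histogram maximum, so leastsq only has to find b, and a Pearson chi-square over reasonably populated bins gives a rough measure of consistency.

```python
import numpy as np
from scipy.optimize import leastsq


def laplace_pdf(x, mu, b):
    """Laplace density with location mu and scale b."""
    return 1.0 / (2 * b) * np.exp(-np.abs(x - mu) / b)


def fit_scale(sample, edges):
    """Fix mu at the histogram peak and least-squares fit the scale b."""
    density, _ = np.histogram(sample, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu = centers[np.argmax(density)]
    b0 = 1.0 / (2.0 * density.max())  # initial guess, since pdf(mu) = 1/(2b)

    def residuals(params):
        # abs() keeps the scale positive whatever the optimizer tries
        return density - laplace_pdf(centers, mu, abs(params[0]))

    solution, _ier = leastsq(residuals, [b0])
    return mu, abs(solution[0])


def pearson_chi2(sample, edges, mu, b, min_expected=5.0):
    """Chi-square of observed vs expected counts over well-populated bins."""
    counts, _ = np.histogram(sample, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    expected = len(sample) * np.diff(edges) * laplace_pdf(centers, mu, b)
    keep = expected >= min_expected
    chi2 = np.sum((counts[keep] - expected[keep]) ** 2 / expected[keep])
    return chi2, int(keep.sum())  # statistic and number of bins used


# Synthetic stand-in for the real population (parameters are assumptions).
rng = np.random.default_rng(0)
sample = rng.laplace(loc=0.0, scale=4.0, size=100_000)
edges = np.arange(-40.5, 41.5, 1.0)

mu, b = fit_scale(sample, edges)
chi2, n_bins = pearson_chi2(sample, edges, mu, b)
print(mu, b, chi2, n_bins)
```

A chi-square far above the number of bins used would suggest the Laplace shape is a poor description of the data.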

少女七分熟 2024-11-15 18:49:13

I've found a work-around for the problem I was having. Instead of using matplotlib.hist, I use numpy.histogram in combination with matplotlib.bar to calculate the histogram and draw it in two separate steps.

I'm not sure if there's a way to do this using matplotlib.hist -- it would definitely be more convenient, though.
[Figure: normalized log histogram with the normalized Laplace PDF overlaid]

You can see that it's a much better match.

My problem now is I need to estimate the scale parameter of the PDF.

Source:

import sys

import numpy
from pylab import bar, plot, show, xlim

def laplace(x, mu, b):
    """http://en.wikipedia.org/wiki/Laplace_distribution"""
    return 1.0/(2*b) * numpy.exp(-abs(x - mu)/b)

def main():
    # split() with no argument tolerates newlines and repeated spaces;
    # list() lets num be traversed more than once under Python 3.
    num = list(map(int, sys.stdin.read().split()))
    nbins = max(num) - min(num)
    count, bins = numpy.histogram(num, nbins)
    bins = bins[:-1]
    assert len(bins) == nbins
    #
    # FIRST we take the log of the histogram, THEN we normalize it.
    # Empty bins are left at 0 rather than log(0) = -inf.
    #
    count = count.astype(float)
    nonzero = count > 0
    count[nonzero] = numpy.log(count[nonzero])
    count = count/max(count)

    loc = 0.
    scale = 4.
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, loc, scale)
    pdf = pdf/max(pdf)

    width=1.0
    # align='edge' keeps the original bar placement on Matplotlib >= 2.0,
    # where bar() centers bars on x by default.
    bar(bins-width/2, count, width=width, align='edge')
    plot(x, pdf, color='r')
    xlim(min(num), max(num))
    show()

if __name__ == '__main__':
    main()
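
On the open question of estimating scale: one possible approach, sketched here under assumptions rather than offered as a definitive method, is a simple least-squares search over candidate scales. After dividing by its maximum, the Laplace PDF reduces to exp(-|x - loc| / scale) (provided loc lies on the x grid), so that shape can be compared directly with the normalized log-counts. The names laplace_shape and estimate_scale and the candidate range are hypothetical, and the data below is synthetic.

```python
import numpy as np


def laplace_shape(x, loc, scale):
    """Laplace PDF divided by its maximum: exp(-|x - loc| / scale)."""
    return np.exp(-np.abs(x - loc) / scale)


def estimate_scale(x, y, loc=0.0, candidates=None):
    """Return the candidate scale with the smallest sum of squared errors."""
    if candidates is None:
        # Arbitrary search range with a step of 0.05; adjust to the data.
        candidates = np.linspace(0.5, 20.0, 391)
    sse = [np.sum((y - laplace_shape(x, loc, s)) ** 2) for s in candidates]
    return candidates[int(np.argmin(sse))]


# Synthetic check: a curve generated with scale 4 on integer bin positions.
x = np.arange(-30.0, 31.0)
y = laplace_shape(x, 0.0, 4.0)
best = estimate_scale(x, y)
print(best)
```

With the real data, x would be the bin positions and y the normalized log-counts from the listing above. A proper optimizer such as scipy.optimize.leastsq could replace the grid search once the rough range is known.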