Precision of cdf in scipy.stats

Posted 2024-11-14 11:48:42

I'm using the chi2 distribution as a theoretical problem for a simulation system.

For a given interval, I need to estimate this distribution as a PMF defined as the integral of the PDF inside that interval. This value should be near the value of the PDF at the center of the interval, but can be slightly different, depending on the shape of the PDF.

Here is what I do:

import numpy
from scipy.stats import chi2

dist = chi2(10)
nbins = 120

F = dist.cdf(numpy.arange(nbins+1))
pmf = F[1:] - F[:-1] # surface inside the interval
pmf /= pmf.sum() # Normalisation

The problem is that chi2.cdf(100, 10) and above gives exactly 1.0, so the smallest nonzero bin value I'm able to get is around 1.11e-16. But chi2.pdf(100, 10) isn't exactly 0 (it's about 2.5e-17).

My question is: how can I get my pmf estimate with greater precision (maybe down to 1e-25)? Why is the cdf function less precise than the pdf function?

世界和平 2024-11-21 11:48:42

The cdf is equal to one to within floating-point precision, but the sf is close to zero, so tiny differences (around 1e-20) are not swamped by the big 1. (see JABS reference)

>>> import numpy as np
>>> from scipy import stats
>>> nbins = 120
>>> probs_from_cdf = np.diff(stats.chi2.cdf(np.arange(nbins+1), 10))
>>> probs_from_sf = np.diff(stats.chi2.sf(np.arange(nbins+1)[::-1], 10))[::-1]
>>> probs_from_sf[:4]
array([ 0.00017212,  0.00348773,  0.01491609,  0.03407708])
>>> probs_from_cdf[:4]
array([ 0.00017212,  0.00348773,  0.01491609,  0.03407708])
>>> probs_from_cdf[-5:]
array([ 0.,  0.,  0.,  0.,  0.])
>>> probs_from_sf[-5:]
array([  1.94252577e-20,   1.21955220e-20,   7.65430774e-21,
         4.80270079e-21,   3.01259913e-21])
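To see why the cdf saturates, it may help to look at how finely doubles are spaced near 1 compared with near 0. This is only a small sketch using NumPy; the value `tail` is an approximation of 1 - cdf(100, 10) that I typed in by hand, not something computed here.

```python
import numpy as np

# Gap between 1.0 and the next representable double above it.
ulp_above_one = np.spacing(1.0)                 # about 2.22e-16

# Gap between 1.0 and the next representable double below it.
gap_below_one = 1.0 - np.nextafter(1.0, 0.0)    # about 1.11e-16

# Assumed approximate value of 1 - cdf(100, 10); smaller than half the gap
# below 1, so 1 - tail cannot be told apart from 1.0 in double precision.
tail = 5.45e-17
rounds_to_one = (1.0 - tail) == 1.0

# Near zero the spacing is vastly finer, which is why sf keeps its digits.
smallest_positive = np.nextafter(0.0, 1.0)

print(ulp_above_one, gap_below_one, rounds_to_one, smallest_positive)
```

So the cdf is not computed less carefully than the pdf; its true value simply sits closer to 1 than any double other than 1.0 itself, whereas the pdf and sf values live near 0 where doubles are dense.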

I don't know how far the accurate range of the sf, i.e. scipy.special.chdtrc(df, x), extends.
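One way to combine the two, sketched under my own assumptions: take bin probabilities from the cdf in the lower part of the distribution and from the sf in the upper part, switching at the median. The helper name `binned_pmf` and the choice of the median as the switch point are mine, not something scipy prescribes.

```python
import numpy as np
from scipy import stats

def binned_pmf(df, nbins):
    """Probability of each unit-width bin [i, i+1) for a chi2(df) variable."""
    dist = stats.chi2(df)
    edges = np.arange(nbins + 1, dtype=float)

    # Lower tail: differences of the cdf are accurate while cdf is far from 1.
    from_cdf = np.diff(dist.cdf(edges))
    # Upper tail: sf is decreasing, so negate the differences.
    from_sf = -np.diff(dist.sf(edges))

    # Hypothetical switch rule: use sf for bins starting at or above the median.
    use_sf = edges[:-1] >= dist.median()
    pmf = np.where(use_sf, from_sf, from_cdf)
    return pmf / pmf.sum()

pmf = binned_pmf(10, 120)
print(pmf[:4])
print(pmf[-5:])
```

With this the far-tail bins should come out nonzero (on the order of 1e-21 for the last bin here) instead of collapsing to 0.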

心的憧憬 2024-11-21 11:48:42

Usually whenever I have a precision problem the first tool I reach for is mpmath. 90% of the time it Just Works(tm), quickly enough. In this case, we can write:

import mpmath
mpmath.mp.dps = 50  # decimal digits of precision

def pdf(x, k):
    # chi2 density with k degrees of freedom
    x, k = mpmath.mpf(x), mpmath.mpf(k)
    if x < 0:
        return 0
    return 1/(2**(k/2) * mpmath.gamma(k/2)) * (x**(k/2-1)) * mpmath.exp(-x/2)

def cdf(x, k):
    # regularized lower incomplete gamma function P(k/2, x/2)
    x, k = mpmath.mpf(x), mpmath.mpf(k)
    return mpmath.gammainc(k/2, 0, x/2, regularized=True)

def cdf_via_quad(s, k):
    # numerically integrate the density from 0 to s
    return mpmath.quad(lambda x: pdf(x, k), [0, s])

This gives (using your F):

>>> pdf(2,10)
mpf('0.0076641550244050483665734118783637680717877318964951605')
>>> cdf(2,10)
mpf('0.003659846827343712345456455812710150667594853455628779')
>>> cdf_via_quad(2,10)
mpf('0.003659846827343712345456455812710150667594853455628779')
>>> F[2]
0.0036598468273437131
>>> pdf(100,10)
mpf('2.5113930312030179466371651256862142900427508479560716e-17')
>>> cdf(100,10)
mpf('0.99999999999999994550298017079470664906667698474760744')
>>> cdf_via_quad(100,10)
mpf('0.99999999999999994550298017079470664906667698474760744')
>>> F[100]
1.0

It should be straightforward to use quad to get any normalization you need.
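As a rough sketch of what the binned version could look like: instead of calling quad on the density, this uses mpmath.gammainc with both integration limits set, which gives the probability of a bin directly. The helper names `bin_prob` and `binned_pmf_mp` are made up for illustration, and 50 digits is just the precision chosen above.

```python
import mpmath

mpmath.mp.dps = 50  # decimal digits of precision

def bin_prob(a, b, k):
    """P(a <= X < b) for X ~ chi2(k), via the regularized incomplete gamma."""
    a, b, k = mpmath.mpf(a), mpmath.mpf(b), mpmath.mpf(k)
    return mpmath.gammainc(k / 2, a / 2, b / 2, regularized=True)

def binned_pmf_mp(k, nbins):
    """Normalized probabilities of the unit-width bins [0,1), ..., [nbins-1, nbins)."""
    probs = [bin_prob(i, i + 1, k) for i in range(nbins)]
    total = mpmath.fsum(probs)
    return [p / total for p in probs]

pmf_mp = binned_pmf_mp(10, 120)
print(mpmath.nstr(pmf_mp[0], 10))
print(mpmath.nstr(pmf_mp[-1], 10))
```

The results are mpf values; convert with float(p) once you are done, since anything down to about 1e-308 still fits in an ordinary double. Replacing bin_prob with mpmath.quad over the density on [a, b] should give the same numbers, just more slowly.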
