scipy python中最小平方拟合的置信区间

发布于 2024-11-04 05:12:26 字数 55 浏览 8 评论 0 原文

如何在Python中计算最小二乘拟合(scipy.optimize.leastsq)的置信区间?

How to calculate confidence interval for the least square fit (scipy.optimize.leastsq) in python?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

雪若未夕 2024-11-11 05:12:26

我会使用引导方法。
请参阅此处:http://phe.rockefeller.edu/LogletLab/whitepaper/node17.html< /a>

噪声高斯的简单示例:

x = arange(-10, 10, 0.01)

# model function
def f(p):
    mu, s = p
    return exp(-(x-mu)**2/(2*s**2))

# create error function for dataset    
def fff(d):
    def ff(p):
        return d-f(p)
    return ff

# create noisy dataset from model
def noisy_data(p):
    return f(p)+normal(0,0.1,len(x))

# fit dataset to model with least squares    
def fit(d):
    ff = fff(d)
    p = leastsq(ff,[0,1])[0]
    return p

# bootstrap estimation        
def bootstrap(d):
    p0 = fit(d)
    residuals = f(p0)-d
    s_residuals = std(residuals)

    ps = []
    for i in range(1000):
        new_d = d+normal(0,s_residuals,len(d))
        ps.append(fit(new_d))

    ps = array(ps)
    mean_params = mean(ps,0)
    std_params = std(ps,0)

    return mean_params, std_params

data = noisy_data([0.5, 2.1])
mean_params, std_params = bootstrap(data)

print "95% confidence interval:"
print "mu: ", mean_params[0], " +/- ", std_params[0]*1.95996
print "sigma: ", mean_params[1], " +/- ", std_params[1]*1.95996

I would use bootstrapping method.

See here: http://phe.rockefeller.edu/LogletLab/whitepaper/node17.html

Simple example for noisy gaussian:

x = arange(-10, 10, 0.01)

# model function
def f(p):
    mu, s = p
    return exp(-(x-mu)**2/(2*s**2))

# create error function for dataset    
def fff(d):
    def ff(p):
        return d-f(p)
    return ff

# create noisy dataset from model
def noisy_data(p):
    return f(p)+normal(0,0.1,len(x))

# fit dataset to model with least squares    
def fit(d):
    ff = fff(d)
    p = leastsq(ff,[0,1])[0]
    return p

# bootstrap estimation        
def bootstrap(d):
    p0 = fit(d)
    residuals = f(p0)-d
    s_residuals = std(residuals)

    ps = []
    for i in range(1000):
        new_d = d+normal(0,s_residuals,len(d))
        ps.append(fit(new_d))

    ps = array(ps)
    mean_params = mean(ps,0)
    std_params = std(ps,0)

    return mean_params, std_params

data = noisy_data([0.5, 2.1])
mean_params, std_params = bootstrap(data)

print "95% confidence interval:"
print "mu: ", mean_params[0], " +/- ", std_params[0]*1.95996
print "sigma: ", mean_params[1], " +/- ", std_params[1]*1.95996
心作怪 2024-11-11 05:12:26

我不确定你所说的置信区间是什么意思。

一般来说,leastsq 对您尝试最小化的函数了解不多,因此它无法真正给出置信区间。然而,它确实返回了 Hessian 矩阵的估计,换句话说,就是二阶导数对多维问题的推广。

正如函数的文档字符串中所暗示的,您可以使用该信息以及残差(拟合解与实际数据之间的差异)来计算参数估计的协方差,这是置信区间的局部猜测。

请注意,这只是局部信息,我怀疑严格来说,只有当您的目标函数是严格凸的时,您才能得出结论。我没有关于该声明的任何证据或参考:)。

I am not sure what you mean by confidence interval.

In general, leastsq doesn't know much about the function that you are trying to minimize, so it can't really give a confidence interval. However, it does return an estimate of the Hessian, in other word the generalization of 2nd derivatives to multidimensional problems.

As hinted in the docstring of the function, you could use that information along with the residuals (the difference between your fitted solution and the actual data) to computed the covariance of parameter estimates, which is a local guess of the confidence interval.

Note that it is only a local information, and I suspect that you can strictly speaking come to a conclusion only if your objective function is strictly convex. I don't have any proofs or references on that statement :).

烟凡古楼 2024-11-11 05:12:26

估计置信区间 (CI) 的最简单方法是将标准误差(标准偏差)乘以常数。要计算常数,您需要知道自由度 (DOF) 数以及要计算 CI 的置信水平。
以这种方式估计的 CI 有时称为渐近 CI。
您可以在 Motulsky & 的“使用线性和非线性回归将模型拟合到生物数据”中阅读更多相关信息。克里斯托普洛斯 (Google 图书)。同一本书(或非常相似)可以免费作为作者软件的手册

您还可以阅读 如何使用 C++ Boost.Math 库计算 CI。在此示例中,CI 是针对一个变量的分布计算的。在最小二乘拟合的情况下,DOF 不是 N-1,而是 NM,其中 M 是参数数量。在 Python 中做同样的事情应该很容易。

这是最简单的估计。我不知道zephyr提出的bootstrapping方法,但它可能比我写的方法更可靠。

The simplest way of estimating confidence interval (CI) is to multiply standard errors (standard deviation) by a constant. To calculate the constant you need to know the number of degrees of freedom (DOF) and the confidence level for which you want to calculate the CI.
The CI estimated in this way is sometimes called asymptotic CI.
You can read more about it in "Fitting models to biological data using linear and nonlinear regression" by Motulsky & Christopoulos (google books). The same book (or very similar) is available for free as a manual for author's software.

You may also read how to calculate CI using the C++ Boost.Math library. In this example CI is calculated for a distribution of one variable. In the case of least squares fitting the DOF is not N-1, but N-M, where M is the number of parameters. It should be easy to do the same in Python.

This is the simplest estimation. I don't know the bootstrapping method proposed by zephyr, but it may be more reliable than the method I wrote about.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文