Python - 计算有错误的趋势线

发布于 2024-12-01 15:02:41 字数 500 浏览 2 评论 0 原文

因此,我将一些数据存储为两个列表,并使用

plot(datasetx, datasety)

然后我设置了趋势线

trend = polyfit(datasetx, datasety)
trendx = []
trendy = []

for a in range(datasetx[0], (datasetx[-1]+1)):
    trendx.append(a)
    trendy.append(trend[0]*a**2 + trend[1]*a + trend[2])

plot(trendx, trendy)

来绘制它们,但是我有第三个数据列表,这是原始数据集中的错误。我可以很好地绘制误差线,但我不知道如何使用它,如何找到多项式趋势线系数中的误差。

假设我的趋势线是 5x^2 + 3x + 4 = y,那么 5、3 和 4 值上肯定存在某种错误。

有没有使用 NumPy 的工具可以为我计算这个?

So I've got some data stored as two lists, and plotted them using

plot(datasetx, datasety)

Then I set a trendline

trend = polyfit(datasetx, datasety)
trendx = []
trendy = []

for a in range(datasetx[0], (datasetx[-1]+1)):
    trendx.append(a)
    trendy.append(trend[0]*a**2 + trend[1]*a + trend[2])

plot(trendx, trendy)

But I have a third list of data, which is the error in the original datasety. I'm fine with plotting the errorbars, but what I don't know is using this, how to find the error in the coefficients of the polynomial trendline.

So say my trendline came out to be 5x^2 + 3x + 4 = y, there needs to be some sort of error on the 5, 3 and 4 values.

Is there a tool using NumPy that will calculate this for me?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

柏拉图鍀咏恒 2024-12-08 15:02:41

我认为你可以使用 scipy.optimize 的函数 curve_fit (文档)。一个基本的用法示例:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,50)
y = func(x, 5, 3, 4)
yn = y + 0.2*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

根据文档,pcov 给出:

popt 的估计协方差。对角线提供方差
参数估计。

因此,通过这种方式,您可以计算系数的误差估计。要获得标准差,您可以取方差的平方根。

现在系数有误差,但它仅基于 ydata 和拟合之间的偏差。如果您还想考虑 ydata 本身的错误,curve_fit 函数提供 sigma 参数:

sigma:无或N长度序列

如果不是None,则表示ydata的标准差。这
向量(如果给定)将用作最小二乘中的权重
问题。

一个完整的示例:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,20)
y = func(x, 5, 3, 4)
# generate noisy ydata
yn = y + 0.2 * y * np.random.normal(size=len(x))
# generate error on ydata
y_sigma = 0.2 * y * np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn, sigma = y_sigma)

# plot
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x, yn, yerr = y_sigma, fmt = 'o')
ax.plot(x, np.polyval(popt, x), '-')
ax.text(0.5, 100, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0,0]**0.5))
ax.text(0.5, 90, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1,1]**0.5))
ax.text(0.5, 80, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2,2]**0.5))
ax.grid()
plt.show()

result


然后是其他内容,关于使用 numpy 数组。使用 numpy 的主要优点之一是可以避免 for 循环,因为数组上的操作按元素应用。因此,示例中的 for 循环也可以按以下方式完成:

trendx = arange(datasetx[0], (datasetx[-1]+1))
trendy = trend[0]*trendx**2 + trend[1]*trendx + trend[2]

我使用 arange 而不是 range,因为它返回 numpy 数组而不是列表。
在这种情况下,您还可以使用 numpy 函数 polyval

trendy = polyval(trend, trendx)

I think you can use the function curve_fit of scipy.optimize (documentation). A basic example of the usage:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,50)
y = func(x, 5, 3, 4)
yn = y + 0.2*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

Following the documentation, pcov gives:

The estimated covariance of popt. The diagonals provide the variance
of the parameter estimate.

So in this way you can calculate an error estimate on the coefficients. To have the standard deviation you can take the square root of the variance.

Now you have an error on the coefficients, but it is only based on the deviation between the ydata and the fit. In case you also want to account for an error on the ydata itself, the curve_fit function provides the sigma argument:

sigma : None or N-length sequence

If not None, it represents the standard-deviation of ydata. This
vector, if given, will be used as weights in the least-squares
problem.

A complete example:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,20)
y = func(x, 5, 3, 4)
# generate noisy ydata
yn = y + 0.2 * y * np.random.normal(size=len(x))
# generate error on ydata
y_sigma = 0.2 * y * np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn, sigma = y_sigma)

# plot
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x, yn, yerr = y_sigma, fmt = 'o')
ax.plot(x, np.polyval(popt, x), '-')
ax.text(0.5, 100, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0,0]**0.5))
ax.text(0.5, 90, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1,1]**0.5))
ax.text(0.5, 80, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2,2]**0.5))
ax.grid()
plt.show()

result


Then something else, about using numpy arrays. One of the main advantages of using numpy is that you can avoid for loops because operations on arrays apply elementwise. So the for-loop in your example can also be done as following:

trendx = arange(datasetx[0], (datasetx[-1]+1))
trendy = trend[0]*trendx**2 + trend[1]*trendx + trend[2]

Where I use arange instead of range as it returns a numpy array instead of a list.
In this case you can also use the numpy function polyval:

trendy = polyval(trend, trendx)
老旧海报 2024-12-08 15:02:41

我还没有找到任何方法来获取 numpy 或 python 内置系数中的错误。我有一个简单的工具,是根据 John Taylor 的错误分析简介的第 8.5 节和 8.6 节编写的。也许这足以满足您的任务(请注意,默认返回是方差,而不是标准差)。由于显着的协方差,您可能会得到很大的错误(如提供的示例中所示)。

def leastSquares(xMat, yMat):
'''
Purpose
-------
Perform least squares using the procedure outlined in 8.5 and 8.6 of Taylor, solving
matrix equation X a = Y

Examples
--------
>>> from scipy import matrix
>>> xMat = matrix([[  1,   5,  25],
                   [  1,   7,  49],
                   [  1,   9,  81],
                   [  1,  11, 121]])
>>> # matrix has rows of format [constant, x, x^2]
>>> yMat = matrix([[142],
                   [168],
                   [211],
                   [251]])
>>> a, varCoef, yRes = leastSquares(xMat, yMat)
>>> # a is a column matrix, holding the three coefficients a, b, c, corresponding to
>>> # the equation a + b*x + c*x^2

Returns
-------
a: matrix
    best fit coefficients
varCoef: matrix
    variance of derived coefficents
yRes: matrix
    y-residuals of fit 
'''
xMatSize = xMat.shape
numMeas = xMatSize[0]
numVars = xMatSize[1]

xxMat = xMat.T * xMat
xyMat = xMat.T * yMat
xxMatI = xxMat.I

aMat = xxMatI * xyMat
yAvgMat = xMat * aMat
yRes = yMat - yAvgMat

var = (yRes.T * yRes) / (numMeas - numVars)
varCoef = xxMatI.diagonal() * var[0, 0]

return aMat, varCoef, yRes

I have not been able to find any way of getting the errors in the coefficients that is built in to numpy or python. I have a simple tool that I wrote based on Section 8.5 and 8.6 of John Taylor's An Introduction to Error Analysis. Maybe this will be sufficient for your task (note the default return is the variance, not the standard deviation). You can get large errors (as in the provided example) because of significant covariance.

def leastSquares(xMat, yMat):
'''
Purpose
-------
Perform least squares using the procedure outlined in 8.5 and 8.6 of Taylor, solving
matrix equation X a = Y

Examples
--------
>>> from scipy import matrix
>>> xMat = matrix([[  1,   5,  25],
                   [  1,   7,  49],
                   [  1,   9,  81],
                   [  1,  11, 121]])
>>> # matrix has rows of format [constant, x, x^2]
>>> yMat = matrix([[142],
                   [168],
                   [211],
                   [251]])
>>> a, varCoef, yRes = leastSquares(xMat, yMat)
>>> # a is a column matrix, holding the three coefficients a, b, c, corresponding to
>>> # the equation a + b*x + c*x^2

Returns
-------
a: matrix
    best fit coefficients
varCoef: matrix
    variance of derived coefficents
yRes: matrix
    y-residuals of fit 
'''
xMatSize = xMat.shape
numMeas = xMatSize[0]
numVars = xMatSize[1]

xxMat = xMat.T * xMat
xyMat = xMat.T * yMat
xxMatI = xxMat.I

aMat = xxMatI * xyMat
yAvgMat = xMat * aMat
yRes = yMat - yAvgMat

var = (yRes.T * yRes) / (numMeas - numVars)
varCoef = xxMatI.diagonal() * var[0, 0]

return aMat, varCoef, yRes
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文