How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting
I have a set of data and I want to compare which curve describes it best (polynomials of different orders, exponential or logarithmic).
I use Python and Numpy, and for polynomial fitting there is a function, polyfit(). But I found no such functions for exponential and logarithmic fitting.
Are there any? Or how can I solve it otherwise?
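You can also fit a set of data to any function you like using curve_fit from scipy.optimize. For example, if you want to fit an exponential function (from the documentation):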
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0, 4, 50)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
```
Then, if you want to plot, you can do:
```python
plt.figure()
plt.plot(x, yn, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
```
(Note: the `*` in front of `popt` when plotting expands the terms into the `a`, `b`, and `c` that `func` expects.)
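A short sketch of the same fit that checks the unpacking and also turns `pcov` into per-parameter uncertainties. Taking the square roots of the diagonal of `pcov` as one-standard-deviation errors is the usual interpretation (an assumption here, not part of the answer above), `perr` is just a hypothetical variable name, and the seeded generator only makes the noise reproducible.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(0)          # seeded so the noise is reproducible
x = np.linspace(0, 4, 50)
yn = func(x, 2.5, 1.3, 0.5) + 0.2 * rng.normal(size=x.size)

popt, pcov = curve_fit(func, x, yn)

# func(x, *popt) is exactly func(x, popt[0], popt[1], popt[2])
assert np.array_equal(func(x, *popt), func(x, popt[0], popt[1], popt[2]))

# Assumed interpretation: sqrt of the diagonal of pcov = one-sigma error of a, b, c
perr = np.sqrt(np.diag(pcov))
for name, value, err in zip("abc", popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```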
I had some trouble with this, so let me be very explicit so that noobs like me can understand.
Let's say we have a data file or something like that:
```python
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym

# Generate some data; let's imagine that you already have this.
x = np.linspace(0, 3, 50)
y = np.exp(x)

# Plot your data
plt.plot(x, y, 'ro', label="Original Data")

# Brute force to avoid errors: transform your data into numpy arrays of floats
# so that curve_fit can work
x = np.array(x, dtype=float)
y = np.array(y, dtype=float)

# Create a function to fit to your data. a, b, c and d are the coefficients
# that curve_fit will calculate for you.
# In this part you need to guess and/or use mathematical knowledge to find
# a function that resembles your data.
def func(x, a, b, c, d):
    return a*x**3 + b*x**2 + c*x + d

# Make the curve_fit
popt, pcov = curve_fit(func, x, y)

# The result is:
# popt[0] = a, popt[1] = b, popt[2] = c and popt[3] = d of the function,
# so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
print("a = %s , b = %s, c = %s, d = %s" % (popt[0], popt[1], popt[2], popt[3]))

# Use sympy to generate the LaTeX syntax of the function
xs = sym.Symbol(r'\lambda')
tex = sym.latex(func(xs, *popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' % (tex), fontsize=16)

# Plot the fitted function.
plt.plot(x, func(x, *popt), label="Fitted Curve")  # same as the commented-out line below
# plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
```
The result is:
a = 0.849195983017 , b = -1.18101681765, c = 2.24061176543, d = 0.816643894816
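Since the cubic above is linear in a, b, c and d, the same least-squares coefficients should also come out of `np.polyfit`, which needs no initial guess and returns them highest degree first. A sketch comparing the two on the same data (agreement is expected only up to the optimizer's tolerance, hence the loose `rtol`):

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c, d):
    return a * x**3 + b * x**2 + c * x + d

x = np.linspace(0, 3, 50)
y = np.exp(x)

popt, _ = curve_fit(func, x, y)      # iterative fit starting from all-ones
coeffs = np.polyfit(x, y, 3)         # direct linear least squares: [a, b, c, d]

print("curve_fit:", popt)
print("polyfit:  ", coeffs)
print("same fit: ", np.allclose(popt, coeffs, rtol=1e-3, atol=1e-5))
```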
Here is a linearization approach that uses tools from scikit-learn.
Given
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer

np.random.seed(123)

# General Functions
def func_exp(x, a, b, c):
    """Return values from a general exponential function."""
    return a * np.exp(b * x) + c

def func_log(x, a, b, c):
    """Return values from a general log function."""
    return a * np.log(b * x) + c

# Helper
def generate_data(func, *args, jitter=0):
    """Return a tuple of arrays with random data along a general function."""
    xs = np.linspace(1, 5, 50)
    ys = func(xs, *args)
    noise = jitter * np.random.normal(size=len(xs)) + jitter
    xs = xs.reshape(-1, 1)                       # xs[:, np.newaxis]
    ys = (ys + noise).reshape(-1, 1)
    return xs, ys

transformer = FunctionTransformer(np.log, validate=True)
```
Code
Fitting exponential data
```python
# Data
x_samp, y_samp = generate_data(func_exp, 2.5, 1.2, 0.7, jitter=3)
y_trans = transformer.fit_transform(y_samp)               # 1

# Regression
regressor = LinearRegression()
results = regressor.fit(x_samp, y_trans)                  # 2
model = results.predict
y_fit = model(x_samp)

# Visualization
plt.scatter(x_samp, y_samp)
plt.plot(x_samp, np.exp(y_fit), "k--", label="Fit")     # 3
plt.title("Exponential Fit")
```
Fitting log data
```python
# Data
x_samp, y_samp = generate_data(func_log, 2.5, 1.2, 0.7, jitter=0.15)
x_trans = transformer.fit_transform(x_samp)               # 1

# Regression
regressor = LinearRegression()
results = regressor.fit(x_trans, y_samp)                  # 2
model = results.predict
y_fit = model(x_trans)

# Visualization
plt.scatter(x_samp, y_samp)
plt.plot(x_samp, y_fit, "k--", label="Fit")              # 3
plt.title("Logarithmic Fit")
```
Details
General steps
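- Apply a log operation to the data values (x, y, or both)
- Regress the data onto a linearized model
- Plot by "reversing" any log operations (with np.exp()) and fit to the original data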
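A NumPy-only sketch of these steps for the logarithmic case, with `np.polyfit` swapped in for scikit-learn's `LinearRegression` (both compute an ordinary least-squares line). `fit_log_linearized` is a hypothetical helper name, and the data is noise-free so the recovered values can be checked. Because ln(1.2x) = ln 1.2 + ln x, the intercept absorbs the 1.2:

```python
import numpy as np

def fit_log_linearized(x, y):
    """Hypothetical helper: fit y = A*ln(x) + C by regressing y on ln(x)."""
    x_trans = np.log(x)                  # step 1: log-transform the x data
    A, C = np.polyfit(x_trans, y, 1)     # step 2: straight-line fit -> (slope, intercept)
    return A, C                          # step 3: y was not transformed, so nothing to reverse

x = np.linspace(1, 5, 50)
y = 2.5 * np.log(1.2 * x) + 0.7          # same shape as func_log above, without noise
A, C = fit_log_linearized(x, y)
print(A, C)                              # A = 2.5, C = 2.5*ln(1.2) + 0.7
```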
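Assuming our data follows an exponential trend, a general equation+ may be:

$$y = A e^{Bx} + C$$

We can linearize the latter equation (i.e. into the form y = intercept + slope * x) by taking the log:

$$\ln(y - C) = \ln A + Bx$$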
Given the linearized equation++ and the regression parameters, we can calculate:
- A via the intercept (ln(A))
- B via the slope (B)
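A minimal sketch of recovering A and B for the exponential case, assuming noise-free data with C = 0. `np.polyfit` again stands in for `LinearRegression`, whose `intercept_` and `coef_` attributes would play the same roles:

```python
import numpy as np

x = np.linspace(1, 5, 50)
y = 2.5 * np.exp(1.2 * x)                          # noise-free, C = 0 (the case the linearization assumes)

slope, intercept = np.polyfit(x, np.log(y), 1)     # fit ln(y) = ln(A) + B*x
A = np.exp(intercept)                              # A from the intercept ln(A)
B = slope                                          # B is the slope itself

y_fit = A * np.exp(B * x)                          # "reverse" the log to return to the original scale
print(A, B)
```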
Summary of linearization techniques
Relationship | Example | General Eqn. | Altered Var. | Linearized Eqn.
-------------|------------|----------------------|----------------|------------------------------------------
Linear | x | y = B * x + C | - | y = C + B * x
Logarithmic | log(x) | y = A * log(B*x) + C | log(x) | y = C + A * (log(B) + log(x))
Exponential | 2**x, e**x | y = A * exp(B*x) + C | log(y) | log(y-C) = log(A) + B * x
Power | x**2 | y = B * x**N + C | log(x), log(y) | log(y-C) = log(B) + N * log(x)
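+Note: linearizing exponential data works best when the noise is small and C = 0. Use with caution.
++Note: while altering y data helps linearize exponential data, altering x data helps linearize log data.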
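For the power-law row of the table, both variables are log-transformed and N comes out as the slope. A sketch on noise-free synthetic data, again assuming C = 0:

```python
import numpy as np

x = np.linspace(1, 10, 40)
y = 3.0 * x**1.5                                   # power law y = B * x**N with C = 0

N, log_B = np.polyfit(np.log(x), np.log(y), 1)     # ln(y) = ln(B) + N*ln(x)
B = np.exp(log_B)
print(B, N)
```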
Well, I guess you could always use:

```python
np.log    # --> natural log
np.log10  # --> base 10
np.log2   # --> base 2
```
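The base only rescales the coefficient, because $\log_{10} x = \ln x / \ln 10$: a fit against `np.log10(x)` describes the same curve as a fit against `np.log(x)`, with a slope larger by a factor of $\ln 10$ and the same intercept. A quick check on synthetic data:

```python
import numpy as np

x = np.linspace(1, 5, 50)
y = 2.5 * np.log(x) + 0.7

slope_ln, icpt_ln = np.polyfit(np.log(x), y, 1)
slope_10, icpt_10 = np.polyfit(np.log10(x), y, 1)

print(slope_10 / slope_ln)        # ratio equals ln(10)
print(np.isclose(icpt_ln, icpt_10))
```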
A slight modification of IanVS's answer:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    # return a * np.exp(-b * x) + c
    return a * np.log(b * x) + c

x = np.linspace(1, 5, 50)  # start at 1 instead of 0 to avoid log(0)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

plt.figure()
plt.plot(x, yn, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
```
This produces a plot of the noisy data together with the fitted logarithmic curve.
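One caveat about this model: since $\ln(bx) = \ln b + \ln x$, the parameters `b` and `c` in `a * np.log(b * x) + c` are not separately identifiable; only $a$ and $a \ln b + c$ are. curve_fit may then warn that the covariance could not be estimated, or report very large uncertainties. A sketch of the reduced two-parameter model, which describes exactly the same family of curves (`func_reduced` is a hypothetical name):

```python
import numpy as np
from scipy.optimize import curve_fit

def func_reduced(x, a, k):
    """y = a*ln(x) + k, where k plays the role of a*ln(b) + c."""
    return a * np.log(x) + k

rng = np.random.default_rng(1)
x = np.linspace(1, 5, 50)
yn = 2.5 * np.log(1.3 * x) + 0.5 + 0.2 * rng.normal(size=x.size)

popt, pcov = curve_fit(func_reduced, x, yn)
perr = np.sqrt(np.diag(pcov))     # finite, since a and k are identifiable
print(popt, perr)
```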
We demonstrate the features of lmfit while solving both problems.
Given
```python
import lmfit
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  (IPython/Jupyter magic; only valid inside a notebook)

np.random.seed(123)

# General Functions
def func_log(x, a, b, c):
    """Return values from a general log function."""
    return a * np.log(b * x) + c

# Data
x_samp = np.linspace(1, 5, 50)
_noise = np.random.normal(size=len(x_samp), scale=0.06)
y_samp = 2.5 * np.exp(1.2 * x_samp) + 0.7 + _noise
y_samp2 = 2.5 * np.log(1.2 * x_samp) + 0.7 + _noise
```
Code
Method 1 - lmfit Model
Fitting exponential data
```python
regressor = lmfit.models.ExponentialModel()                # 1
initial_guess = dict(amplitude=1, decay=-1)                # 2
results = regressor.fit(y_samp, x=x_samp, **initial_guess)
y_fit = results.best_fit

plt.plot(x_samp, y_samp, "o", label="Data")
plt.plot(x_samp, y_fit, "k--", label="Fit")
plt.legend()
```
Method 2 - Custom Model
Fitting log data
```python
regressor = lmfit.Model(func_log)                          # 1
initial_guess = dict(a=1, b=.1, c=.1)                      # 2
results = regressor.fit(y_samp2, x=x_samp, **initial_guess)
y_fit = results.best_fit

plt.plot(x_samp, y_samp2, "o", label="Data")
plt.plot(x_samp, y_fit, "k--", label="Fit")
plt.legend()
```
Details
- Choose a regression class
- Supply named initial guesses that respect the function's domain
You can determine the inferred parameters from the regressor object. Example:
```python
regressor.param_names
# ['decay', 'amplitude']
```
To make predictions, use the ModelResult.eval() method.
```python
model = results.eval
y_pred = model(x=np.array([1.5]))
```
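Note: ExponentialModel() follows a decay function, amplitude * exp(-x / decay), so fitting growing data requires a negative decay (hence decay=-1 in the initial guess above).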
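To see why the initial guess uses a negative `decay`, here is a sketch that fits the form lmfit documents for `ExponentialModel`, `amplitude * exp(-x / decay)`, with SciPy's `curve_fit` instead of lmfit (a swapped-in tool, so that only NumPy and SciPy are needed); `exponential_decay_form` is a hypothetical name. For growing data $y = A e^{Bx}$ the best-fit `decay` should come out near $-1/B$:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_decay_form(x, amplitude, decay):
    # Same functional form that lmfit documents for ExponentialModel
    return amplitude * np.exp(-x / decay)

x = np.linspace(0, 2, 30)
y = 2.5 * np.exp(1.2 * x)                  # growing, noise-free data

# p0 mirrors the lmfit initial guess amplitude=1, decay=-1
popt, _ = curve_fit(exponential_decay_form, x, y, p0=[1, -1])
amplitude, decay = popt
print(amplitude, decay, -1 / 1.2)          # decay should be close to -1/1.2
```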
See also ExponentialGaussianModel(), which accepts more parameters.
Install the library via `pip install lmfit`.
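Wolfram has a closed-form solution for fitting an exponential. They also have similar solutions for fitting a logarithm and a power law.
I found this to work better than scipy's curve_fit, especially when you don't have data "near zero". Here is an example: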
```python
import numpy as np
import matplotlib.pyplot as plt

# Fit the function y = A * exp(B * x) to the data
# returns (A, B)
# From: https://mathworld.wolfram.com/LeastSquaresFittingExponential.html
def fit_exp(xs, ys):
    S_x2_y = 0.0
    S_y_lny = 0.0
    S_x_y = 0.0
    S_x_y_lny = 0.0
    S_y = 0.0
    for (x, y) in zip(xs, ys):
        S_x2_y += x * x * y
        S_y_lny += y * np.log(y)
        S_x_y += x * y
        S_x_y_lny += x * y * np.log(y)
        S_y += y
    # a is ln(A), b is B
    a = (S_x2_y * S_y_lny - S_x_y * S_x_y_lny) / (S_y * S_x2_y - S_x_y * S_x_y)
    b = (S_y * S_x_y_lny - S_x_y * S_y_lny) / (S_y * S_x2_y - S_x_y * S_x_y)
    return (np.exp(a), b)

xs = [33, 34, 35, 36, 37, 38, 39, 40, 41, 42]
ys = [3187, 3545, 4045, 4447, 4872, 5660, 5983, 6254, 6681, 7206]
(A, B) = fit_exp(xs, ys)

plt.figure()
plt.plot(xs, ys, 'o-', label='Raw Data')
plt.plot(xs, [A * np.exp(B * x) for x in xs], 'o-', label='Fit')
plt.title('Exponential Fit Test')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
```
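The loop in `fit_exp` can also be written with NumPy array sums. A sketch (the name `fit_exp_vectorized` is hypothetical) using the same MathWorld formulas, i.e. a least-squares line through ln(y) with each point weighted by y; like the original, it requires every y > 0 because of the logarithm. On noise-free data it should recover the parameters up to rounding:

```python
import numpy as np

def fit_exp_vectorized(xs, ys):
    """Fit y = A * exp(B * x) with the weighted closed form from MathWorld; returns (A, B)."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    lny = np.log(y)                      # requires every y > 0
    S_y = y.sum()
    S_xy = (x * y).sum()
    S_x2y = (x * x * y).sum()
    S_ylny = (y * lny).sum()
    S_xylny = (x * y * lny).sum()
    denom = S_y * S_x2y - S_xy**2
    a = (S_x2y * S_ylny - S_xy * S_xylny) / denom   # a is ln(A)
    b = (S_y * S_xylny - S_xy * S_ylny) / denom     # b is B
    return np.exp(a), b

x = np.linspace(0, 3, 20)
A, B = fit_exp_vectorized(x, 1.5 * np.exp(0.8 * x))
print(A, B)
```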
For fitting $y = A + B \log x$, just fit y against (log x).
For fitting $y = Ae^{Bx}$, taking the logarithm of both sides gives $\log y = \log A + Bx$. So fit (log y) against x.
Note that fitting (log y) as if it were linear will emphasize small values of y, causing large deviations for large y. This is because `polyfit` (linear regression) works by minimizing $\sum_i (\Delta Y_i)^2 = \sum_i (Y_i - \hat{Y}_i)^2$. When $Y_i = \log y_i$, the residuals are $\Delta Y_i = \Delta(\log y_i) \approx \Delta y_i / |y_i|$. So even if `polyfit` makes a very bad decision for large y, the "divide-by-|y|" factor will compensate for it, causing `polyfit` to favor small values.

This could be alleviated by giving each entry a "weight" proportional to y. `polyfit` supports weighted least squares via the `w` keyword argument. Note that Excel, LibreOffice and most scientific calculators typically use the unweighted (biased) formula for exponential regression / trend lines. If you want your results to be compatible with these platforms, do not include the weights, even if they provide better results.
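A sketch of the weighted fit, with made-up data that has constant-size additive noise. Per the NumPy documentation, `w` multiplies the unsquared residuals (1/sigma, not 1/sigma**2), so `w=y` roughly turns each Δ(log y) back into Δy; that choice of weight is an assumption based on the argument above, not the only option. The loop compares both fits by their squared error in the original y units:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 4, 40)
y = 5.0 * np.exp(0.9 * x) + rng.normal(scale=0.5, size=x.size)  # constant-size noise, y stays > 0

# Unweighted: every log-residual counts equally, which favours the small-y points
B_u, lnA_u = np.polyfit(x, np.log(y), 1)
# Weighted: w = y scales each log-residual back to roughly the size of Δy
B_w, lnA_w = np.polyfit(x, np.log(y), 1, w=y)

for label, lnA, B in [("unweighted", lnA_u, B_u), ("weighted", lnA_w, B_w)]:
    sse = np.sum((y - np.exp(lnA) * np.exp(B * x)) ** 2)
    print(f"{label:10s}  A = {np.exp(lnA):.3f}  B = {B:.4f}  SSE in y = {sse:.2f}")
```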
Now, if you can use scipy, you could use `scipy.optimize.curve_fit` to fit any model without transformations. For $y = A + B \log x$ the result is the same as with the transformation method.
For $y = Ae^{Bx}$, however, we can get a better fit, since curve_fit minimizes $\Delta y$ directly rather than $\Delta(\log y)$. But we need to provide an initial guess so that `curve_fit` can reach the desired local minimum.
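A sketch of the direct fit, seeding `p0` from the log-linearized fit. Using the transformed fit as the starting point is one common heuristic, assumed here rather than taken from the answer; `exp_model` is a hypothetical name and the data is made up:

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, A, B):
    return A * np.exp(B * x)

rng = np.random.default_rng(7)
x = np.linspace(0, 4, 40)
y = 5.0 * np.exp(0.9 * x) + rng.normal(scale=0.5, size=x.size)

# Initial guess from the log-linearized fit (requires y > 0)
B0, lnA0 = np.polyfit(x, np.log(y), 1)

# Direct least squares on y itself, starting from that guess
popt, pcov = curve_fit(exp_model, x, y, p0=(np.exp(lnA0), B0))
A, B = popt
print(A, B)
```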