Segmented linear regression in Python

Posted 2024-12-28 17:43:35


Is there a library in Python to do segmented linear regression? I'd like to fit multiple lines to my data automatically to get something like this:

[figure: segmented regression]

Btw, I do know the number of segments.


拿命拼未来 2025-01-04 17:43:35


[result figure: left panel 'values' with the data and segmented fit, right panel 'slope' with per-point slopes and the DecisionTree clustering]

As mentioned in a comment above, segmented linear regression brings the problem of many free parameters. I therefore decided to move away from an approach that uses n_segments * 3 - 1 parameters (i.e. n_segments - 1 segment positions, n_segments y-offsets, n_segments slopes) and performs numerical optimization. Instead, I look for regions that already have a roughly constant slope.
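For comparison, here is a minimal sketch of that rejected free-parameter approach, assuming scipy is available; piecewise_model and loss are illustrative names, not part of any library:

import numpy as np
from scipy.optimize import minimize

def piecewise_model(x, breakpoints, offsets, slopes):
    # assign each x to a segment using the (sorted) interior breakpoints
    seg = np.searchsorted(np.sort(breakpoints), x)
    return offsets[seg] + slopes[seg] * x

def loss(params, x, y, n_seg):
    # unpack the n_seg * 3 - 1 free parameters
    breakpoints = params[:n_seg - 1]
    offsets = params[n_seg - 1:2 * n_seg - 1]
    slopes = params[2 * n_seg - 1:]
    return np.sum((piecewise_model(x, breakpoints, offsets, slopes) - y) ** 2)

# e.g. res = minimize(loss, x0=np.zeros(3 * n_seg - 1), args=(xs, ys, n_seg))
# note: in practice this needs a sensible x0; plain zeros will often land in a poor local minimum
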

Algorithm

  • Calculate the slope at all points
  • Cluster points with similar slopes into segments (done by the DecisionTree)
  • Perform a LinearRegression on each segment found in the previous step

A decision tree is used instead of a clustering algorithm to get connected segments rather than a set of (non-neighboring) points. The details of the segmentation can be adjusted via the decision tree's parameters (currently max_leaf_nodes).

Code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

# setup parameters
n_data = 20

# segmented linear regression parameters
n_seg = 3

np.random.seed(0)
fig, (ax0, ax1) = plt.subplots(1, 2)

# example 1: noisy tanh on randomly spaced x values
#xs = np.sort(np.random.rand(n_data))
#ys = np.random.rand(n_data) * .3 + np.tanh(5 * (xs - .5))

# example 2: noisy tanh on evenly spaced x values
xs = np.linspace(-1, 1, n_data)
ys = np.random.rand(n_data) * .3 + np.tanh(3 * xs)

# step 1: estimate the slope at every point
dys = np.gradient(ys, xs)

# step 2: cluster points with similar slope into contiguous segments
rgr = DecisionTreeRegressor(max_leaf_nodes=n_seg)
rgr.fit(xs.reshape(-1, 1), dys.reshape(-1, 1))
dys_dt = rgr.predict(xs.reshape(-1, 1)).flatten()

# step 3: fit a linear regression within each segment
ys_sl = np.ones(len(xs)) * np.nan
for y in np.unique(dys_dt):
    msk = dys_dt == y
    lin_reg = LinearRegression()
    lin_reg.fit(xs[msk].reshape(-1, 1), ys[msk].reshape(-1, 1))
    ys_sl[msk] = lin_reg.predict(xs[msk].reshape(-1, 1)).flatten()
    ax0.plot([xs[msk][0], xs[msk][-1]],
             [ys_sl[msk][0], ys_sl[msk][-1]],
             color='r', zorder=1)

ax0.set_title('values')
ax0.scatter(xs, ys, label='data')
ax0.scatter(xs, ys_sl, s=3**2, label='seg lin reg', color='g', zorder=5)
ax0.legend()

ax1.set_title('slope')
ax1.scatter(xs, dys, label='data')
ax1.scatter(xs, dys_dt, label='DecisionTree', s=2**2)
ax1.legend()

plt.show()
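
If the actual breakpoint locations are needed, they are implicit in where the tree's predicted slope changes; a small sketch using the variables from the code above (est_breakpoints is just an illustrative name):

# indices where the predicted slope changes mark segment boundaries
jumps = np.where(np.diff(dys_dt) != 0)[0]
# approximate each breakpoint as the midpoint between neighboring samples
est_breakpoints = (xs[jumps] + xs[jumps + 1]) / 2
print(est_breakpoints)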
孤檠 2025-01-04 17:43:35


You just need to sort X in ascending order and fit several linear regressions, one per segment. You can use LinearRegression from sklearn.

For example, dividing the curve into 2 segments looks like this:

from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

X = np.array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])
Y = X**2
X = X.reshape(-1, 1)

# fit one regression per half of the (already sorted) data
reg1 = LinearRegression().fit(X[0:6, :], Y[0:6])
reg2 = LinearRegression().fit(X[6:, :], Y[6:])

fig = plt.figure('Plot Data + Regression')
ax1 = fig.add_subplot(111)
ax1.plot(X, Y, marker='x', c='b', label='data')
ax1.plot(X[0:6, ], reg1.predict(X[0:6, ]), marker='o', c='g', label='linear r.')
ax1.plot(X[6:, ], reg2.predict(X[6:, ]), marker='o', c='g')
ax1.set_title('Data vs Regression')
ax1.legend(loc=2)
plt.show()

[figure: parabola data with one linear fit per half]

I did a similar implementation, here is the code:
https://github.com/mavaladezt/Segmented-Algorithm
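
A sketch generalizing this to n segments, assuming the split points fall at equal sample counts rather than at the true breakpoints (fit_equal_segments is an illustrative helper, not a library function):

import numpy as np
from sklearn.linear_model import LinearRegression

def fit_equal_segments(X, Y, n_segments):
    # sort by X, then fit one LinearRegression per contiguous chunk
    order = np.argsort(X.ravel())
    Xs, Ys = X.ravel()[order], Y[order]
    models = []
    for idx in np.array_split(np.arange(len(Xs)), n_segments):
        reg = LinearRegression().fit(Xs[idx].reshape(-1, 1), Ys[idx])
        models.append((Xs[idx[0]], Xs[idx[-1]], reg))  # (x_start, x_end, model)
    return models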

寄风 2025-01-04 17:43:35


There is the piecewise-regression Python library for doing exactly this. Github link.

Simple Example with 1 Breakpoint.
For a demonstration, first generate some example data:

import numpy as np

alpha_1 = -4
alpha_2 = -2
constant = 100
breakpoint_1 = 7
n_points = 200
np.random.seed(0)
xx = np.linspace(0, 20, n_points)
# the hinge term max(xx - breakpoint_1, 0) switches the slope from alpha_1 to alpha_2
yy = constant + alpha_1*xx + (alpha_2-alpha_1) * np.maximum(xx - breakpoint_1, 0) + np.random.normal(size=n_points)

Then fit a piecewise model:

import piecewise_regression
pw_fit = piecewise_regression.Fit(xx, yy, n_breakpoints=1)
pw_fit.summary()

And plot it:

import matplotlib.pyplot as plt
pw_fit.plot()
plt.show()

[figure: data with the fitted one-breakpoint model]
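
The fitted numbers can also be read programmatically; a sketch assuming the get_results() accessor described in the project README (the exact key names are an assumption worth checking):

results = pw_fit.get_results()              # assumed accessor from the README
print(results["estimates"]["breakpoint1"])  # key name is an assumption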

Example 2 - 4 Breakpoints.
Now let's look at some data that is similar to the original question, with 4 breakpoints.

import numpy as np

gradients = [0, 2, 1, 2, -1, 0]  # per-segment slopes (the last entry is unused below)
constant = 0
breakpoints = [-4, -2, 1, 4]
n_points = 200
np.random.seed(0)
xx = np.linspace(-10, 10, n_points)
yy = constant + gradients[0]*xx + np.random.normal(size=n_points)*0.5
# add a hinge at each breakpoint so the gradient changes there
for bp_n in range(len(breakpoints)):
    yy += (gradients[bp_n+1] - gradients[bp_n]) * np.maximum(xx - breakpoints[bp_n], 0)

Fit the model and plot it:

import piecewise_regression
import matplotlib.pyplot as plt

pw_fit = piecewise_regression.Fit(xx, yy, n_breakpoints=4)

pw_fit.plot()

plt.xlabel("x")
plt.ylabel("y")
plt.ylim(-10, 20)
plt.show()

[figure: data with the fitted four-breakpoint model]

Code examples in this Google Colab notebook
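
As an aside, if the number of breakpoints were not known (the original question says it is), the same library ships a model-comparison tool; a sketch assuming the ModelSelection interface from the project README, which fits candidates with up to max_breakpoints breakpoints and compares them by BIC:

import piecewise_regression
ms = piecewise_regression.ModelSelection(xx, yy, max_breakpoints=6)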
