Fast & efficient least squares fit algorithm in C?
I am trying to implement a linear least squares fit onto 2 arrays of data: time vs amplitude. The only technique I know so far is to test all of the possible m and b points in (y = m*x+b) and then find out which combination fits my data best so that it has the least error. However, I think iterating so many combinations is sometimes useless because it tests out everything. Are there any techniques to speed up the process that I don't know about? Thanks.
8 Answers
Try this code. It fits y = mx + b to your (x, y) data.

The arguments to linreg are the input data arrays and output pointers for the fitted parameters; the return value is 0 on success, !=0 on failure.

Here's the code:
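A minimal sketch of such a linreg, assuming the signature int linreg(int n, const double x[], const double y[], double *m, double *b, double *r); it computes the standard closed-form slope, intercept, and (optionally) correlation coefficient from running sums:

```c
#include <math.h>

/* Sketch of a linreg as described above (assumed signature):
   fits y = m*x + b to n points; r is optional (pass NULL to skip)
   and receives the correlation coefficient.
   Returns 0 on success, !=0 on failure. */
int linreg(int n, const double x[], const double y[],
           double *m, double *b, double *r)
{
    double sumx = 0, sumy = 0, sumx2 = 0, sumy2 = 0, sumxy = 0;
    for (int i = 0; i < n; i++) {
        sumx  += x[i];
        sumy  += y[i];
        sumx2 += x[i] * x[i];
        sumy2 += y[i] * y[i];
        sumxy += x[i] * y[i];
    }

    double denom = n * sumx2 - sumx * sumx;
    if (denom == 0) {
        /* singular: all x values equal; can't solve for a slope */
        *m = 0;
        *b = 0;
        if (r) *r = 0;
        return 1;
    }

    *m = (n * sumxy - sumx * sumy) / denom;
    *b = (sumy * sumx2 - sumx * sumxy) / denom;
    if (r) {
        *r = (sumxy - sumx * sumy / n) /
             sqrt((sumx2 - sumx * sumx / n) *
                  (sumy2 - sumy * sumy / n));
    }
    return 0;
}
```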
Example (you can run this example online):
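A hypothetical driver, with the data chosen so the fit is exact and the output shown in a comment:

```c
#include <stdio.h>

int main(void)
{
    const double x[] = {1, 2, 3, 4};
    const double y[] = {2, 4, 6, 8};          /* exactly y = 2x */
    double m, b, r;

    if (linreg(4, x, y, &m, &b, &r) == 0)
        printf("m=%g b=%g r=%g\n", m, b, r); /* prints: m=2 b=0 r=1 */
    return 0;
}
```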
For verification, the same data was plotted and fit with a linear trend line in Excel; all values agree exactly with the C code above (note the C code returns r, while Excel reports R**2).
There are efficient algorithms for least-squares fitting; see Wikipedia for details. There are also libraries that implement the algorithms for you, likely more efficiently than a naive implementation would; the GNU Scientific Library is one example, but there are others under more permissive licenses as well.
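For the straight-line case, GSL ships a ready-made routine; a minimal sketch using gsl_fit_linear (link with -lgsl -lgslcblas; the data values here are made up for illustration):

```c
#include <stdio.h>
#include <gsl/gsl_fit.h>

int main(void)
{
    const double x[] = {1.0, 2.0, 3.0, 4.0};
    const double y[] = {2.1, 3.9, 6.2, 7.8};
    double c0, c1, cov00, cov01, cov11, sumsq;

    /* Fits y = c0 + c1*x; the strides of 1 mean the arrays
       are contiguous. Also returns the parameter covariances
       and the residual sum of squares. */
    gsl_fit_linear(x, 1, y, 1, 4,
                   &c0, &c1, &cov00, &cov01, &cov11, &sumsq);

    printf("intercept b = %g, slope m = %g\n", c0, c1);
    return 0;
}
```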
Here is my version of a C/C++ function that does simple linear regression. The calculations follow the Wikipedia article on simple linear regression. It is published as a single-header public-domain (MIT) library on GitHub: simple_linear_regression. The library (.h file) is tested to work on Linux and Windows, from both C and C++, using -Wall -Werror and all -std versions supported by clang/gcc.
Usage example:
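A sketch of what usage might look like, assuming the header exposes a function along the lines of simple_linear_regression(x, y, n, &slope, &intercept, &r2) plus an implementation macro in the usual single-header style; check the repository for the exact API:

```c
/* Assumed single-header convention and call shape; the real
   signature lives in the simple_linear_regression repository. */
#define SIMPLE_LINEAR_REGRESSION_IMPLEMENTATION
#include "simple_linear_regression.h"
#include <stdio.h>

int main(void)
{
    const double x[] = {1.0, 2.0, 3.0, 4.0};
    const double y[] = {2.1, 3.9, 6.2, 7.8};
    double slope = 0.0, intercept = 0.0, r2 = 0.0;

    /* hypothetical call: assumed to return 0 on success */
    if (simple_linear_regression(x, y, 4, &slope, &intercept, &r2) == 0)
        printf("slope=%f intercept=%f r2=%f\n", slope, intercept, r2);
    return 0;
}
```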
From Numerical Recipes: The Art of Scientific Computing, section 15.2, Fitting Data to a Straight Line:

Linear Regression:

The struct below performs the mentioned calculations, together with struct Gamma and struct Gauleg18 (which supply the incomplete gamma machinery used for the goodness-of-fit estimate) and, finally, the function Gamma::invgammp().
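The full listing is long; as a rough sketch of just the core straight-line fit in the NR style (unit measurement errors assumed, and the Gamma/Gauleg18 goodness-of-fit machinery omitted), the calculation amounts to something like:

```c
#include <math.h>

/* Sketch of an NR-style straight-line fit y = a + b*x with unit
   errors; assumes n >= 3 and at least two distinct x values.
   Returns chi-square; the NR version additionally derives a
   goodness-of-fit probability from the incomplete gamma function
   (omitted here). */
double fit_line(int n, const double x[], const double y[],
                double *a, double *b, double *siga, double *sigb)
{
    double sx = 0.0, sy = 0.0;
    for (int i = 0; i < n; i++) { sx += x[i]; sy += y[i]; }
    double sxoss = sx / n;

    double st2 = 0.0, bb = 0.0;
    for (int i = 0; i < n; i++) {
        double t = x[i] - sxoss;    /* shifted x improves stability */
        st2 += t * t;
        bb  += t * y[i];
    }
    bb /= st2;                       /* slope */
    double aa = (sy - sx * bb) / n;  /* intercept */

    double chi2 = 0.0;
    for (int i = 0; i < n; i++) {
        double d = y[i] - aa - bb * x[i];
        chi2 += d * d;
    }
    /* with no per-point errors, scale the parameter uncertainties
       by the scatter of the data about the fit */
    double sigdat = sqrt(chi2 / (n - 2));
    *a = aa;
    *b = bb;
    *siga = sqrt((1.0 + sx * sx / (n * st2)) / n) * sigdat;
    *sigb = sqrt(1.0 / st2) * sigdat;
    return chi2;
}
```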
The original example above worked well for me for slope and offset, but I had a hard time with the correlation coefficient. Maybe my parentheses don't group the same way as the assumed precedence? Anyway, with some help from other web pages I finally got values that match the linear trend line in Excel. I thought I would share my code, using Mark Lakata's variable names. Hope this helps.
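A sketch of that correlation-coefficient calculation, fully parenthesized and using Mark Lakata's variable names (squaring r gives Excel's R**2):

```c
#include <math.h>

/* Pearson correlation coefficient over n (x, y) points, using the
   same running sums as linreg above. Excel's R^2 for a linear
   trend line is simply r * r. */
double corr_coef(int n, const double x[], const double y[])
{
    double sumx = 0, sumy = 0, sumx2 = 0, sumy2 = 0, sumxy = 0;
    for (int i = 0; i < n; i++) {
        sumx  += x[i];
        sumy  += y[i];
        sumx2 += x[i] * x[i];
        sumy2 += y[i] * y[i];
        sumxy += x[i] * y[i];
    }
    /* fully parenthesized so operator precedence can't bite */
    return (n * sumxy - sumx * sumy) /
           sqrt((n * sumx2 - sumx * sumx) *
                (n * sumy2 - sumy * sumy));
}
```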
As an assignment I had to code a simple linear regression in C using an RMSE loss function. The program is dynamic: you can enter your own values and choose your own loss function, which for now is limited to root mean square error.

Now the code; you need gnuplot to display the chart: sudo apt install gnuplot
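A sketch in the same spirit, assuming plain gradient descent on the mean squared error (which has the same minimizer as RMSE), with a hard-coded data set standing in for the interactive input and a gnuplot call at the end:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Gradient-descent fit of y = m*x + b minimizing MSE (same
   minimizer as RMSE); a hypothetical stand-in for the assignment
   code described above. */
int main(void)
{
    const double x[] = {1, 2, 3, 4, 5};
    const double y[] = {2.0, 4.1, 5.9, 8.2, 9.8};
    const int n = sizeof(x) / sizeof(x[0]);

    double m = 0.0, b = 0.0;
    const double lr = 0.01;            /* learning rate */
    for (int epoch = 0; epoch < 10000; epoch++) {
        double gm = 0.0, gb = 0.0;     /* gradients of MSE w.r.t. m, b */
        for (int i = 0; i < n; i++) {
            double err = m * x[i] + b - y[i];
            gm += 2.0 * err * x[i] / n;
            gb += 2.0 * err / n;
        }
        m -= lr * gm;
        b -= lr * gb;
    }

    double rmse = 0.0;
    for (int i = 0; i < n; i++) {
        double err = m * x[i] + b - y[i];
        rmse += err * err / n;
    }
    printf("m = %f, b = %f, RMSE = %f\n", m, b, sqrt(rmse));

    /* dump the points, then plot data + fitted line with gnuplot */
    FILE *f = fopen("points.dat", "w");
    for (int i = 0; i < n; i++)
        fprintf(f, "%f %f\n", x[i], y[i]);
    fclose(f);
    char cmd[256];
    snprintf(cmd, sizeof cmd,
             "gnuplot -p -e \"plot 'points.dat', %f*x + %f\"", m, b);
    return system(cmd);
}
```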
Look at Section 1 of this paper. This section expresses a 2D linear regression as a matrix multiplication exercise. As long as your data is well-behaved, this technique should permit you to develop a quick least squares fit.
Depending on the size of your data, it might be worthwhile to algebraically reduce the matrix multiplication to a simple set of equations, thereby avoiding the need to write a matmult() function. (Be forewarned, this is completely impractical for more than 4 or 5 data points!)
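Concretely, for y = m*x + b the matrix formulation collapses to a 2x2 system, the standard normal equations (stated here for reference, not taken from the paper):

```latex
\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}
\begin{pmatrix} b \\ m \end{pmatrix}
=
\begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}
\;\Longrightarrow\;
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - m \sum x_i}{n}
```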
The fastest, most efficient way to solve least squares, as far as I am aware, is to subtract (the gradient)/(the 2nd order gradient) from your parameter vector. (2nd order gradient = the diagonal of the Hessian.)

Here is the intuition:

Let's say you want to optimize least squares over a single parameter. This is equivalent to finding the vertex of a parabola. Then, for any random initial parameter x0, the vertex of the loss function is located at x0 - f'(x0) / f''(x0). That's because adding -f'(x0) / f''(x0) to x0 always zeroes out the derivative f'.

Side note: implementing this in TensorFlow, the solution appeared at w0 - f'(w0) / f''(w0) / (number of weights), but I'm not sure if that's due to TensorFlow or something else.
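To make that concrete, a small sketch of my own (not the poster's code): for L(m) = sum (m*x[i] - y[i])^2, a single Newton step m0 - L'(m0)/L''(m0) lands on the exact minimizer from any starting point, since the loss is a parabola in m:

```c
#include <stdio.h>

/* One Newton step on L(m) = sum (m*x[i] - y[i])^2.
   L'(m)  = sum 2*x[i]*(m*x[i] - y[i])
   L''(m) = sum 2*x[i]*x[i]   (constant: L is quadratic in m)
   so m0 - L'/L'' is the exact minimizer sum(x*y)/sum(x*x). */
int main(void)
{
    const double x[] = {1, 2, 3, 4};
    const double y[] = {2.2, 3.9, 6.1, 7.8};
    const int n = sizeof(x) / sizeof(x[0]);

    double m0 = 123.0;        /* arbitrary starting guess */
    double g = 0.0, h = 0.0;  /* first and second derivatives at m0 */
    for (int i = 0; i < n; i++) {
        g += 2.0 * x[i] * (m0 * x[i] - y[i]);
        h += 2.0 * x[i] * x[i];
    }
    printf("minimizer: %f\n", m0 - g / h);
    return 0;
}
```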