How to create a simple gradient descent algorithm

Posted on 2024-09-25 20:12:31


I'm studying simple machine learning algorithms, starting with simple gradient descent, but I'm having trouble implementing it in Python.

Here is the example I'm trying to reproduce. I have data about houses (living area in ft² and number of bedrooms) together with the resulting price:

Living area (ft²): 2104

#bedrooms: 3

Price (1000$s): 400

I'm trying to do a simple regression using the gradient descent method, but my algorithm won't work...
The algorithm deliberately avoids vectors (I'm trying to understand it step by step).

i = 1
import sys
derror=sys.maxint
error = 0
step = 0.0001
dthresh = 0.1
import random

theta1 = random.random()
theta2 = random.random()
theta0 = random.random()
while derror>dthresh:
    diff = 400 - theta0 - 2104 * theta1 - 3 * theta2
    theta0 = theta0 + step * diff * 1
    theta1 = theta1 + step * diff * 2104
    theta2 = theta2 + step * diff * 3
    hserror = diff**2/2
    derror = abs(error - hserror)
    error = hserror
    print 'iteration : %d, error : %s' % (i, error)
    i+=1

I understand the math. I'm constructing a prediction function
$h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

with $x_1$ and $x_2$ being the variables (living area, number of bedrooms) and $h_{\theta}(x)$ the estimated price.

I'm using the cost function (for one point):
$hserror = \frac{1}{2} (h_{\theta}(x) - y)^2$

This is a common problem, but I'm more of a software engineer and I'm learning one step at a time. Can you tell me what's wrong?

I got it working with this code:

data = {(2104, 3) : 400, (1600,3) : 330, (2400, 3) : 369, (1416, 2) : 232, (3000, 4) : 540}
for x in range(10):
    i = 1
    import sys
    derror=sys.maxint
    error = 0
    step = 0.00000001
    dthresh = 0.0000000001
    import random

    theta1 = random.random()*100
    theta2 = random.random()*100
    theta0 = random.random()*100
    while derror>dthresh:
        diff = 400 - (theta0 + 2104 * theta1 + 3 * theta2)
        theta0 = theta0 + step * diff * 1
        theta1 = theta1 + step * diff * 2104
        theta2 = theta2 + step * diff * 3
        hserror = diff**2/2
        derror = abs(error - hserror)
        error = hserror
        #print 'iteration : %d, error : %s, derror : %s' % (i, error, derror)
        i+=1
    print ' theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2)
    print ' done : %f' %(theta0 + 2104 * theta1 + 3*theta2)

which ends up with answers like this :

 theta0 : 48.412337, theta1 : 0.094492, theta2 : 50.925579
 done : 400.000043
 theta0 : 0.574007, theta1 : 0.185363, theta2 : 3.140553
 done : 400.000042
 theta0 : 28.588457, theta1 : 0.041746, theta2 : 94.525769
 done : 400.000043
 theta0 : 42.240593, theta1 : 0.096398, theta2 : 51.645989
 done : 400.000043
 theta0 : 98.452431, theta1 : 0.136432, theta2 : 4.831866
 done : 400.000043
 theta0 : 18.022160, theta1 : 0.148059, theta2 : 23.487524
 done : 400.000043
 theta0 : 39.461977, theta1 : 0.097899, theta2 : 51.519412
 done : 400.000042
 theta0 : 40.979868, theta1 : 0.040312, theta2 : 91.401406
 done : 400.000043
 theta0 : 15.466259, theta1 : 0.111276, theta2 : 50.136221
 done : 400.000043
 theta0 : 72.380926, theta1 : 0.013814, theta2 : 99.517853
 done : 400.000043


踏雪无痕 2024-10-02 20:12:31


The first issue is that running this with only one piece of data gives you an underdetermined system: it may have an infinite number of solutions. With three unknowns, you'd expect to need at least 3 data points, preferably many more.
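To make the underdetermined case concrete, here is a minimal sketch using the single sample from the question. The helper names `exact_fit` and `predict` are made up for illustration: for any choice of `theta0` and `theta2`, one can solve for a `theta1` that reproduces the price exactly, which is consistent with the many different parameter sets the question's second script prints.

```python
import random

# The single training sample from the question: (living area, bedrooms) -> price.
x1, x2, y = 2104, 3, 400


def exact_fit(theta0, theta2):
    """Hypothetical helper: choose theta1 so the one sample is fitted exactly."""
    theta1 = (y - theta0 - theta2 * x2) / float(x1)
    return theta0, theta1, theta2


def predict(theta):
    """Evaluate theta0 + theta1*x1 + theta2*x2 for the sample above."""
    theta0, theta1, theta2 = theta
    return theta0 + theta1 * x1 + theta2 * x2


random.seed(0)
# Five arbitrary (theta0, theta2) pairs, each completed to an exact fit.
fits = [exact_fit(random.random() * 100, random.random() * 100) for _ in range(5)]
for theta in fits:
    print("theta = (%.3f, %.6f, %.3f) -> prediction %.6f" % (theta + (predict(theta),)))
```

Every parameter set printed here is a different, equally valid "solution" for one data point, so the starting values alone decide where gradient descent ends up.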

Secondly, gradient descent where the step size is a scaled version of the gradient is not guaranteed to converge except in a small neighbourhood of the solution. You can fix that by switching to either a fixed-size step in the direction of the negative gradient (slow) or a line search in the direction of the negative gradient (faster, but slightly more complicated).

So for a fixed step size, instead of

theta0 = theta0 - step * dEdtheta0
theta1 = theta1 - step * dEdtheta1
theta2 = theta2 - step * dEdtheta2

you do this:

# Normalize by the largest gradient magnitude so the biggest move is exactly `step`.
n = max(abs(dEdtheta0), abs(dEdtheta1), abs(dEdtheta2))
theta0 = theta0 - step * dEdtheta0 / n
theta1 = theta1 - step * dEdtheta1 / n
theta2 = theta2 - step * dEdtheta2 / n
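The line search alternative mentioned above could look roughly like the following. This is only a sketch under some assumptions: it uses the five samples from the question's `data` dictionary, sums the squared error over all of them, and uses the simplest possible rule (halve the step until the cost goes down) rather than a proper Armijo or Wolfe condition. The names `cost`, `gradient` and `line_search_step` are made up for illustration.

```python
# Five (living area, bedrooms) -> price samples taken from the question.
DATA = [((2104, 3), 400), ((1600, 3), 330), ((2400, 3), 369),
        ((1416, 2), 232), ((3000, 4), 540)]


def cost(theta):
    """Half the sum of squared residuals over DATA."""
    t0, t1, t2 = theta
    return sum((t0 + t1 * x1 + t2 * x2 - y) ** 2 for (x1, x2), y in DATA) / 2.0


def gradient(theta):
    """Partial derivatives of cost() with respect to theta0, theta1, theta2."""
    t0, t1, t2 = theta
    g = [0.0, 0.0, 0.0]
    for (x1, x2), y in DATA:
        r = t0 + t1 * x1 + t2 * x2 - y  # residual h(x) - y
        g[0] += r
        g[1] += r * x1
        g[2] += r * x2
    return g


def line_search_step(theta, step=1.0, shrink=0.5, max_halvings=60):
    """Move along the negative gradient, halving the step until the cost drops."""
    g = gradient(theta)
    current = cost(theta)
    for _ in range(max_halvings):
        trial = [t - step * gi for t, gi in zip(theta, g)]
        if cost(trial) < current:
            return trial
        step *= shrink
    return theta  # no improving step found; treat as converged


theta = [0.0, 0.0, 0.0]
costs = [cost(theta)]
for _ in range(200):
    theta = line_search_step(theta)
    costs.append(cost(theta))
print("cost went from %.1f to %.1f" % (costs[0], costs[-1]))
```

Because a trial step is only accepted when it lowers the cost, the recorded costs never increase, whatever the initial step size. The price is several extra cost evaluations per iteration.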

It also looks like you may have a sign error in your steps; it's worth checking them against the gradient.
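One way to check the sign is to write out the gradient of the one-point cost from the question. The derivation below is a sketch that borrows the symbol $E$ from the `dEdtheta` names above and assumes the convention $x_0 = 1$ for the intercept term:

```latex
\begin{aligned}
E(\theta) &= \tfrac{1}{2}\,\bigl(h_{\theta}(x) - y\bigr)^2,
  \qquad h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 \\
\frac{\partial E}{\partial \theta_j} &= \bigl(h_{\theta}(x) - y\bigr)\, x_j
  \qquad (x_0 = 1) \\
\theta_j &\leftarrow \theta_j - \text{step}\cdot\frac{\partial E}{\partial \theta_j}
  = \theta_j + \text{step}\cdot\bigl(y - h_{\theta}(x)\bigr)\, x_j
\end{aligned}
```

Read this way, an update of the form `theta_j + step * diff * x_j` follows the negative gradient when `diff` is defined as `y - h(x)`, and has the wrong sign when `diff` is defined as `h(x) - y`.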

I'm also not sure that derror is a good stopping criterion. (But stopping criteria are notoriously hard to get "right".)

My final point is that gradient descent is horribly slow for parameter fitting. You probably want to use conjugate-gradient or Levenberg-Marquardt methods instead. I suspect that both of these methods already exist for Python in the NumPy or SciPy packages (which aren't part of Python by default but are pretty easy to install).
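As a rough illustration of that suggestion, here is a sketch that fits the five samples from the question with SciPy's `scipy.optimize.least_squares` using `method="lm"` (its Levenberg-Marquardt option). It assumes NumPy and SciPy are installed; the `residuals` and `jacobian` helpers are made up for this example.

```python
import numpy as np
from scipy.optimize import least_squares

# Samples from the question: columns are living area and number of bedrooms.
X = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

# Design matrix with a leading column of ones for theta0.
A = np.column_stack([np.ones(len(y)), X])


def residuals(theta):
    """h_theta(x) - y for every sample."""
    return A.dot(theta) - y


def jacobian(theta):
    """The model is linear, so the Jacobian of the residuals is just A."""
    return A


fit = least_squares(residuals, x0=np.zeros(3), jac=jacobian, method="lm")
theta0, theta1, theta2 = fit.x
print("theta0 = %f, theta1 = %f, theta2 = %f" % (theta0, theta1, theta2))
print("half sum of squared residuals: %f" % fit.cost)
```

For a purely linear model like this one, a direct solver such as `numpy.linalg.lstsq` would also do the job. The iterative methods matter more once the model is non-linear in its parameters.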
