Stochastic gradient descent implementation - MATLAB

Posted 2024-10-19 03:08:25

I'm trying to implement stochastic gradient descent in MATLAB. I followed the algorithm exactly, but I'm getting a very, very large w (the coefficients) for the prediction/fitting function. Is there a mistake in my algorithm?

The algorithm: [image: the SGD update rule; not reproduced here]
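The original figure did not survive; judging from the code below, the update being implemented appears to be the standard SGD step for least squares (my reconstruction, not the original image):

$$ w \leftarrow w - \alpha \left( w^\top v_i - t_i \right) v_i, \qquad v_i = \begin{bmatrix} 1 & x_i & x_i^2 \end{bmatrix}^\top $$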

    x = 0:0.1:2*pi;             % x-axis
    n = size(x,2);
    r = -0.2 + 0.4.*rand(n,1);  % uniform noise in [-0.2, 0.2] to add to sin(x)

    t = zeros(1,n);
    y = zeros(1,n);

    for i = 1:n
        t(i) = sin(x(i)) + r(i);       % adding the noise
        y(i) = sin(x(i));              % the function without noise
    end

    f = randi(n, 20, 1);        % 20 random indexes in 1..n
                                % (round(1+rand*n) could produce the
                                % out-of-range index n+1)
    h = x(f);                   % randomly chosen x points
    k = t(f);                   % the corresponding noisy y points

    m = size(h,2);              % length of the h vector

    scatter(h, k, 'Red');       % drawing the training points (with noise)
    %scatter(x,t,2);
    hold on;
    plot(x, sin(x));            % plotting the sin function

    w = [0.3 1 0.5];            % starting point of w
    a = 0.05;                   % learning rate "alpha"

    % ---------------- ALGORITHM ----------------
    for i = 1:20
        v = [1 h(i) h(i)^2];           % feature vector [1, x, x^2]
        e = (w*v' - k(i)) .* v;        % (prediction - observation) times features
        w = w - a*e;                   % updating w
    end

    hold on;

    l = 0:1:6;
    g = w(1) + w(2)*l + w(3)*(l.^2);
    plot(l, g, 'Yellow');              % drawing the prediction function

Comments (2)

拥抱影子 2024-10-26 03:08:25

If you use too large a learning rate, SGD is likely to diverge. The learning rate should converge to zero.
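A minimal sketch of that idea applied to the question's loop, assuming the same training points h, k and quadratic features; the 1/iteration schedule and the starting rate a0 are illustrative assumptions, not tuned values:

    % SGD with a step size that decays toward zero
    % (sketch: a0 and the 1/iter schedule are assumed, not from the question)
    w  = [0.3 1 0.5];               % same starting point as the question
    a0 = 0.05;                      % initial learning rate (assumed)
    for iter = 1:20
        v = [1 h(iter) h(iter)^2];  % quadratic feature vector [1, x, x^2]
        e = (w*v' - k(iter)) .* v;  % prediction error times features (the gradient)
        w = w - (a0/iter) * e;      % decayed step: a0/1, a0/2, a0/3, ...
    end

Because the step size shrinks, a single noisy sample late in the run can no longer throw w arbitrarily far.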

霓裳挽歌倾城醉 2024-10-26 03:08:25

Typically, if w ends up with very large values, there is overfitting. I didn't look at your code very carefully, but I think what is missing is a proper regularization term, which prevents the training from overfitting. Also, here:

    e = ((w*v') - k(i)).*v;

The v here is not the gradient of the predicted value, is it? According to the algorithm, you should replace it. Let's see what it looks like after doing that.
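For the regularization suggestion, a minimal sketch of the loop with an L2 (ridge) penalty added; lambda is a hypothetical value chosen for illustration, and the data h, k are assumed to come from the question:

    % SGD with an L2 (ridge) penalty on w
    % (sketch: lambda is an assumed value, not from the question)
    w      = [0.3 1 0.5];
    a      = 0.05;
    lambda = 0.01;                   % regularization strength (assumed)
    for iter = 1:20
        v = [1 h(iter) h(iter)^2];   % quadratic feature vector
        e = (w*v' - k(iter)) .* v;   % data-fit gradient
        w = w - a*(e + lambda*w);    % lambda*w is the gradient of (lambda/2)*||w||^2
    end

The extra lambda*w term pulls each coefficient back toward zero on every update, which directly counteracts the blow-up of w.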
