Stochastic gradient descent implementation - MATLAB
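I'm trying to implement stochastic gradient descent in MATLAB. I followed the algorithm exactly, but I'm getting very large values of w (the coefficients) for the prediction/fitting function. Is there a mistake in my algorithm?
The algorithm: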
x = 0:0.1:2*pi;                  % x-axis
n = size(x,2);
r = -0.2 + 0.4.*rand(n,1);       % uniform noise in [-0.2, 0.2] to add to sin(x)
t = zeros(1,n);
y = zeros(1,n);
for i = 1:n
    t(i) = sin(x(i)) + r(i);     % noisy target
    y(i) = sin(x(i));            % the function without noise
end
f = randi(n, 20, 1);             % 20 random indices in 1..n (round(1+rand*n) could return n+1)
h = x(f);                        % randomly chosen x points
k = t(f);                        % matching noisy y points
m = size(h,2);                   % length of the h vector
scatter(h, k, 'red');            % training points (with noise)
% scatter(x, t, 2);
hold on;
plot(x, sin(x));                 % the sine function
w = [0.3 1 0.5];                 % starting point of w
a = 0.05;                        % learning rate "alpha"
% ---------------- ALGORITHM ----------------
for i = 1:m
    v = [1 h(i) h(i).^2];        % feature vector for sample i
    e = ((w*v') - k(i)).*v;      % (prediction - observation) times the features
    w = w - a*e;                 % update w
end
hold on;
l = 0:1:6;
g = w(1) + w(2)*l + w(3)*(l.^2);
plot(l, g, 'yellow');            % the fitted prediction function
Comments (2)
If you use too large a learning rate, SGD is likely to diverge.
The learning rate should decay toward zero as training proceeds.
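A minimal sketch of this advice, under assumptions that are not stated in the thread. For the update w = w - a*(w*v' - k)*v, the residual on the current sample is multiplied by (1 - a*||v||^2), so a single step is only stable when a < 2/||v||^2. With v = [1 x x^2] and x up to about 6.2, ||v||^2 reaches roughly 1500, which makes a = 0.05 far too large. The starting rate a0, the decay schedule, the number of passes nEpochs, and the fixed seed below are illustrative choices, not values from the question.

```matlab
% Sketch: SGD for a quadratic fit to noisy sin(x) with a decaying step size.
rng(0);                                    % fixed seed so runs are repeatable
x = 0:0.1:2*pi;
t = sin(x) + (-0.2 + 0.4*rand(size(x)));   % noisy targets, as in the question
idx = randi(numel(x), 20, 1);              % 20 random training indices
h = x(idx);
k = t(idx);

V = [ones(numel(h),1), h(:), h(:).^2];     % one feature row [1 x x^2] per sample
a0 = 1 / max(sum(V.^2, 2));                % assumed starting rate, below 2/||v||^2
w = [0.3 1 0.5];                           % same starting point as the question
nEpochs = 200;                             % assumed number of passes over the data
step = 0;
for epoch = 1:nEpochs
    for i = randperm(size(V,1))            % visit samples in random order
        step = step + 1;
        a = a0 / (1 + 1e-3*step);          % assumed schedule: rate decays toward zero
        v = V(i,:);
        w = w - a * (w*v' - k(i)) * v;     % same update rule as the question
    end
end
disp(w);
```

Because the rate never exceeds 1/||v||^2 for any training sample, each update shrinks the residual on the sample it visits instead of amplifying it. Rescaling x before building the features would be another way to allow a larger rate.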
Typically, if w ends up with very large values, there is overfitting. I didn't look at your code carefully, but I think what is missing is a proper regularization term, which prevents the training from overfitting.
Also, regarding the v in your loop: it is not the gradient of the predicted value, is it? According to the algorithm, you should replace it. Let's see what it looks like after doing this.
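The regularization suggestion can be sketched as follows, assuming an L2 (ridge) penalty, which the answer does not name. For the per-sample loss (1/2)*(w*v' - k)^2 + (lambda/2)*||w||^2, the gradient with respect to w is (w*v' - k)*v + lambda*w, so the data term is the same residual-times-v expression as in the question's loop and the penalty only adds lambda*w. The values of lambda, a, the number of passes, and the fixed seed are assumptions chosen for illustration.

```matlab
% Sketch: SGD with an assumed L2 penalty on the weights.
rng(1);                                    % fixed seed so runs are repeatable
x = 0:0.1:2*pi;
t = sin(x) + (-0.2 + 0.4*rand(size(x)));   % noisy targets, as in the question
idx = randi(numel(x), 20, 1);              % 20 random training indices
h = x(idx);
k = t(idx);

lambda = 0.01;                             % assumed L2 penalty weight
a = 5e-4;                                  % assumed step size, below 2/||v||^2 here
w = [0.3 1 0.5];                           % same starting point as the question
for epoch = 1:500                          % assumed number of passes
    for i = randperm(numel(h))             % visit samples in random order
        v = [1, h(i), h(i)^2];             % feature vector for sample i
        grad = (w*v' - k(i)) * v + lambda * w;   % data term plus penalty term
        w = w - a * grad;
    end
end
disp(w);
```

The penalty pulls the weights toward zero on every step, but it does not by itself make a too-large step size stable, so the smaller value of a is still needed in this sketch.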