How to handle a linear optimization problem with missing values?

Posted on 2025-02-13 18:34:00


Let's consider this example code:

rng('default')

% creating fake data
data = randi([-1000 +1000],30,500);
yt = randi([-1000 1000],30,1);

% creating fake missing values
row = randi([1 15],1,500);
col = rand(1,500) < .5;

% inserting fake missing values at the top of each selected column
for i = 1:500
    if col(i) == 1
        data(1:row(i),i) = nan;
    end
end

%% here starts my problem
wgts = ones(1,500); % optimal weights need to be binary (zero or one only)

% this would be easy with matrix formulas but I have missing values at the
% beginning of the series
for j = 1:30
    xt(j,:) = sum(data(j,:) .* wgts,2,'omitnan');
end
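% (with implicit expansion, R2016b+, this loop is equivalent to the single
%  statement xt = sum(data .* wgts, 2, 'omitnan'))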


X = [xt(3:end) xt(2:end-1) xt(1:end-2)];
y = yt(3:end);

% from here I basically need to:
% maximize the Adjusted R squared of the regression fitlm(X,y)
% by changing wgts
% subject to wgts = 1 or wgts = 0
% and optionally to impose sum(wgts,'all') = some number;

% basically I need to select the data cols with the highest explanatory
% power, omitting missing data

This is relatively easy to implement with the Excel Solver, but it can handle only 200 decision variables and it takes a lot of time. Thank you in advance.
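For reference, one way to attack this directly in MATLAB is to treat wgts as binary integer decision variables and let a solver such as ga from the Global Optimization Toolbox maximize the adjusted R-squared. The following is only a minimal sketch under that assumption: the helper adjR2 and the option values are illustrative, not part of the original code, and with 500 binary variables the search can be slow.

% minimal sketch: binary column selection via ga (assumes Global Optimization Toolbox)
nCols = size(data,2);
objfun = @(w) -adjR2(w, data, yt);            % ga minimizes, so negate the adjusted R^2
intcon = 1:nCols;                             % every weight is an integer ...
lb = zeros(1,nCols); ub = ones(1,nCols);      % ... restricted to {0,1}
% an optional cardinality constraint sum(wgts) = k can be written as the
% inequality pair A = [ones(1,nCols); -ones(1,nCols)], b = [k; -k]
opts = optimoptions('ga','Display','iter','MaxGenerations',50);
wgtsBest = ga(objfun,nCols,[],[],[],[],lb,ub,[],intcon,opts);

% the helper would go at the end of the script (or in its own file)
function r2 = adjR2(w, data, yt)
    xt  = sum(data .* w, 2, 'omitnan');       % weighted row sums, NaNs ignored
    X   = [xt(3:end) xt(2:end-1) xt(1:end-2)];% same lag structure as above
    mdl = fitlm(X, yt(3:end));
    r2  = mdl.Rsquared.Adjusted;
end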


Comments (1)

此岸叶落 2025-02-20 18:34:01


lasso seems to give interesting results:

% creating fake data (but having an actual relationship between `yt` and the predictors)
rng('default')
data = randi([-1000 +1000],30,500);
alphas = rand(1,500);
yt = sum(alphas.*data,2) + 10*randn(30,1);
plot(yt)

[Figure: plot of the simulated yt series (i.sstatic.net/qeesd.png)]

% Use the lasso algorithm with no intercept term and keep the column of
% coefficients that minimizes the MSE. The L1 penalty drives many
% coefficients to exactly zero, so lasso returns a sparse selection of columns.

[B,FitInfo] = lasso(data,yt,'Intercept',false);
idxLambda1SE = find(FitInfo.MSE == min(FitInfo.MSE));
coef = B(:,idxLambda1SE);
y_verif = data*coef;
hold on;plot(y_verif)

[Figure: yt with the lasso fit y_verif overlaid]
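A side note on the name idxLambda1SE: as written, the index just picks the lambda with the smallest in-sample MSE. To actually apply the one-standard-error rule, lasso can be run with cross-validation; a small sketch (the names Bcv, FitInfoCV and coef1SE are illustrative):

[Bcv,FitInfoCV] = lasso(data,yt,'Intercept',false,'CV',10);  % 10-fold cross-validation
coef1SE = Bcv(:,FitInfoCV.Index1SE);  % sparsest fit within one SE of the minimum CV MSE
sum(coef1SE ~= 0)                     % number of columns kept under the 1-SE rule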

sum(coef~=0)

ans =

29

The output is explained by only 29 columns, whereas all the values in alphas were nonzero.
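To map this back to the binary weights the question asks for, the nonzero pattern of coef can be used directly; a small sketch reusing the variables above, with the same lag structure as the question's code:

wgts = (coef ~= 0)';                      % binary selection implied by the lasso fit
xt   = sum(data .* wgts, 2, 'omitnan');   % weighted row sums, NaNs ignored
X    = [xt(3:end) xt(2:end-1) xt(1:end-2)];
mdl  = fitlm(X, yt(3:end));               % adjusted R^2 of the selected columns
mdl.Rsquared.Adjusted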
