Can I force two of the three components in a linear regression to be positive?

Posted on 2024-07-14 15:50:06

I'm sorry if I'm not using the correct mathematical terms, but I hope you'll understand what I'm trying to accomplish.

My problem:
I'm using linear regression (currently least squares method) on the values from two vectors x and y against the result z. This is to be done in matlab, and I'm using the \-operator to perform the regression. My dataset will contain a few thousand observations (up to about 50000 at max).

The x-values will be in the area of 10-300 (most between 60 and 100) and the y-values in the 1-3 area.

My code looks like this:

X = [ones(size(x,1),1) x y];
parameters = X\z;

The output "parameters" then holds the three coefficients a0, a1 and a2, which are used in this formula:

a0 * 1 + a1 * x_i + a2 * y_i = z_i

This works as expected, but I want the two parameters a1 and a2 to ALWAYS be positive, even when the vector z is negative (which of course means that a0 will be negative), since this is what the real model looks like (z is always positively correlated with x and y). Is this possible using the least squares method? I'm also open to other linear regression algorithms.

Comments (2)

悲喜皆因你 2024-07-21 15:50:06

Let me try to rephrase to clarify. According to your model, z is always positively correlated with x and y. However, sometimes when you solve the linear regression for the coefficients, one of them comes out negative.

If you are right about the data, this should only happen when the correct coefficient is small, and noise happens to take it negative. You could just assign it to zero, but then the means wouldn't match properly.

In that case, the correct solution is what jpalacek suggests, explained in more detail here:

  1. Regress z against both x and y. If both coefficients are positive, take the result.
  2. If a1 is negative, assume it should be zero and regress z against y alone. If a2 is then positive, take a1 as 0, and take a0 and a2 from this regression.
  3. If a2 is also negative, assume it should be zero too. Regress z against a constant (a column of ones) and take the result as a0. Set a1 and a2 to 0.

This should give you what you want.
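The steps above can be sketched in MATLAB roughly as follows. This is only an illustration, not the answerer's code: the data values are made up, the variable names p and q are hypothetical, and the branch that drops y when only a2 is negative is an assumption added by symmetry, since the steps only spell out the case where a1 is negative.

```matlab
% Synthetic data (hypothetical values, chosen only for illustration).
x = [60; 70; 80; 90; 100; 110];
y = [1; 3; 2; 1; 3; 2];
z = -5 + 0.5*x - 0.2*y;          % the true a2 is negative on purpose

n = numel(z);
p = [ones(n,1) x y] \ z;         % step 1: full regression, p = [a0; a1; a2]

if p(2) < 0                      % step 2: a1 negative, refit on y only
    q = [ones(n,1) y] \ z;
    p = [q(1); 0; q(2)];
elseif p(3) < 0                  % symmetric case (assumption): refit on x only
    q = [ones(n,1) x] \ z;
    p = [q(1); q(2); 0];
end

if any(p(2:3) < 0)               % step 3: remaining slope is negative too
    p = [mean(z); 0; 0];         % regressing z on a constant gives mean(z)
end

disp(p.');                       % [a0 a1 a2] with a1 >= 0 and a2 >= 0
```

With this data the full fit returns a negative a2, so the sketch refits on x alone and reports a2 = 0.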

请你别敷衍 2024-07-21 15:50:06

The simple solution is to use a tool designed to solve this problem. That is, use lsqlin, from the Optimization Toolbox, and set a lower bound constraint on two of the three parameters.

Thus, assuming x, y, and z are all COLUMN vectors,

A = [ones(length(x),1),x,y];

lb = [-inf, 0, 0];

a = lsqlin(A,z,[],[],[],[],lb);

This will constrain only the second and third unknown parameters.

Without the Optimization Toolbox, use lsqnonneg, which is part of MATLAB itself. Here too the solution is easy enough.

A = [ones(length(x),1),x,y];

a = lsqnonneg(A,z);

Your model will be

z = a(1) + a(2)*x + a(3)*y

If a(1) is essentially zero, i.e., it is within a tolerance of zero, then assume that the first parameter was constrained by the bound at zero. In that case, solve a second problem by changing the sign on the column of ones in A.

A(:,1) = -1;

a = lsqnonneg(A,z);

If this solution has a(1) significantly non-zero, then the second solution must be better than the first. Your model will now be

z = -a(1) + a(2)*x + a(3)*y

This costs you at most two calls to lsqnonneg, and the second call is needed only some fraction of the time. Lacking any information about your problem, the odds of needing the second call are 50%.
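A minimal sketch of this two-call approach is shown below, assuming column vectors and made-up data. The tolerance tol and the names sgn, a0, a1 and a2 are hypothetical choices for illustration and do not come from the answer.

```matlab
% Synthetic data (hypothetical values) with a negative intercept.
x = [60; 70; 80; 90; 100; 110];
y = [1; 3; 2; 1; 3; 2];
z = -5 + 0.5*x + 2*y;

A = [ones(numel(x),1), x, y];
a = lsqnonneg(A, z);             % first call: all three parameters >= 0

tol = 1e-8;                      % assumed threshold for "essentially zero"
sgn = 1;                         % sign applied to the constant term
if a(1) <= tol
    A(:,1) = -1;                 % flip the column of ones so the intercept can be negative
    a = lsqnonneg(A, z);         % second call
    sgn = -1;
end

a0 = sgn*a(1);                   % model: z = a0 + a1*x + a2*y
a1 = a(2);
a2 = a(3);
fprintf('a0 = %g, a1 = %g, a2 = %g\n', a0, a1, a2);
```

Tracking the sign in sgn lets the same final formula serve both cases, so the caller does not need to remember which of the two problems produced the result.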
