`lm` summary does not display all factor levels
I am running a linear regression on a number of attributes including two categorical attributes, `B` and `F`, and I don't get a coefficient value for every factor level I have.
`B` has 9 levels and `F` has 6 levels. When I initially ran the model (with an intercept), I got 8 coefficients for `B` and 5 for `F`, which I understood as the first level of each being included in the intercept.
I want to rank the levels within `B` and `F` based on their coefficients, so I added `-1` after each factor to lock the intercept at 0 so that I could get coefficients for all levels.
Call:
lm(formula = dependent ~ a + B-1 + c + d + e + F-1 + g + h, data = input)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
a 2.082e+03 1.026e+02 20.302 < 2e-16 ***
B1 -1.660e+04 9.747e+02 -17.027 < 2e-16 ***
B2 -1.681e+04 9.379e+02 -17.920 < 2e-16 ***
B3 -1.653e+04 9.254e+02 -17.858 < 2e-16 ***
B4 -1.765e+04 9.697e+02 -18.202 < 2e-16 ***
B5 -1.535e+04 1.388e+03 -11.059 < 2e-16 ***
B6 -1.677e+04 9.891e+02 -16.954 < 2e-16 ***
B7 -1.644e+04 9.694e+02 -16.961 < 2e-16 ***
B8 -1.931e+04 9.899e+02 -19.512 < 2e-16 ***
B9 -1.722e+04 9.071e+02 -18.980 < 2e-16 ***
c -6.928e-01 6.977e-01 -0.993 0.321272
d -3.288e-01 2.613e+00 -0.126 0.899933
e -8.384e-01 1.171e+00 -0.716 0.474396
F2 4.679e+02 2.176e+02 2.150 0.032146 *
F3 7.753e+02 2.035e+02 3.810 0.000159 ***
F4 1.885e+02 1.689e+02 1.116 0.265046
F5 5.194e+02 2.264e+02 2.295 0.022246 *
F6 1.365e+03 2.334e+02 5.848 9.94e-09 ***
g 4.278e+00 7.350e+00 0.582 0.560847
h 2.717e-02 5.100e-03 5.328 1.62e-07 ***
This worked in part, leading to the display of all levels of `B`; however, `F1` is still not displayed. As there is no longer an intercept, I am confused as to why `F1` is not in the linear model.
Switching the order of the call so that `+ F - 1` precedes `+ B - 1` results in coefficients for all levels of `F` being visible, but not `B1`.
Does anybody know either how to display all levels of both `B` and `F`, or how to assess the relative weight of `F1` compared to other levels of `F` from the outputs I have?
Comments (1)
This issue comes up over and over again, but unfortunately there is no satisfying answer that can serve as an appropriate duplicate target. Looks like I need to write one.
Most people know this is related to "contrasts", but not everyone knows why they are needed or how to interpret the result. We have to look at the model matrix in order to fully digest this.
Suppose we are interested in a model with two factors: `~ f + g` (numerical covariates do not matter, so I include none of them; the response does not appear in the model matrix, so I drop it too). Consider an example in which `f` and `g` each have three levels. We start with a model matrix `X0` with no contrasts at all, so it has 7 columns: `(Intercept)`, `f1`, `f2`, `f3`, `g1`, `g2` and `g3`.
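The original example code did not survive on this page. A minimal sketch of such a model matrix, assuming a hypothetical fully crossed data frame `dat` with two three-level factors (the data and the name `dat` are illustrative, not the answerer's), could look like this:

```r
# Hypothetical data: every combination of two 3-level factors (9 rows)
dat <- expand.grid(f = factor(1:3), g = factor(1:3))

# Passing identity "contrast" matrices keeps one dummy column per level,
# i.e. no contrasts are applied at all
X0 <- model.matrix(~ f + g, data = dat,
                   contrasts.arg = list(
                     f = contrasts(dat$f, contrasts = FALSE),
                     g = contrasts(dat$g, contrasts = FALSE)))

colnames(X0)
dim(X0)
```

`contrasts(x, contrasts = FALSE)` returns an identity matrix over the levels of `x`, which is why every level gets its own column here.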
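Note that in every row the columns `f1`, `f2` and `f3` sum to 1, and so do `g1`, `g2` and `g3`. So the `(Intercept)` column equals both `f1 + f2 + f3` and `g1 + g2 + g3`; that is, it lies in both span{f1, f2, f3} and span{g1, g2, g3}. In this full specification, 2 columns are not identifiable. `X0` will have column rank `1 + 3 + 3 - 2 = 5`.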
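Both facts can be checked numerically. The sketch below rebuilds a hypothetical `X0` from an assumed fully crossed design (not the original data) and uses base R's `rowSums()` and `qr()`:

```r
# Hypothetical data: every combination of two 3-level factors
dat <- expand.grid(f = factor(1:3), g = factor(1:3))
X0 <- model.matrix(~ f + g, data = dat,
                   contrasts.arg = list(
                     f = contrasts(dat$f, contrasts = FALSE),
                     g = contrasts(dat$g, contrasts = FALSE)))

# Each block of dummy columns adds up to the intercept column
f_ok <- all(rowSums(X0[, c("f1", "f2", "f3")]) == X0[, "(Intercept)"])
g_ok <- all(rowSums(X0[, c("g1", "g2", "g3")]) == X0[, "(Intercept)"])

# Numerical column rank versus number of columns
rk <- qr(X0)$rank   # 5
p  <- ncol(X0)      # 7
c(f_ok = f_ok, g_ok = g_ok, rank = rk, columns = p)
```

The gap `p - rk` is the number of linear dependencies among the columns, which is 2 in this design.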
So, if we fit a linear model with this `X0`, 2 coefficients out of 7 parameters will be `NA`. What this really implies is that we have to add 2 linear constraints on the 7 parameters in order to get a full-rank model. It does not really matter what these 2 constraints are, but there must be 2 linearly independent constraints. For example, we can do any of the following:
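- drop any 2 columns from `X0`;
- add two sum-to-zero constraints on the parameters, for example requiring the coefficients of `f1`, `f2` and `f3` to sum to 0, and the same for `g1`, `g2` and `g3`;
- use regularization, for example adding a penalty on the coefficients of `f` and `g`.
Note, these three ways end up with three different solutions:
- contrasts;
- constrained least squares;
- linear mixed models or penalized least squares.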
The first two are still in the scope of fixed-effect modelling. With "contrasts", we reduce the number of parameters until we get a full-rank model matrix, while the other two do not reduce the number of parameters but effectively reduce the effective degrees of freedom.
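To make the difference concrete, here is a hedged sketch on made-up data (the response `y`, the ridge form of the penalty and `lambda = 1` are all assumptions for illustration). `lm.fit()` on the unconstrained model matrix returns `NA` for the aliased coefficients, whereas penalizing the `f` and `g` coefficients yields a value for all 7 parameters:

```r
# Hypothetical data: crossed 3 x 3 design with a made-up response
dat <- expand.grid(f = factor(1:3), g = factor(1:3))
set.seed(1)
y <- rnorm(nrow(dat))

X0 <- model.matrix(~ f + g, data = dat,
                   contrasts.arg = list(
                     f = contrasts(dat$f, contrasts = FALSE),
                     g = contrasts(dat$g, contrasts = FALSE)))

# Unconstrained least squares: aliased coefficients are reported as NA
b_ls <- lm.fit(X0, y)$coefficients
n_na <- sum(is.na(b_ls))

# Ridge penalty on the f and g coefficients only (intercept unpenalized);
# lambda = 1 is an arbitrary illustrative value
lambda <- 1
D <- diag(c(0, rep(1, 6)))
b_ridge <- drop(solve(crossprod(X0) + lambda * D, crossprod(X0, y)))
names(b_ridge) <- colnames(X0)

n_na
round(b_ridge, 3)
```

With the intercept left unpenalized, the penalized `f` coefficients sum to zero (and likewise for `g`), so the penalty imposes a constraint implicitly without removing any column.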
Now, you are certainly after the "contrasts" way. So, remember, we have to drop 2 columns. They can be:
- one column from `f` and one column from `g`, giving a model `~ f + g`, with `f` and `g` contrasted;
- the intercept and one column from either `f` or `g`, giving a model `~ f + g - 1`.
Now it should be clear that, within the framework of dropping columns, there is no way you can get what you want, because you are expecting to drop only 1 column. The resulting model matrix would still be rank-deficient.
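The column-dropping behaviour can be seen directly from the column names of the model matrix. This sketch assumes R's default treatment contrasts and a hypothetical crossed design; it mirrors what the question observed when swapping the order of the two factors:

```r
# Hypothetical data: every combination of two 3-level factors
dat <- expand.grid(f = factor(1:3), g = factor(1:3))

with_int <- colnames(model.matrix(~ f + g, dat))      # drops f1 and g1
no_int_f <- colnames(model.matrix(~ f + g - 1, dat))  # drops intercept and g1
no_int_g <- colnames(model.matrix(~ g + f - 1, dat))  # drops intercept and f1

with_int   # "(Intercept)" "f2" "f3" "g2" "g3"
no_int_f   # "f1" "f2" "f3" "g2" "g3"
no_int_g   # "g1" "g2" "g3" "f2" "f3"

# All three parameterizations have 5 columns and full column rank 5
ranks <- sapply(list(~ f + g, ~ f + g - 1, ~ g + f - 1),
                function(fm) qr(model.matrix(fm, dat))$rank)
ranks
```

Only the first factor in a no-intercept formula keeps all of its levels; every later factor still loses one column.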
If you really want to have all coefficients there, use constrained least squares, or penalized regression / linear mixed models.
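As one hedged illustration of the constrained route, sum-to-zero contrasts (`contr.sum`) give the same estimates as least squares under the constraint that each factor's coefficients sum to 0. R implements this through contrasts, so the last level is not printed, but it can be recovered as minus the sum of the others. The data below are made up:

```r
# Hypothetical data: crossed 3 x 3 design with a made-up response
dat <- expand.grid(f = factor(1:3), g = factor(1:3))
set.seed(1)
dat$y <- rnorm(nrow(dat))

fit <- lm(y ~ f + g, data = dat,
          contrasts = list(f = "contr.sum", g = "contr.sum"))
b <- coef(fit)  # (Intercept), f1, f2, g1, g2

# Recover the omitted last level from the sum-to-zero constraint
f_all <- c(b[c("f1", "f2")], f3 = -sum(b[c("f1", "f2")]))
g_all <- c(b[c("g1", "g2")], g3 = -sum(b[c("g1", "g2")]))

# Levels of each factor ranked by their effect
sort(f_all, decreasing = TRUE)
sort(g_all, decreasing = TRUE)
```

Each value is the deviation of a level from the overall mean rather than from a reference level, which makes the levels of one factor directly comparable for ranking.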
Now, when we have interaction of factors, things are more complicated but the idea is still the same. But given that my answer is already long enough, I don't want to continue.