What is the best way to choose which level is the base category for a factor in an lm regression in R?
Suppose I want to run a regression using lm with a factor as a right-hand-side variable. What is the best way to choose which level of the factor is the base category (the one that is excluded to avoid multicollinearity)? Note that I am not interested in excluding the intercept, because I have many factors.
I would also like a formula-based solution, not one that acts on the data.frame directly, although if you think you have a really good solution for that, please post it as well.
My solution is:
base_cat <- function(x, n = 100) c(x, setdiff(seq_len(n), x))  # put level x first; also works for x = 1 and x = n
a_reg <- lm(y ~ x1 + x2 + factor(x3, levels = base_cat(30)))  # suppose x3 takes integer values from 1 to 100
The category left out by lm is the first level of the factor, so this just reorders the levels so that the one specified in base_cat() comes first, with the rest after it.
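To see why reordering the levels works, here is a minimal sketch on a toy factor. The names f, ref_level and mm_cols are made up for illustration, and the sketch assumes R's default treatment contrasts for unordered factors have not been changed through options().

```r
# Hypothetical toy factor with values 1, 2, 3 and 30, where 30 is listed first.
f <- factor(c(1, 2, 3, 30), levels = c(30, 1, 2, 3))

# The first level is the one treated as the base category.
ref_level <- levels(f)[1]

# The design matrix has an intercept plus one dummy per non-base level,
# so no column is created for the base level.
mm_cols <- colnames(model.matrix(~ f))

print(ref_level)
print(mm_cols)
```

Since lm builds its design matrix the same way, whichever level comes first in levels() is the one absorbed into the intercept.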
Any other ideas?
Comments (1)
The function relevel does precisely this. You pass it an unordered factor and the name of the reference level, and it returns a factor with that level as the first one.
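As a rough sketch of how this could look inside a formula, the example below simulates data shaped like the question's (y, x1, x2, and an x3 taking the integers 1 to 100). The data frame d and the simulated coefficients are assumptions made only for illustration. Note that relevel expects the reference level as a character string, so the level is written as "30".

```r
set.seed(1)

# Simulated stand-in for the question's data: every value 1..100 of x3
# appears five times, so the reference level is guaranteed to exist.
d <- data.frame(
  x1 = rnorm(500),
  x2 = rnorm(500),
  x3 = rep(1:100, times = 5)
)
d$y <- 1 + 0.5 * d$x1 - 0.2 * d$x2 + rnorm(500)

# Formula-based choice of the base category: level "30" is moved to the front.
a_reg <- lm(y ~ x1 + x2 + relevel(factor(x3), ref = "30"), data = d)

# The levels recorded by the fit start with the chosen base category.
fit_levels <- a_reg$xlevels[[1]]
print(head(fit_levels))
```

If changing the data is acceptable, the same call can be applied once up front, for example d$x3 <- relevel(factor(d$x3), ref = "30"), which keeps the formula shorter.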