Best way in R to choose which level is the base category of a factor in an lm regression

Posted on 2024-12-10 14:31:39

Suppose I want to run a regression using lm with a factor as a right-hand-side variable. What is the best way to choose which level of the factor is the base category (the one that is excluded to avoid multicollinearity)? Note that I am not interested in dropping the intercept instead, because I have many factors.

I would also like a formula-based solution, not one that acts on the data.frame directly, although if you think you have a really good solution for that, please post it as well.

My solution is:

base_cat <- function(x, n = 100) c(x, setdiff(seq_len(n), x))  # put level x first, keep the others in order (also works for x = 1 and x = n)
a_reg <- lm(y ~ x1 + x2 + factor(x3, levels = base_cat(30)))  # suppose x3 takes integer values from 1 to 100

With R's default treatment contrasts, the category lm leaves out is the first level of the factor, so this just reorders the levels: the one passed to base_cat() comes first and the rest follow in their original order.
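
For comparison, here is a minimal sketch of the same reordering done with base R's relevel() inside the formula, which avoids spelling out the full set of levels. The data are simulated purely for illustration, and the names d, fit_levels and fit_relevel are hypothetical, not from any real dataset.

```r
# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 500
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rep(1:100, length.out = n)  # every value 1..100 appears
)
d$y <- 1 + 0.5 * d$x1 - 0.2 * d$x2 + 0.01 * d$x3 + rnorm(n)

# Approach from the post: put level 30 first via the levels argument
base_cat <- function(x, n = 100) c(x, setdiff(seq_len(n), x))
fit_levels <- lm(y ~ x1 + x2 + factor(x3, levels = base_cat(30)), data = d)

# Alternative: relevel() moves the chosen reference level to the front
fit_relevel <- lm(y ~ x1 + x2 + relevel(factor(x3), ref = "30"), data = d)

# Both calls use level 30 as the base category, so the estimates should
# agree; only the coefficient names differ
all.equal(unname(coef(fit_levels)), unname(coef(fit_relevel)))
```

Note that relevel() needs the reference level to be present in the data, whereas the levels = approach needs the complete set of levels to be known in advance.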

Any other ideas?
