How to force R to use a specified factor level as reference in a regression?

Posted 2024-09-26 11:33:23

How can I tell R to use a certain level as reference if I use binary explanatory variables in a regression?

It's just using some level by default.

lm(x ~ y + as.factor(b)) 

where b takes values in {0, 1, 2, 3, 4}. Let's say I want to use 3 as the reference instead of 0, which is what R uses by default.

Comments (6)

墨落成白 2024-10-03 11:33:23

See the relevel() function. Here is an example:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
str(DF)

m1 <- lm(y ~ x + b, data = DF)
summary(m1)

Now alter the factor b in DF by use of the relevel() function:

DF <- within(DF, b <- relevel(b, ref = 3))
m2 <- lm(y ~ x + b, data = DF)
summary(m2)

The two models fit the same data but use different reference levels: level 1 in m1 and level 3 in m2.

> coef(m1)
(Intercept)           x          b2          b3          b4          b5 
  3.2903239   1.4358520   0.6296896   0.3698343   1.0357633   0.4666219 
> coef(m2)
(Intercept)           x          b1          b2          b4          b5 
 3.66015826  1.43585196 -0.36983433  0.25985529  0.66592898  0.09678759
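
As a quick check on what relevel() actually changes, here is a minimal sketch that reuses the simulated data from above. The name DF3 is my own, introduced only so that the original DF stays untouched. It shows that only the order of the levels changes, that the fitted values are identical, and that the new intercept is the old intercept plus the old b3 coefficient.

```r
# Same simulated data as in the example above
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5 * x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# DF3 is a hypothetical copy in which level "3" is the reference
DF3 <- within(DF, b <- relevel(b, ref = "3"))

levels(DF$b)   # original order, "1" comes first
levels(DF3$b)  # "3" moved to the front, the rest keep their order

m1 <- lm(y ~ x + b, data = DF)
m3 <- lm(y ~ x + b, data = DF3)

# Same model, different parameterisation: the fitted values agree
same_fit <- all.equal(fitted(m1), fitted(m3))

# The new intercept is the old intercept plus the old b3 effect
new_intercept <- unname(coef(m1)["(Intercept)"] + coef(m1)["b3"])
intercept_ok <- all.equal(unname(coef(m3)["(Intercept)"]), new_intercept)
```

So changing the reference level only changes what each coefficient is compared against, not the fit itself.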
﹏雨一样淡蓝的深情 2024-10-03 11:33:23

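I know this is an old question, but I had a similar issue and found that:

lm(x ~ y + relevel(as.factor(b), ref = "3")) 

does exactly what you asked. The as.factor() call is only needed if b is not already a factor, because relevel() works only on unordered factors.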
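
A small self-contained sketch of the inline approach, with made-up data: the data frame d and its columns are assumptions for illustration, shaped like the question (x as response, y as predictor, b numeric with values 0 to 4).

```r
# Hypothetical data shaped like the question: b is numeric, 0..4
set.seed(1)
d <- data.frame(y = rnorm(50), b = rep(0:4, each = 10))
d$x <- 2 + d$y + rnorm(50)

# Convert to a factor and pick the reference level inside the formula
fit <- lm(x ~ y + relevel(factor(b), ref = "3"), data = d)

# One coefficient each for levels 0, 1, 2 and 4; level 3 is absorbed
# into the intercept
names(coef(fit))

# The column in the data frame itself is left as it was
is.numeric(d$b)
```

The coefficient names get long this way, which is one reason to relevel the column once up front if you fit several models.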

‘画卷フ 2024-10-03 11:33:23

Others have mentioned the relevel command, which is the best solution if you want to change the base level for all analyses on your data (or are willing to live with changing the data).

If you don't want to change the data (this is a one-time change, and in the future you want the default behavior again), then you can combine the C function (note the uppercase), which sets contrasts, with the contr.treatment function, whose base argument chooses which level is the baseline.

For example:

lm( Sepal.Width ~ C(Species,contr.treatment(3, base=2)), data=iris )
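
To see that this really leaves the data alone, here is a minimal sketch on the same iris example. Passing the level names to contr.treatment() instead of the number 3 is my own variation, and lev and baseline_mean are illustrative names.

```r
# Level names of the factor; base = 2 picks the second one as baseline
lev <- levels(iris$Species)

fit <- lm(Sepal.Width ~ C(Species, contr.treatment(lev, base = 2)),
          data = iris)
coef(fit)

# The factor in the data set still has its original level order
levels(iris$Species)

# With treatment contrasts in a one-factor model, the intercept is the
# mean of the baseline group (here the second level, versicolor)
baseline_mean <- mean(iris$Sepal.Width[iris$Species == lev[2]])
intercept_ok <- all.equal(unname(coef(fit)[1]), baseline_mean)
```

Because base is a position rather than a name, it is worth checking levels() first so that you know which level the number refers to.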
天煞孤星 2024-10-03 11:33:23

The relevel() command is a shortcut for what you are asking. It reorders the levels of the factor so that the ref level comes first. Reordering the factor levels yourself therefore has the same effect, but gives you more control. Perhaps you want the levels in the order 3, 4, 0, 1, 2. In that case...

bFactor <- factor(b, levels = c(3,4,0,1,2))

I prefer this method because it's easier for me to see in my code not only what the reference was but the position of the other values as well (rather than having to look at the results for that).

NOTE: DO NOT make it an ordered factor. A factor with a specified order and an ordered factor are not the same thing. lm() may start to think you want polynomial contrasts if you do that.
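
To make the warning concrete, here is a small sketch with a made-up vector b. It assumes R's default setting options(contrasts = c("contr.treatment", "contr.poly")), and bOrdered is a name I introduce only for the comparison.

```r
# Hypothetical values for b, as in the question
b <- c(0, 1, 2, 3, 4, 3, 0, 2)

# Unordered factor with an explicit level order: "3" is the reference
bFactor <- factor(b, levels = c(3, 4, 0, 1, 2))
levels(bFactor)

# Same level order, but declared as an ordered factor
bOrdered <- factor(b, levels = c(3, 4, 0, 1, 2), ordered = TRUE)

# Under the default options, the unordered factor gets treatment
# (dummy) contrasts with one column per non-reference level ...
contrasts(bFactor)

# ... while the ordered factor gets polynomial contrasts
# (linear, quadratic, and so on)
contrasts(bOrdered)
```

With the ordered version, lm() would report polynomial terms instead of one coefficient per level compared against 3.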

暮光沉寂 2024-10-03 11:33:23

You can also manually tag the column with a contrasts attribute, which seems to be respected by the regression functions:

contrasts(df$factorcol) <- contr.treatment(levels(df$factorcol),
   base=which(levels(df$factorcol) == 'RefLevel'))
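
A runnable version of the same idea, sketched on a copy of iris. Here df, the Species column and the choice of "versicolor" stand in for the df, factorcol and 'RefLevel' placeholders above.

```r
# Work on a copy so the built-in data set is not modified
df <- iris

# Attach treatment contrasts with "versicolor" as the baseline
contrasts(df$Species) <- contr.treatment(
  levels(df$Species),
  base = which(levels(df$Species) == "versicolor")
)

fit <- lm(Sepal.Width ~ Species, data = df)
coef(fit)

# The level order is unchanged; only the contrasts attribute differs
levels(df$Species)

# The intercept is now the mean of the chosen baseline group
baseline_mean <- mean(df$Sepal.Width[df$Species == "versicolor"])
intercept_ok <- all.equal(unname(coef(fit)[1]), baseline_mean)
```

Unlike relevel(), this keeps the printing and plotting order of the levels as it was, since only the contrasts attribute is set.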
静谧幽蓝 2024-10-03 11:33:23

For those looking for a dplyr/tidyverse version, building on Gavin Simpson's solution:

library(dplyr)  # provides %>% and mutate()

# Create DF
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# Change reference level
DF <- DF %>% mutate(b = relevel(b, ref = "3"))

m2 <- lm(y ~ x + b, data = DF)
summary(m2)