Fixed-effects regression in R (with a very large number of dummy variables)

Posted 2024-08-23 12:03:27

Is there an easy way to do a fixed-effects regression in R when the number of dummy variables leads to a model matrix that exceeds the R maximum vector length? E.g.,

> m <- lm(log(bid) ~ after + I(after*score) + id, data = data)
Error in model.matrix.default(mt, mf, contrasts) : 
cannot allocate vector of length 905986769

where id is a factor (and is the variable causing the problem above).

I know that I could de-mean all the data within each id, but this throws the standard errors off (yes, you could compute the SEs "by hand" with a degrees-of-freedom adjustment, but I'd like to minimize the probability that I'm introducing new errors). I've looked at the plm package, but it seems to be designed only for classical panel data with a time component, which is not the structure of my data.
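For reference, the "by hand" route mentioned above can be sketched in base R as follows. This is only an illustration on simulated data: the data frame `d` and its columns `x`, `y`, and `id` are made up, not the asker's data. The standard errors from the regression on demeaned data are too small because `lm()` does not know that one group mean per `id` level was also estimated, so they are rescaled by the ratio of the residual degrees of freedom. With an interaction term, the product would have to be formed first and then demeaned.

```r
# Hypothetical simulated data: 200 groups with 5 observations each.
set.seed(1)
n_id <- 200
d <- data.frame(id = factor(rep(seq_len(n_id), each = 5)))
d$x <- rnorm(nrow(d))
alpha <- rnorm(n_id)  # group-specific intercepts (the fixed effects)
d$y <- 1.5 * d$x + alpha[as.integer(d$id)] + rnorm(nrow(d))

# Within transformation: subtract each group's mean from y and x.
d$y_dm <- d$y - ave(d$y, d$id)
d$x_dm <- d$x - ave(d$x, d$id)

# OLS on the demeaned data; no intercept, since demeaned variables average zero.
fit_dm <- lm(y_dm ~ x_dm - 1, data = d)

# lm() uses N - K residual df, but the correct value is N - K - G,
# where G is the number of groups whose means were removed.
n <- nrow(d)
k <- length(coef(fit_dm))
g <- nlevels(d$id)
se_naive <- sqrt(diag(vcov(fit_dm)))
se_fixed <- se_naive * sqrt((n - k) / (n - k - g))

# Cross-check against the dummy-variable regression, which is only
# feasible here because the simulated data set is small.
fit_lsdv <- lm(y ~ x + id, data = d)
se_lsdv <- summary(fit_lsdv)$coefficients["x", "Std. Error"]

print(c(naive = unname(se_naive), corrected = unname(se_fixed), lsdv = se_lsdv))
```

The corrected value should agree with the dummy-variable standard error, while the naive one is smaller.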

Comments (2)

黎歌 2024-08-30 12:03:27

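plm will work fine for this sort of data; the time component is not required. In the example below, the plm estimates and standard errors match those from lm with explicit dummies.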

> library(plm)
> data("Produc", package="plm")
> zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, index=c("state"))
> zz2 <- lm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp+factor(state), data=Produc)
> summary(zz)$coefficients[,1:3]
              Estimate   Std. Error    t-value
log(pcap) -0.026149654 0.0290015755 -0.9016632
log(pc)    0.292006925 0.0251196728 11.6246309
log(emp)   0.768159473 0.0300917394 25.5272539
unemp     -0.005297741 0.0009887257 -5.3581508
> summary(zz2)$coefficients[1:5,1:3]
                Estimate   Std. Error    t value
(Intercept)  2.201617056 0.1760038727 12.5089126
log(pcap)   -0.026149654 0.0290015755 -0.9016632
log(pc)      0.292006925 0.0251196728 11.6246309
log(emp)     0.768159473 0.0300917394 25.5272539
unemp       -0.005297741 0.0009887257 -5.3581508
沉溺在你眼里的海 2024-08-30 12:03:27

The fixest package should help you here. For example, you can efficiently absorb the factor by demeaning within its levels, by placing it after the `|` in the formula:

library(fixest)
feols(log(bid) ~ after + I(after*score) | id, data = data)

With large datasets, this is much faster than plm. To the best of my knowledge, the lfe package is no longer supported; see the warning on its CRAN page: https://cran.r-project.org/web/packages/lfe/index.html
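A small check of the `feols()` call above on simulated data might look like this. The data frame `d` is made up and only borrows the column names `bid`, `after`, `score`, and `id` from the question. Passing `vcov = "iid"` is a choice made here so that the standard errors are of the same kind as those from `lm()`; it is not part of the original answer. The block is skipped if fixest is not installed.

```r
if (requireNamespace("fixest", quietly = TRUE)) {
  # Hypothetical simulated data: 300 ids with 4 observations each.
  set.seed(42)
  d <- data.frame(id = factor(rep(1:300, each = 4)))
  d$after <- rbinom(nrow(d), 1, 0.5)
  d$score <- rnorm(nrow(d))
  id_effect <- rnorm(300)[as.integer(d$id)]
  d$bid <- exp(0.3 * d$after + 0.2 * d$after * d$score +
               id_effect + rnorm(nrow(d)))

  # Fixed-effects fit: id is absorbed rather than expanded into dummies.
  fit_fe <- fixest::feols(log(bid) ~ after + I(after * score) | id,
                          data = d, vcov = "iid")
  print(fixest::coeftable(fit_fe))

  # Dummy-variable fit for comparison; feasible only on small data.
  fit_lm <- lm(log(bid) ~ after + I(after * score) + id, data = d)
  print(coef(fit_lm)[c("after", "I(after * score)")])
}
```

The slope estimates from the two fits should coincide; only the `feols()` version avoids building the full dummy matrix.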
