Fixed-effects regression in R (with a very large number of dummy variables)
Is there an easy way to do a fixed-effects regression in R when the number of dummy variables leads to a model matrix that exceeds the R maximum vector length? E.g.,
> m <- lm(log(bid) ~ after + I(after*score) + id, data = data)
Error in model.matrix.default(mt, mf, contrasts) :
cannot allocate vector of length 905986769
where id is a factor (and is the variable causing the problem above).
I know that I could go through and de-mean all the data, but this throws the standard errors off (yes, you could compute the SEs "by hand" with a degrees-of-freedom adjustment, but I'd like to minimize the probability that I'm introducing new errors). I've looked at the plm package, but it seems to be designed only for classical panel data with a time component, which is not the structure of my data.
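For reference, the manual route mentioned above can be sketched in base R on a small simulated data set. Everything here (the data frame `d`, the helper `demean`, the simulated coefficients) is a hypothetical stand-in, not the asker's data. The sketch de-means each variable within `id`, fits the regression without an intercept, and rescales the standard errors by the degrees of freedom the absorbed dummies should have consumed. On data small enough to fit, the result can be checked against the dummy-variable `lm()` fit.

```r
# Simulated stand-in for the real data (all names and values are assumptions).
set.seed(1)
n_id  <- 50
n_per <- 6
d <- data.frame(id = factor(rep(seq_len(n_id), each = n_per)))
d$after   <- rbinom(nrow(d), 1, 0.5)
d$score   <- rnorm(nrow(d))
alpha     <- rnorm(n_id)[as.integer(d$id)]   # one fixed effect per id
d$log_bid <- alpha + 0.5 * d$after + 0.3 * d$after * d$score + rnorm(nrow(d))

# Within transformation: subtract the group mean of each variable.
demean <- function(x, g) x - ave(x, g)
d$y_w  <- demean(d$log_bid, d$id)
d$x1_w <- demean(d$after, d$id)
d$x2_w <- demean(d$after * d$score, d$id)

# Regression on de-meaned data; no intercept, since all means are zero.
m_w <- lm(y_w ~ 0 + x1_w + x2_w, data = d)

# lm() thinks it used k parameters, but the G group means were also estimated.
k        <- length(coef(m_w))
df_naive <- nrow(d) - k
df_true  <- nrow(d) - k - nlevels(d$id)
se_naive <- sqrt(diag(vcov(m_w)))
se_adj   <- se_naive * sqrt(df_naive / df_true)

# Reference fit with explicit dummies (only feasible on small data).
m_lsdv  <- lm(log_bid ~ after + I(after * score) + id, data = d)
se_lsdv <- coef(summary(m_lsdv))[2:3, "Std. Error"]

# The two sets of standard errors should agree up to numerical error.
print(cbind(adjusted = unname(se_adj), lsdv = unname(se_lsdv)))
```

The adjustment is a single scalar factor, so the point estimates are untouched and only the reported standard errors (and hence t-statistics) change.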
Comments (2)
plm will work fine for this sort of data. The time component is not required.
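A minimal sketch of what this might look like, assuming the plm package is installed. The data frame `dat` is simulated here purely for illustration. Passing only the individual identifier as `index` is meant to let plm generate a time index on its own; that reading of the `index` argument should be checked against the plm documentation for your version.

```r
library(plm)

# Simulated stand-in for the real data (all names and values are assumptions).
set.seed(2)
n_id  <- 200
n_per <- 5
dat <- data.frame(id = factor(rep(seq_len(n_id), each = n_per)))
dat$after <- rbinom(nrow(dat), 1, 0.5)
dat$score <- rnorm(nrow(dat))
dat$bid   <- exp(rnorm(n_id)[as.integer(dat$id)] +
                 0.5 * dat$after + 0.3 * dat$after * dat$score +
                 rnorm(nrow(dat)))

# "within" is the fixed-effects estimator; only the individual index is given.
m_plm <- plm(log(bid) ~ after + I(after * score),
             data = dat, index = "id", model = "within")
summary(m_plm)
```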
The fixest package should help you here. Its estimation functions can, for example, efficiently within-demean (absorb) the factor rather than expanding it into dummy columns.
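The code from the original answer is not preserved here, so the following is only a sketch of how such a call might look, assuming the fixest package is installed. The data frame `dat` is simulated for illustration. In `feols()`, variables placed after the `|` in the formula are treated as fixed effects and absorbed, so no dummy matrix for `id` is built. The `vcov = "iid"` argument assumes a reasonably recent fixest version; it is set explicitly because the default standard-error type has differed across releases.

```r
library(fixest)

# Simulated stand-in for the real data (all names and values are assumptions).
set.seed(42)
n_id  <- 2000
n_per <- 5
dat <- data.frame(id = factor(rep(seq_len(n_id), each = n_per)))
dat$after <- rbinom(nrow(dat), 1, 0.5)
dat$score <- rnorm(nrow(dat))
dat$bid   <- exp(rnorm(n_id)[as.integer(dat$id)] +
                 0.5 * dat$after + 0.3 * dat$after * dat$score +
                 rnorm(nrow(dat)))

# id goes after the bar, so it is absorbed instead of expanded into dummies.
m <- feols(log(bid) ~ after + I(after * score) | id,
           data = dat, vcov = "iid")
summary(m)
```

Only the two slope coefficients appear in `coef(m)`; the estimated fixed effects, if needed, can be retrieved separately with `fixef(m)`.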
With large datasets, this is much faster than plm. To the best of my knowledge, the lfe package is no longer supported; see the warning here: https://cran.r-project.org/web/packages/lfe/index.html