Double-clustered standard errors for panel data
I have a panel data set in R (time and cross section) and would like to compute standard errors that are clustered by two dimensions, because my residuals are correlated both ways. Googling around I found http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/ which provides a function to do this. It seems a bit ad hoc, so I wanted to know whether there is a tested package that does this.
I know sandwich does HAC standard errors, but it doesn't do double clustering (i.e. along two dimensions).
This is an old question. But seeing as people still appear to be landing on it, I thought I'd provide some modern approaches to multiway clustering in R:
Option 1 (fastest):
fixest::feols()
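The original code for this option did not survive on this page. A minimal sketch of what it might look like, assuming a simulated firm-year panel (the data and the column names firmid, year, x and y are made up for illustration):

```r
library(fixest)

# Simulated balanced panel: 50 firms x 10 years (illustrative data only)
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(10)[d$year] + rnorm(nrow(d))

# Firm and year fixed effects after the "|"; two-way clustering via `cluster`
est_fixest <- feols(y ~ x | firmid + year, data = d, cluster = ~ firmid + year)
summary(est_fixest)
```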
Option 2 (fast):
lfe::felm()
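The code for this option is also missing here. A hedged sketch on the same kind of simulated panel (made-up data and column names) could look like this; felm() takes a four-part formula of the form outcome ~ covariates | fixed effects | IV spec | cluster variables:

```r
library(lfe)

# Simulated balanced panel: 50 firms x 10 years (illustrative data only)
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(10)[d$year] + rnorm(nrow(d))

# Firm and year fixed effects, no instruments (0), clustering by firm and year
est_felm <- felm(y ~ x | firmid + year | 0 | firmid + year, data = d)
summary(est_felm)
```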
Option 3 (slower, but flexible):
sandwich
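Again the original snippet is not preserved. One way this could be done, assuming a plain lm() fit with factor dummies for the fixed effects and vcovCL() from sandwich for the multiway clustering (data and column names are invented for the example):

```r
library(sandwich)

# Simulated balanced panel: 50 firms x 10 years (illustrative data only)
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(10)[d$year] + rnorm(nrow(d))

# Fixed effects entered as dummies
m_lm <- lm(y ~ x + factor(firmid) + factor(year), data = d)

# Covariance matrix clustered by firm and by year
vc_cl <- vcovCL(m_lm, cluster = ~ firmid + year)

# Two-way clustered standard error of the slope on x
sqrt(diag(vc_cl))["x"]
```

The resulting matrix can be passed to lmtest::coeftest() through its vcov argument to get a full coefficient table.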
Benchmark
And just to belabour the point about speed, here's a benchmark of the three approaches (using two fixed effects and two-way clustering).
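The benchmark code and its results are not preserved here. The following is only a sketch of how such a comparison might be set up, assuming the microbenchmark package and a simulated panel (sizes and column names are arbitrary); timings will depend on the machine and the data, so none are quoted.

```r
library(fixest)
library(lfe)
library(sandwich)
library(microbenchmark)

# Simulated balanced panel: 200 firms x 20 years (illustrative data only)
set.seed(42)
d <- expand.grid(firmid = 1:200, year = 1:20)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(200)[d$firmid] + rnorm(20)[d$year] + rnorm(nrow(d))

# Each expression estimates the model and the two-way clustered covariance
bm <- microbenchmark(
  fixest   = feols(y ~ x | firmid + year, data = d, cluster = ~ firmid + year),
  lfe      = felm(y ~ x | firmid + year | 0 | firmid + year, data = d),
  sandwich = vcovCL(lm(y ~ x + factor(firmid) + factor(year), data = d),
                    cluster = ~ firmid + year),
  times = 5
)
print(bm)
```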
For panel regressions, the plm package can estimate clustered SEs along two dimensions. Using M. Petersen's benchmark results:
So that now you can obtain clustered SEs:
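The snippets that belonged here are missing. As a rough sketch, with a simulated panel standing in for the Petersen test data (which is not reproduced here), a pooled model can be fitted with plm() and double-clustered SEs obtained with vcovDC(); the column names are assumptions:

```r
library(plm)
library(lmtest)

# Simulated balanced panel standing in for the Petersen data
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(10)[d$year] + rnorm(nrow(d))

# Pooled OLS on a proper panel: both indexes are supplied
pm <- plm(y ~ x, data = d, index = c("firmid", "year"), model = "pooling")

# Double-clustered (firm and year) covariance with a small-sample correction
vc_dc <- vcovDC(pm, type = "sss")
coeftest(pm, vcov = vc_dc)
```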
For more details see:
However, the above works only if your data can be coerced to a pdata.frame. It will fail if you have "duplicate couples (time-id)". In this case you can still cluster, but only along one dimension. Trick plm into thinking that you have a proper panel data set by specifying only one index, so that you can then obtain clustered SEs:
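A sketch of the single-index workaround, again on made-up data: only the firm identifier is passed as index, so plm() builds its own time index, and vcovHC() then clusters along the group dimension only.

```r
library(plm)
library(lmtest)

# Simulated panel (illustrative); the same call pattern applies when
# firm-year pairs are duplicated, since `year` is not used as an index here
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(nrow(d))

# Only one index: plm creates an artificial time index within each firm
pm1 <- plm(y ~ x, data = d, index = "firmid", model = "pooling")

# One-way clustering by the single index ("group")
vc_group <- vcovHC(pm1, type = "sss", cluster = "group")
coeftest(pm1, vcov = vc_group)
```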
You can also use this workaround to cluster by a higher dimension or at a higher level (e.g. industry or country). However, in that case you won't be able to use the group (or time) effects, which is the main limitation of the approach.
Another approach that works for both panel and other types of data is the multiwayvcov package. It allows double clustering, but also clustering at higher dimensions. As per the package's website, it is an improvement upon Arai's code. Using the Petersen data and cluster.vcov():
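The multiwayvcov snippet is missing as well. A minimal sketch, with simulated data in place of the Petersen data set and assumed column names, passing the two cluster variables to cluster.vcov() as a two-column matrix:

```r
library(multiwayvcov)
library(lmtest)

# Simulated balanced panel standing in for the Petersen data
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(10)[d$year] + rnorm(nrow(d))

m1 <- lm(y ~ x, data = d)

# Covariance matrix clustered by firm and by year
vc_both <- cluster.vcov(m1, cbind(d$firmid, d$year))
coeftest(m1, vc_both)
```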
Arai's function can be used for clustering standard errors. He has another version for clustering in multiple dimensions:
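Arai's code itself is not reproduced on this page. To illustrate the idea only, here is a base-R sketch (not Arai's function) of the usual two-way formula: the covariance clustered by the first dimension, plus the one clustered by the second, minus the one clustered by their intersection. The helper names cluster_vcov and twoway_vcov are hypothetical, and no small-sample correction is applied.

```r
# One-way cluster-robust sandwich for an lm fit (no small-sample correction)
cluster_vcov <- function(model, cluster) {
  X <- model.matrix(model)
  u <- residuals(model)
  bread <- solve(crossprod(X))
  scores <- rowsum(X * u, group = cluster)  # sum of x_i * u_i within each cluster
  bread %*% crossprod(scores) %*% bread
}

# Two-way clustering: V(c1) + V(c2) - V(c1 and c2 intersected)
twoway_vcov <- function(model, c1, c2) {
  cluster_vcov(model, c1) + cluster_vcov(model, c2) -
    cluster_vcov(model, interaction(c1, c2, drop = TRUE))
}

# Simulated balanced panel (illustrative data only)
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(10)[d$year] + rnorm(nrow(d))

m <- lm(y ~ x, data = d)
vc2 <- twoway_vcov(m, d$firmid, d$year)
vc2
```

Because of the subtraction, the result is not guaranteed to be positive semi-definite in every sample, which is one reason to prefer a tested package for real work.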
For references and usage example see:
Frank Harrell's package rms (which used to be named Design) has a function that I use often when clustering: robcov. See this part of ?robcov, for example.
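The excerpt from ?robcov is not preserved here. As a small sketch on made-up data: robcov() expects a fit that stored its design matrix and response (x = TRUE, y = TRUE), and it takes a single cluster variable, so this example clusters along one dimension only.

```r
library(rms)

# Simulated panel with a firm-level error component (illustrative data only)
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + rnorm(50)[d$firmid] + rnorm(nrow(d))

# robcov() needs the design matrix and response kept in the fit object
fit <- ols(y ~ x, data = d, x = TRUE, y = TRUE)

# Replace the model's covariance matrix with a firm-clustered estimate
fit_rob <- robcov(fit, cluster = d$firmid)
fit_rob
```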