R: avoiding summary.plm

Posted 2024-10-31 15:01:22

I'm using R to run a Monte-Carlo simulation studying the performance of panel data estimators. Because I'll be running a large number of trials, I need to get at least decent performance from my code.

Using Rprof on 10 trials of my simulation shows that a significant portion of the time is spent in calls to summary.plm. The first few lines of the summaryRprof() output are provided below:

$by.total
                            total.time total.pct self.time self.pct
"trial"                          54.48     100.0      0.00      0.0
"coefs"                          53.90      98.9      0.06      0.1
"model.matrix"                   36.72      67.4      0.10      0.2
"model.matrix.pFormula"          35.98      66.0      0.06      0.1
"summary"                        33.82      62.1      0.00      0.0
"summary.plm"                    33.80      62.0      0.08      0.1
"r.squared"                      29.00      53.2      0.02      0.0
"FUN"                            24.84      45.6      7.52     13.8

I'm calling summary in my code because I need the standard errors of the coefficient estimates as well as the coefficients themselves (which I could get from the plm object alone). My call looks like this:

regression <- plm(g ~ y0 + Xit, data=panel_data, model=model, index=c("country","period"))

coefficients_estimated <- summary(regression)$coefficients[,"Estimate"]
ses_estimated <- summary(regression)$coefficients[,"Std. Error"]

I have a nagging feeling that this is a huge waste of CPU time, but I don't know enough about how R does things to avoid calling summary. I'd appreciate any information on what's going on behind the scenes here, or some way of reducing the time it takes for this to execute.

初心未许 2024-11-07 15:01:22

You just need to look inside plm:::summary.plm to see what it is doing. When you do, you'll see that your two lines calling summary() on your model fit can be replaced with:

coefficients_estimated <- coef(regression)
ses_estimated <- sqrt(diag(vcov(regression)))

For example:

library(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, 
          data = Produc, index = c("state","year"))

summary(zz) gives:

> summary(zz)
Oneway (individual) effect Within Model

....

Coefficients :
             Estimate  Std. Error t-value  Pr(>|t|)    
log(pcap) -0.02614965  0.02900158 -0.9017    0.3675    
log(pc)    0.29200693  0.02511967 11.6246 < 2.2e-16 ***
log(emp)   0.76815947  0.03009174 25.5273 < 2.2e-16 ***
unemp     -0.00529774  0.00098873 -5.3582 1.114e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
....

and, for zz, the two lines I showed return:

> coef(zz)
   log(pcap)      log(pc)     log(emp)        unemp 
-0.026149654  0.292006925  0.768159473 -0.005297741 
> sqrt(diag(vcov(zz)))
   log(pcap)      log(pc)     log(emp)        unemp 
0.0290015755 0.0251196728 0.0300917394 0.0009887257
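
To check the agreement programmatically rather than by eye, a minimal sketch along these lines could be used. It reuses the Produc example from above; the names coef_table, est and se are only illustrative, and it assumes the default (non-robust) covariance matrix is the one wanted.

```r
library(plm)

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))

# Coefficient table as computed by summary()
coef_table <- summary(zz)$coefficients

# The same quantities taken directly from the fitted object
est <- coef(zz)
se  <- sqrt(diag(vcov(zz)))

# Compare the two routes up to numerical tolerance
all.equal(coef_table[, "Estimate"], est)
all.equal(coef_table[, "Std. Error"], se)
```

If a robust covariance estimator is needed instead, the second route would have to use that estimator in place of vcov().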

You don't provide enough information (neither your simulation code nor the full profiling output, for example) to say for certain whether this will help. Going by self.time, almost nothing is spent inside summary.plm() itself (0.08 s), and FUN has by far the largest self.time of anything you show (7.52 s). Of the elements you do show, r.squared() is the only one that appears in plm:::summary.plm(); its own self.time is negligible (0.02 s), but its total.time is 29.00 s against the 33.80 s attributed to summary.plm(), so whatever it calls is probably where the time under summary() goes.

So, whether the above speeds things up appreciably remains to be seen.
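
One way to find out is to time both versions on the same fitted model. The following is only a rough sketch on the Produc example, not on your simulation: n_rep is an arbitrary repetition count chosen for illustration, and the timings will differ for your data and model.

```r
library(plm)

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))

n_rep <- 20  # arbitrary number of repetitions (assumption for illustration)

# Original approach: summary() is called twice per trial
time_summary <- system.time(
  for (i in seq_len(n_rep)) {
    est <- summary(zz)$coefficients[, "Estimate"]
    se  <- summary(zz)$coefficients[, "Std. Error"]
  }
)

# Direct approach: no call to summary() at all
time_direct <- system.time(
  for (i in seq_len(n_rep)) {
    est <- coef(zz)
    se  <- sqrt(diag(vcov(zz)))
  }
)

# Elapsed wall-clock seconds for each approach
elapsed <- c(summary = time_summary[["elapsed"]],
             direct  = time_direct[["elapsed"]])
elapsed
```

If summary() has to stay for some other reason, storing its result once per trial and indexing that object twice would at least avoid computing it a second time.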
