R: Avoiding summary.plm

Posted on 2024-10-31 15:01:22

I'm using R to run a Monte-Carlo simulation studying the performance of panel data estimators. Because I'll be running a large number of trials, I need to get at least decent performance from my code.

Using Rprof on 10 trials of my simulation shows that a significant portion of time is spent in calls to summary.plm. The first few lines of the summaryRprof output are provided below:

$by.total
                            total.time total.pct self.time self.pct
"trial"                          54.48     100.0      0.00      0.0
"coefs"                          53.90      98.9      0.06      0.1
"model.matrix"                   36.72      67.4      0.10      0.2
"model.matrix.pFormula"          35.98      66.0      0.06      0.1
"summary"                        33.82      62.1      0.00      0.0
"summary.plm"                    33.80      62.0      0.08      0.1
"r.squared"                      29.00      53.2      0.02      0.0
"FUN"                            24.84      45.6      7.52     13.8

I'm calling summary in my code because I need to get the standard errors of the coefficient estimates as well as the coefficients themselves (which I could get from just the plm object). My call looks like

regression <- plm(g ~ y0 + Xit, data=panel_data, model=model, index=c("country","period"))

coefficients_estimated <- summary(regression)$coefficients[,"Estimate"]
ses_estimated <- summary(regression)$coefficients[,"Std. Error"]

I have a nagging feeling that this is a huge waste of CPU time, but I don't know enough about how R does things to avoid calling summary. I'd appreciate any information on what's going on behind the scenes here, or some way of reducing the time it takes for this to execute.

Comments (3)

初心未许 2024-11-07 15:01:22

You just need to look inside plm:::summary.plm to see what it is doing. When you do, you'll see that your two lines calling summary() on your model fit can be replaced with:

coefficients_estimated <- coef(regression)
ses_estimated <- sqrt(diag(vcov(regression)))

For example:

require(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, 
          data = Produc, index = c("state","year"))

summary(zz) gives:

> summary(zz)
Oneway (individual) effect Within Model

....

Coefficients :
             Estimate  Std. Error t-value  Pr(>|t|)    
log(pcap) -0.02614965  0.02900158 -0.9017    0.3675    
log(pc)    0.29200693  0.02511967 11.6246 < 2.2e-16 ***
log(emp)   0.76815947  0.03009174 25.5273 < 2.2e-16 ***
unemp     -0.00529774  0.00098873 -5.3582 1.114e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
....

and the two lines I showed return for zz:

> coef(zz)
   log(pcap)      log(pc)     log(emp)        unemp 
-0.026149654  0.292006925  0.768159473 -0.005297741 
> sqrt(diag(vcov(zz)))
   log(pcap)      log(pc)     log(emp)        unemp 
0.0290015755 0.0251196728 0.0300917394 0.0009887257

You don't really provide enough information (neither your simulation code nor the full output from Rprof(), for example) to say whether this will help. It certainly doesn't look like vast amounts of time are spent in summary(); FUN is far more costly than anything else you show, and of the elements you do show, r.squared() is the only one that appears in plm:::summary.plm(), and it seems to take no time at all.

So, whether the above speeds things up appreciably remains to be seen.
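One way to find out is to time both routes on the same fitted model. The following is only a rough sketch using the Produc example from above; the repetition count `n_rep` and the variable names are arbitrary choices for illustration, and the timings will depend on the plm version, the data, and the machine.

```r
library(plm)

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))

# Route 1: estimates and standard errors via summary()
est_summary <- summary(zz)$coefficients[, c("Estimate", "Std. Error")]

# Route 2: the same quantities via the direct extractors
est_direct <- cbind(coef(zz), sqrt(diag(vcov(zz))))

# Check that both routes agree before comparing their cost
same <- isTRUE(all.equal(est_summary, est_direct, check.attributes = FALSE))

# Crude timing: repeat each route n_rep times (n_rep is an arbitrary choice)
n_rep <- 50
t_summary <- system.time(
  for (i in seq_len(n_rep)) summary(zz)$coefficients
)[["elapsed"]]
t_direct <- system.time(
  for (i in seq_len(n_rep)) { coef(zz); sqrt(diag(vcov(zz))) }
)[["elapsed"]]

print(c(same = same, summary = t_summary, direct = t_direct))
```

If the two elapsed times are close for your model, the bottleneck is elsewhere (for example in the model fit itself), and the full Rprof() output is the place to look next.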

轮廓§ 2024-11-07 15:01:22

If you want to take things further, have a look at the actual function code of plm:::plm. You will notice that there is a lot of argument checking before a final call to plm:::plm.fit. You could (if you really wanted to) skip straight to plm.fit.

One final point. You mention that your problem is a Monte Carlo simulation. Can you leverage parallel computing for your speed increases?
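As a rough illustration of the parallel route, the sketch below spreads independent trials over two worker processes with the base `parallel` package. The function `run_trial`, the simulated panel, and the worker and trial counts are all hypothetical stand-ins, not the asker's actual simulation.

```r
library(parallel)
library(plm)

# Hypothetical stand-in for one simulation trial: simulate a small panel,
# fit a within model, and return the estimate and its standard error.
run_trial <- function(seed) {
  set.seed(seed)
  n_units <- 20
  n_periods <- 10
  d <- data.frame(
    country = rep(seq_len(n_units), each = n_periods),
    period  = rep(seq_len(n_periods), times = n_units),
    x       = rnorm(n_units * n_periods)
  )
  # Outcome = slope * x + unit-specific effect + noise
  d$y <- 0.5 * d$x + rep(rnorm(n_units), each = n_periods) +
    rnorm(n_units * n_periods)
  fit <- plm(y ~ x, data = d, model = "within",
             index = c("country", "period"))
  c(estimate = unname(coef(fit)), se = unname(sqrt(diag(vcov(fit)))))
}

cl <- makeCluster(2)                 # two workers; adjust to your machine
clusterEvalQ(cl, library(plm))       # each worker needs plm loaded
results <- parLapply(cl, 1:8, run_trial)
stopCluster(cl)

# One row per trial, columns "estimate" and "se"
results <- do.call(rbind, results)
print(results)
```

Because each trial sets its own seed and shares no state with the others, the trials can run in any order on any worker. For very short trials, the overhead of starting workers and shipping results back may outweigh the gain.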

随梦而飞# 2024-11-07 15:01:22

Just use coeftest(zz). coeftest is in the lmtest package; it will give you the coefficients and standard errors from plm objects much more quickly than summary.plm.
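A minimal sketch of that approach, reusing the Produc model from the earlier answer; the column names used for indexing are the ones `coeftest` prints, and whether this is actually faster in a given setup is worth timing rather than assuming.

```r
library(plm)
library(lmtest)

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))

# coeftest() returns a matrix-like object with one row per coefficient
ct <- coeftest(zz)

coefficients_estimated <- ct[, "Estimate"]
ses_estimated <- ct[, "Std. Error"]

print(coefficients_estimated)
print(ses_estimated)
```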
