R: Avoiding summary.plm
I'm using R to run a Monte Carlo simulation studying the performance of panel data estimators. Because I'll be running a large number of trials, I need to get at least decent performance from my code.
Using Rprof on 10 trials of my simulation shows that a significant portion of time is spent in calls to summary.plm. The first few lines of the summaryRprof output are provided below:
$by.total
                        total.time total.pct self.time self.pct
"trial"                      54.48     100.0      0.00      0.0
"coefs"                      53.90      98.9      0.06      0.1
"model.matrix"               36.72      67.4      0.10      0.2
"model.matrix.pFormula"      35.98      66.0      0.06      0.1
"summary"                    33.82      62.1      0.00      0.0
"summary.plm"                33.80      62.0      0.08      0.1
"r.squared"                  29.00      53.2      0.02      0.0
"FUN"                        24.84      45.6      7.52     13.8
I'm calling summary in my code because I need the standard errors of the coefficient estimates as well as the coefficients themselves (which I could get from the plm object alone). My call looks like this:
regression <- plm(g ~ y0 + Xit, data=panel_data, model=model, index=c("country","period"))
coefficients_estimated <- summary(regression)$coefficients[,"Estimate"]
ses_estimated <- summary(regression)$coefficients[,"Std. Error"]
I have a nagging feeling that this is a huge waste of CPU time, but I don't know enough about how R does things to avoid calling summary. I'd appreciate any information on what's going on behind the scenes here, or some way of reducing the time it takes for this to execute.
You just need to look inside plm:::summary.plm to see what it is doing. When you do, you'll see that your two lines calling summary() on your model fit can be replaced with:
For example:
summary(zz)
gives:
and the two lines I showed return for zz:
You don't really provide enough information (neither your simulation code nor the full output from Rprof(), for example) to say whether this will help. It certainly doesn't look like vast amounts of time are spent in summary(); FUN is far more costly than anything else you show, and of the elements you do show, r.squared() is the only one that appears in plm:::summary.plm(), and it seems to take no time at all.
So, whether the above speeds things up appreciably remains to be seen.
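As a hedged sketch of what such a replacement could look like (this is not the answer's original code): coef() returns the estimates, and the square root of the diagonal of vcov() gives the standard errors, which avoids building the full summary object. The model zz below is a hypothetical fit on the Grunfeld data set that ships with plm, chosen only for illustration; the last lines check the result against the summary() route from the question.

```r
library(plm)

# Hypothetical example model (not from the original post): the Grunfeld
# panel data set ships with plm.
data("Grunfeld", package = "plm")
zz <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

# Coefficients and standard errors without calling summary()
coefficients_estimated <- coef(zz)
ses_estimated <- sqrt(diag(vcov(zz)))

# Sanity check against the summary() route used in the question
s <- summary(zz)$coefficients
stopifnot(isTRUE(all.equal(unname(coefficients_estimated),
                           unname(s[, "Estimate"]))))
stopifnot(isTRUE(all.equal(unname(ses_estimated),
                           unname(s[, "Std. Error"]))))
```

If the check passes for your own model specification, the two summary() calls in the simulation loop can be swapped for the coef() and vcov() lines.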
If you want to take things further, have a look at the actual function code of plm:::plm. You will notice that there is a lot of argument checking before a final call to plm:::plm.fit. You could (if you really wanted to) skip straight to plm.fit.
One final point. You mention that your problem is a Monte Carlo simulation. Can you leverage parallel computing for your speed increases?
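A minimal sketch of the parallel idea, assuming the trials are independent of each other. It uses the parallel package that ships with R. The function one_trial and its simulated data are hypothetical stand-ins for the poster's simulation, and lm() is used in place of plm() so the example runs without extra packages.

```r
library(parallel)

# Hypothetical single trial: simulate data, fit a model, and return the
# slope estimate and its standard error. lm() stands in for plm() here.
one_trial <- function(i) {
  x <- rnorm(200)
  y <- 1 + 2 * x + rnorm(200)
  fit <- lm(y ~ x)
  c(estimate = unname(coef(fit)["x"]),
    se = unname(sqrt(diag(vcov(fit)))["x"]))
}

n_trials <- 20
cl <- makeCluster(2)                   # two workers; adjust to your machine
clusterSetRNGStream(cl, iseed = 123)   # reproducible, independent RNG streams
results <- parLapply(cl, seq_len(n_trials), one_trial)
stopCluster(cl)

# Combine into a matrix with one row per trial
results <- do.call(rbind, results)
head(results)
```

For a real plm simulation, each worker would also need the package loaded, for example with clusterEvalQ(cl, library(plm)) before the parLapply() call.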
Just use coeftest(zz). coeftest is in the lmtest package; it will give you the coefficients and standard errors from plm objects much more quickly than summary.plm.
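A short sketch of that suggestion, assuming both plm and lmtest are installed. The model zz is again a hypothetical fit on the Grunfeld data shipped with plm, not the poster's model. Whether this is actually faster than summary() in a given simulation is worth timing with system.time() rather than taking on faith.

```r
library(plm)
library(lmtest)

# Hypothetical example model on the Grunfeld data shipped with plm
data("Grunfeld", package = "plm")
zz <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

# coeftest() returns a matrix-like table of estimates, standard errors,
# test statistics and p-values
ct <- coeftest(zz)
coefficients_estimated <- ct[, "Estimate"]
ses_estimated <- ct[, "Std. Error"]
```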