R probit regression marginal effects
I am using R to replicate a study and obtain mostly the same results the author reported. At one point, however, I calculate marginal effects that seem unrealistically small. I would greatly appreciate it if you could look at my reasoning and the code below and tell me whether I have made a mistake somewhere.
My sample contains 24535 observations. The dependent variable "x028bin" is a binary variable taking the values 0 and 1, and there are 10 explanatory variables. Nine of those independent variables are numeric; the independent variable "f025grouped" is a factor consisting of different religious denominations.
I would like to run a probit regression including dummies for religious denomination and then compute marginal effects. In order to do so, I first eliminate missing values and use cross-tabs between the dependent and independent variables to verify that there are no small or 0 cells. Then I run the probit model which works fine and I also obtain reasonable results:
```r
probit4AKIE <- glm(x028bin ~ x003 + x003squ + x025secv2 + x025terv2 + x007bin +
                     x04chief + x011rec + a009bin + x045mod + c001bin + f025grouped,
                   family = binomial(link = "probit"),
                   data = wvshm5red2delna, na.action = na.pass)
summary(probit4AKIE)
```
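It may help to see how `glm()` encodes a factor regressor. The sketch below uses small simulated data (the names `sim`, `x1` and `denom` are hypothetical stand-ins, not the variables from the question) and inspects `model.matrix()`, which returns the design matrix the model was actually fitted on: an intercept, the numeric regressor, and one 0/1 dummy column per non-reference level of the factor.

```r
# Hypothetical simulated data standing in for the real survey data
set.seed(1)
n <- 500
sim <- data.frame(
  x1 = rnorm(n),
  denom = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)
eta <- -0.2 + 0.5 * sim$x1 +
  c(A = 0, B = 0.3, C = -0.4, D = 0.6)[as.character(sim$denom)]
sim$y <- rbinom(n, 1, pnorm(eta))

fit <- glm(y ~ x1 + denom, family = binomial(link = "probit"), data = sim)

# Design matrix used by glm: "(Intercept)", "x1", then 0/1 dummies
# "denomB", "denomC", "denomD" (level "A" is the reference category)
X <- model.matrix(fit)
print(colnames(X))

# One column per coefficient, in the same order as coef(fit)
print(ncol(X) == length(coef(fit)))
```

With k levels the factor contributes k - 1 dummy columns, and their column order matches the order of the coefficients.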
However, when calculating marginal effects with all variables at their means from the probit coefficients and a scale factor, the marginal effects I obtain are much too small (e.g. 2.6042e-78).
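For reference, these are the standard formulas the scale-factor approach targets. Let $\bar x$ be the vector of column means of the model's design matrix (intercept, numeric regressors, and one 0/1 dummy per non-reference denomination, so each dummy mean is that denomination's sample share), let $\beta$ be the probit coefficients, and let $\phi$ be the standard normal density:

$$
\frac{\partial \Pr(y=1 \mid x)}{\partial x_k}\bigg|_{x=\bar x} = \phi(\bar x^\top \beta)\,\beta_k .
$$

Since $\phi(z)$ falls off very quickly, a value of about 2.6e-78 corresponds to $|\bar x^\top \beta| \approx 19$, which is far outside the range of a plausible linear predictor and points at $\bar x$ rather than at the coefficients. Assuming x003squ is the square of x003 (the name suggests this, but it is an assumption), the effect of x003 combines both terms:

$$
\frac{\partial \Pr(y=1)}{\partial x_{003}}\bigg|_{x=\bar x} = \phi(\bar x^\top \beta)\,\big(\beta_{x003} + 2\,\beta_{x003squ}\,\bar x_{003}\big).
$$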
The code looks like this:
```r
ttt <- cbind(wvshm5red2delna$x003,
             wvshm5red2delna$x003squ,
             wvshm5red2delna$x025secv2,
             wvshm5red2delna$x025terv2,
             wvshm5red2delna$x007bin,
             wvshm5red2delna$x04chief,
             wvshm5red2delna$x011rec,
             wvshm5red2delna$a009bin,
             wvshm5red2delna$x045mod,
             wvshm5red2delna$c001bin,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped) # I put variable "f025grouped" 9 times because this variable consists of 9 levels
ttt <- as.data.frame(ttt)
xbar <- as.matrix(mean(cbind(1, ttt[1:19]))) # 1:19 position of variables in data frame ttt
betaprobit4AKIE <- probit4AKIE$coefficients
zxbar <- t(xbar) %*% betaprobit4AKIE
scalefactor <- dnorm(zxbar)
marginprobit4AKIE <- scalefactor * betaprobit4AKIE[2:20] # 2:20 are the positions of the variables in the output of the probit model 'probit4AKIE' (variables need to be in the same order as in data frame ttt); the constant occupies the first position
marginprobit4AKIE # in this step I obtain values that are much too small
```
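A likely cause, offered as a diagnosis rather than something verified against the actual data: `cbind()` on a factor returns its integer level codes, so all nine `f025grouped` columns in `ttt` hold the same codes (1, 2, 3, ...) instead of 0/1 dummies. Their means are then several times larger than the dummy shares, which inflates $\bar x^\top \beta$ until `dnorm()` is essentially zero. Also note that in R 3.0.0 and later, `mean()` on a data frame returns `NA` with a warning instead of column means, so `colMeans()` is the safer choice. The sketch below (the function name `mem_probit` and the simulated data are hypothetical) takes the means from `model.matrix()`, so the dummy columns and their order match the coefficients automatically.

```r
# Marginal effects at the means (MEM) for a probit glm, computed from the
# design matrix that glm() actually used rather than from hand-built columns.
mem_probit <- function(fit) {
  X <- model.matrix(fit)        # intercept, numeric regressors, 0/1 dummies
  beta <- coef(fit)
  xbar <- colMeans(X)           # for a dummy, the mean is that level's share
  z <- sum(xbar * beta)         # linear predictor x-bar' beta
  dnorm(z) * beta[-1]           # scale factor phi(z) times each slope
}

# Hypothetical simulated data standing in for wvshm5red2delna
set.seed(42)
n <- 2000
sim <- data.frame(
  x1 = rnorm(n),
  denom = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
lin <- 0.3 + 0.8 * sim$x1 + ifelse(sim$denom == "B", 0.5, 0) -
  ifelse(sim$denom == "C", 0.4, 0)
sim$y <- rbinom(n, 1, pnorm(lin))

fit <- glm(y ~ x1 + denom, family = binomial(link = "probit"), data = sim)
me <- mem_probit(fit)
print(me)            # effects of ordinary size, one per non-intercept coefficient

# The pitfall: cbind() on a factor yields its integer codes, not dummies
codes <- cbind(sim$denom, sim$denom)
print(head(codes))   # values 1, 2 or 3 in both columns
```

Applied to the question's model, this would be `mem_probit(probit4AKIE)`. For the denomination dummies, a discrete change in predicted probability is often reported instead of the derivative.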
I apologize that I cannot provide a working example, as my dataset is much too large. Any comment would be greatly appreciated. Thanks a lot.
Best,
Tobias
@Gavin is right and it's better to ask at the sister site.
In any case, here's my trick to interpret probit coefficients.
The probit regression coefficients are approximately the same as the logit coefficients, up to a scale factor of about 1.6 (logit coefficient ≈ 1.6 × probit coefficient). So, if the fit of a probit model is

$$\Pr(y=1) = \Phi(0.5 - 0.3x),$$

where $\Phi$ is the standard normal CDF, this is approximately equivalent to the logistic model

$$\Pr(y=1) = \operatorname{logit}^{-1}\big(1.6\,(0.5 - 0.3x)\big).$$

I use this to make a graphic, using the function `invlogit` of the package `arm`. Another possibility is to multiply all coefficients (including the intercept) by 1.6 and then apply the "divide by 4 rule" (see the book by Gelman and Hill): divide the new coefficients by 4, and you get an upper bound on the predictive difference corresponding to a unit difference in x.
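A quick numerical check of both rules of thumb, as a sketch: the grid and the slope -0.3 are just the toy numbers from the example above, and base R's `plogis()` is used as the inverse logit (it computes the same function as `arm::invlogit`).

```r
# Compare Phi(eta) with invlogit(1.6 * eta) over a grid of linear predictors
eta <- seq(-4, 4, by = 0.01)
max_gap <- max(abs(pnorm(eta) - plogis(1.6 * eta)))
print(max_gap)   # under 0.02 on this grid

# Divide-by-4 rule after rescaling: a probit slope b becomes a logit slope of
# about 1.6 * b, and the logistic curve's steepest slope is 1/4, so
# |1.6 * b / 4| = 0.4 * |b| bounds the change in Pr(y = 1) per unit of x.
b <- -0.3                                # slope from the toy example
bound <- abs(1.6 * b / 4)                # 0.12
steepest_probit <- dnorm(0) * abs(b)     # exact maximum of |d Phi(a + b x) / dx|
print(c(bound = bound, steepest_probit = steepest_probit))
```

The bound 0.4·|b| is very close to the exact probit maximum dnorm(0)·|b| ≈ 0.399·|b|, which is why the rescaling trick works well in practice.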
This will do the trick for `probit` or `logit`:

Source: http://www.r-bloggers.com/probitlogit-marginal-effects-in-r/
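The code from the linked post did not survive here, so the following is an independent sketch, not that post's implementation: average marginal effects (AME) for a binomial `glm` with either a probit or a logit link. It relies on the family object's `mu.eta` function, which returns dPr(y=1)/dη (the normal density for probit, the logistic density for logit). The function name `ame_glm` and the simulated data are hypothetical.

```r
# Average marginal effects for a binomial glm (probit or logit link):
# mean over observations of dPr(y=1)/dx_k = mu.eta(eta_i) * beta_k.
ame_glm <- function(fit) {
  eta <- predict(fit, type = "link")     # linear predictor for each row
  dens <- family(fit)$mu.eta(eta)        # derivative of the inverse link at eta
  mean(dens) * coef(fit)[-1]             # slopes scaled by the average density
}

# Hypothetical simulated data
set.seed(7)
n <- 3000
x <- rnorm(n)
y <- rbinom(n, 1, pnorm(-0.2 + 0.6 * x))

fit_p <- glm(y ~ x, family = binomial(link = "probit"))
fit_l <- glm(y ~ x, family = binomial(link = "logit"))

# The two links give very similar AMEs even though their coefficients differ
print(rbind(probit = ame_glm(fit_p), logit = ame_glm(fit_l)))
```

As with the marginal effects at the means, a dummy or a squared term needs separate treatment: this sketch simply scales every coefficient by the average density.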