Marginal effects of a probit regression in R

Posted on 2024-11-03 10:49:18

I am using R to replicate a study and obtain mostly the same results the author reported. At one point, however, I calculate marginal effects that seem unrealistically small. I would greatly appreciate it if you could have a look at my reasoning and the code below and see whether I am mistaken at some point.

My sample contains 24,535 observations. The dependent variable "x028bin" is a binary variable taking the values 0 and 1, and there are furthermore 10 explanatory variables. Nine of those independent variables are numeric; the independent variable "f025grouped" is a factor consisting of different religious denominations.

I would like to run a probit regression including dummies for religious denomination and then compute marginal effects. In order to do so, I first eliminate missing values and use cross-tabs between the dependent and independent variables to verify that there are no small or zero cells. Then I run the probit model, which works fine, and I obtain reasonable results:

probit4AKIE <- glm(x028bin ~ x003 + x003squ + x025secv2 + x025terv2 + x007bin +
                     x04chief + x011rec + a009bin + x045mod + c001bin + f025grouped,
                   family = binomial(link = "probit"),
                   data = wvshm5red2delna, na.action = na.pass)

summary(probit4AKIE)

However, when calculating marginal effects with all variables at their means from the probit coefficients and a scale factor, the marginal effects I obtain are much too small (e.g. 2.6042e-78).
The code looks like this:

ttt <- cbind(wvshm5red2delna$x003,
wvshm5red2delna$x003squ,
wvshm5red2delna$x025secv2,
wvshm5red2delna$x025terv2,
wvshm5red2delna$x007bin,
wvshm5red2delna$x04chief,
wvshm5red2delna$x011rec,
wvshm5red2delna$a009bin,
wvshm5red2delna$x045mod,
wvshm5red2delna$c001bin,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped) #I put variable "f025grouped" 9 times because this variable consists of 9 levels

ttt <- as.data.frame(ttt)

xbar <- as.matrix(mean(cbind(1,ttt[1:19]))) #1:19 position of variables in dataframe ttt

betaprobit4AKIE <- probit4AKIE$coefficients

zxbar <- t(xbar) %*% betaprobit4AKIE

scalefactor <- dnorm(zxbar)

marginprobit4AKIE <- scalefactor * betaprobit4AKIE[2:20] #2:20 are the positions of variables in the output of the probit model 'probit4AKIE' (variables need to be in the same ordering as in data.frame ttt), the constant in the model occupies the first position

marginprobit4AKIE #in this step I obtain values that are much too small
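
For comparison, here is a minimal sketch of the same at-the-means calculation built directly from the fitted model's design matrix, so that the factor f025grouped enters through the 0/1 dummy columns the coefficients actually refer to (object names are those used above):

X <- model.matrix(probit4AKIE)          # intercept plus one 0/1 dummy per non-reference level of f025grouped
xbar <- colMeans(X)                     # regressor means, in the same order as the coefficients
zxbar <- sum(xbar * coef(probit4AKIE))  # linear predictor evaluated at the means
scalefactor <- dnorm(zxbar)             # standard normal density at that point
marginAtMeans <- scalefactor * coef(probit4AKIE)[-1]  # drop the intercept
marginAtMeans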

I apologize that I cannot provide you with a working example, as my dataset is much too large. Any comment would be greatly appreciated. Thanks a lot.

Best,

Tobias

Comments (2)

情痴 2024-11-10 10:49:18

@Gavin is right and it's better to ask at the sister site.

In any case, here's my trick for interpreting probit coefficients.

The probit regression coefficients are the same as the logit coefficients up to a scale factor of about 1.6. So, if the fitted probit model is Pr(y=1) = Φ(0.5 - 0.3*x), it is approximately equivalent to the logistic model Pr(y=1) = invlogit(1.6*(0.5 - 0.3*x)).
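
A quick numeric check of that approximation, comparing the two link functions on a grid of linear-predictor values (independent of any data set):

z <- seq(-3, 3, by = 0.5)
round(cbind(z, probit = pnorm(z), rescaled_logit = plogis(1.6*z)), 3)  # the two columns track each other closely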

I use this to make a graphic, using the function invlogit from the arm package. Another possibility is simply to multiply all coefficients (including the intercept) by 1.6 and then apply the 'divide by 4 rule' (see the book by Gelman and Hill): dividing the rescaled coefficients by 4 gives an upper bound on the predictive difference corresponding to a unit difference in x. A quick sketch of that rule follows the example below.

Here's an example.

x1 <- rbinom(100, 1, .5)
x2 <- rbinom(100, 1, .3)
x3 <- rbinom(100, 1, .9)
ystar <- -.5 + x1 + x2 - x3 + rnorm(100)   # latent index
y <- ifelse(ystar > 0, 1, 0)
probit <- glm(y ~ x1 + x2 + x3, family = binomial(link = 'probit'))
xbar <- colMeans(cbind(1, x1, x2, x3))     # means of the intercept column and the three regressors

# now the graphic, i.e., the marginal effect of x1, x2 and x3, holding the others at their means
library(arm)
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*x + probit$coef[3]*xbar[3] + probit$coef[4]*xbar[4]))) # x1
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*xbar[2] + probit$coef[3]*x + probit$coef[4]*xbar[4]))) # x2
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*xbar[2] + probit$coef[3]*xbar[3] + probit$coef[4]*x))) # x3
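
And a quick sketch of the 'divide by 4' bound described above, applied to the same simulated fit: rescale the probit slopes to the logit scale, then divide by 4 to get an upper bound on the change in Pr(y=1) per unit change in each predictor.

upper_bound <- 1.6 * coef(probit)[-1] / 4  # exclude the intercept
upper_bound
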
时光瘦了 2024-11-10 10:49:18

This will do the trick for probit or logit:

mfxboot <- function(modform, dist, data, boot = 1000, digits = 3){
  x <- glm(modform, family = binomial(link = dist), data)
  # get marginal effects: average density of the linear predictor times the coefficients
  pdf <- ifelse(dist == "probit",
                mean(dnorm(predict(x, type = "link"))),
                mean(dlogis(predict(x, type = "link"))))
  marginal.effects <- pdf*coef(x)
  # start bootstrap
  bootvals <- matrix(rep(NA, boot*length(coef(x))), nrow = boot)
  set.seed(1111)
  for(i in 1:boot){
    samp1 <- data[sample(1:dim(data)[1], dim(data)[1], replace = TRUE), ]
    x1 <- glm(modform, family = binomial(link = dist), samp1)
    pdf1 <- ifelse(dist == "probit",
                   mean(dnorm(predict(x1, type = "link"))),   # use the refitted model x1, not x
                   mean(dlogis(predict(x1, type = "link"))))
    bootvals[i, ] <- pdf1*coef(x1)
  }
  res <- cbind(marginal.effects, apply(bootvals, 2, sd), marginal.effects/apply(bootvals, 2, sd))
  fmt <- paste("%.", digits, "f", sep = "")
  if(names(x$coefficients[1]) == "(Intercept)"){
    res1 <- res[2:nrow(res), ]   # drop the intercept row
    res2 <- matrix(as.numeric(sprintf(fmt, res1)), nrow = dim(res1)[1])
    rownames(res2) <- rownames(res1)
  } else {
    res2 <- matrix(as.numeric(sprintf(fmt, res)), nrow = dim(res)[1])
    rownames(res2) <- rownames(res)
  }
  colnames(res2) <- c("marginal.effect", "standard.error", "z.ratio")
  return(res2)
}

Source: http://www.r-bloggers.com/probitlogit-marginal-effects-in-r/
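
A hypothetical call using the model from the question above (the object name mfx4AKIE is just illustrative; the function refits the model internally, so only the formula, link, and data frame are passed, and boot is reduced here because refitting a model on 24,535 observations 1000 times can take a while):

mfx4AKIE <- mfxboot(x028bin ~ x003 + x003squ + x025secv2 + x025terv2 + x007bin +
                      x04chief + x011rec + a009bin + x045mod + c001bin + f025grouped,
                    dist = "probit", data = wvshm5red2delna, boot = 100)
mfx4AKIE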
