Replicating probit regression in SAS and R

Posted 2024-09-12 04:40:12

I'm trying to replicate my SAS work in R, but I get slightly different results, and the differences are too large to be explained by rounding error.

Here's my SAS code:

proc qlim data=mydata;
   model y = x1 x2 x3/ discrete(d=probit);
   output out=outdata marginal;
   title "just ran QLIM model";
run;
quit;

And here's my R code:

mymodel <- glm(y ~ x1 + x2 + x3, family=binomial(link="probit"), data=mydata)

I'm not really sure why I'd get different results, and would greatly appreciate an explanation.

EDIT:

Here's my data:

2.66  20  0  0
2.89  22  0  0
3.28  24  0  0
2.92  12  0  0
4.00  21  0  1
2.86  17  0  0
2.76  17  0  0
2.87  21  0  0
3.03  25  0  0
3.92  29  0  1
2.63  20  0  0
3.32  23  0  0
3.57  23  0  0
3.26  25  0  1
3.53  26  0  0
2.74  19  0  0
2.75  25  0  0
2.83  19  0  0
3.12  23  1  0
3.16  25  1  1
2.06  22  1  0
3.62  28  1  1
2.89  14  1  0
3.51  26  1  0
3.54  24  1  1
2.83  27  1  1
3.39  17  1  1
2.67  24  1  0
3.65  21  1  1
4.00  23  1  1
3.1   21  1  0
2.39  19  1  1

And here are my estimated coefficients (std errors in parentheses):

             SAS                    R
(Intercept) -7.452320 (2.542536)  -7.25319 (2.50977)
x1           1.625810 (0.693869)   1.64888 (0.69427)
x2           0.051729 (0.083891)   0.03989 (0.07961)
x3           1.426332 (0.595036)   1.42490 (0.58347)


撞了怀 2024-09-19 04:40:12

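It may come down to the contrast matrices used by default. R uses treatment contrasts (contr.treatment, with the first level as the baseline), while SAS uses its own coding (contr.SAS, with the last level as the baseline). This only matters if some of your predictors are factors, because contrasts have no effect on numeric predictors. Look up contrasts and contr.SAS in the help. If you use SAS-style contrasts a lot, you might want to set the option globally: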

options(contrasts=c("contr.SAS", "contr.poly"))

To see how this affects things, compare the treatment and SAS contrast matrices:

> contr.treatment(4)
  2 3 4
1 0 0 0
2 1 0 0
3 0 1 0
4 0 0 1

> contr.SAS(4)
  1 2 3
1 1 0 0
2 0 1 0
3 0 0 1
4 0 0 0
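The two matrices differ only in which level is the all-zero baseline. Below is a language-neutral illustration: a Python sketch, not R's implementation. The helper names treatment_coding and show are hypothetical. It builds the same dummy coding by picking either the first or the last level as the reference:

```python
def treatment_coding(levels, baseline):
    """Return the non-baseline columns and a 0/1 indicator row for every level."""
    columns = [lv for lv in levels if lv != baseline]
    # The baseline level gets a row of all zeros; every other level gets one 1.
    rows = {lv: [1 if lv == col else 0 for col in columns] for lv in levels}
    return columns, rows


def show(columns, rows):
    """Print the coding as a small matrix, with level labels down the side."""
    print("  " + " ".join(str(c) for c in columns))
    for lv, row in rows.items():
        print(f"{lv} " + " ".join(str(v) for v in row))


levels = [1, 2, 3, 4]
# First level as baseline, like R's default contr.treatment(4)
treat_cols, treat_rows = treatment_coding(levels, baseline=levels[0])
# Last level as baseline, like contr.SAS(4)
sas_cols, sas_rows = treatment_coding(levels, baseline=levels[-1])

show(treat_cols, treat_rows)
print()
show(sas_cols, sas_rows)
```

Either coding describes the same fitted model. What changes is the meaning of the intercept and of each factor coefficient, since each is measured against a different baseline level.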


我恋#小黄人 2024-09-19 04:40:12

When I run it in R with your data and code, I get answers close to the SAS results you show:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept) -7.45231    2.57152  -2.898  0.00376 **
x1           1.62581    0.68973   2.357  0.01841 * 
x2           0.05173    0.08119   0.637  0.52406   
x3           1.42633    0.58695   2.430  0.01510 *

The standard errors are off by a few percent, but that's less surprising.
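One plausible reason for the few-percent gap is offered here as an assumption, not something the answer states. summary.glm() takes its standard errors from the expected (Fisher) information implied by the IRLS weights. A program that inverts the observed Hessian instead gets slightly different values, because probit is not the canonical link for the binomial family. The plain-Python sketch below computes both versions at the SAS point estimates, so they can be set next to the glm output above and the SAS standard errors in the question. The column order x1, x2, x3, y and all helper names are assumptions:

```python
import math

# Posted data as (x1, x2, x3, y). Column order is assumed; the post does not label it.
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]
# SAS point estimates from the question: intercept, x1, x2, x3
BETA = [-7.452320, 1.625810, 0.051729, 1.426332]


def Phi(z):
    """Standard normal CDF (erfc keeps precision in the tails)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for a 4x4 matrix)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [row[n:] for row in m]


def information(beta, X, y, kind):
    """Sum of w_i * x_i x_i' using observed- or expected-information weights."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        if kind == "observed":
            q = 2 * yi - 1                          # +1 if y = 1, -1 if y = 0
            lam = q * phi(q * z) / Phi(q * z)
            w = lam * (lam + z)                     # minus the second derivative in z
        else:
            w = phi(z) ** 2 / (Phi(z) * Phi(-z))    # IRLS (Fisher) weight
        for a in range(k):
            for b in range(k):
                info[a][b] += w * xi[a] * xi[b]
    return info


X = [[1.0, x1, x2, x3] for x1, x2, x3, _ in DATA]  # intercept column first
y = [row[3] for row in DATA]

cov_obs = invert(information(BETA, X, y, "observed"))
cov_exp = invert(information(BETA, X, y, "expected"))
se_obs = [math.sqrt(cov_obs[i][i]) for i in range(len(BETA))]
se_exp = [math.sqrt(cov_exp[i][i]) for i in range(len(BETA))]
print("observed-information SEs:", [round(s, 5) for s in se_obs])
print("expected-information SEs:", [round(s, 5) for s in se_exp])
```

Suppose the observed-information values line up with SAS (and with glmmADMB below) while the expected-information values line up with glm. That would support this explanation for the standard errors. It says nothing about the larger coefficient differences reported in the question.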

I also ran it in glmmADMB (available on R-Forge), which is a completely different implementation. It gave estimates slightly farther from SAS but standard errors closer to SAS. In any case, both differences are much smaller than the ones you originally reported.

> library(glmmADMB)
> mm2 <- glmmadmb(y~x1+x2+x3,family="binomial",link="probit",data=mydata)
["estimated covariance may be non-positive-definite warnings"]
> summary(mm2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -7.4519     2.5424   -2.93   0.0034 **
x1            1.6258     0.6939    2.34   0.0191 * 
x2            0.0517     0.0839    0.62   0.5375   
x3            1.4263     0.5950    2.40   0.0165 *

What version of R were you using? (It's possible that something changed between versions, although glm is very stable code ...) Are you sure you didn't mess something up?

> sessionInfo()
R Under development (unstable) (2011-10-06 r57181)
Platform: i686-pc-linux-gnu (32-bit)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] glmmADMB_0.6.4


左耳近心 2024-09-19 04:40:12

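You should compare which program reports the higher log-likelihood. The numbers may differ simply because the two algorithms use different termination criteria. For example, most algorithms use the norm of the gradient as the stopping rule (stop when it falls below, say, 0.0005), but each program uses its own norm.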
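Depending on where each one stops, the estimated variances will differ noticeably, because they are obtained by inverting the Hessian evaluated at the stopping point.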
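The stopping-point argument can be checked on the posted data. The Python sketch below uses only the standard library and is not how R or SAS compute anything internally. At each of the two coefficient vectors in the question, it evaluates the gradient norm of the probit log-likelihood and the standard errors from the inverted observed Hessian. It assumes the data columns are x1, x2, x3, y in that order, and the helper names are made up for illustration. The asker's R numbers may have come from data that differ from what was posted.

```python
import math

# Posted data as (x1, x2, x3, y). Column order is assumed.
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]
# Coefficients (intercept, x1, x2, x3) reported in the question
ESTIMATES = {
    "SAS": [-7.452320, 1.625810, 0.051729, 1.426332],
    "R": [-7.25319, 1.64888, 0.03989, 1.42490],
}


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def invert(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [row[n:] for row in m]


def score_and_information(beta, X, y):
    """Gradient of the probit log-likelihood and the observed information (-Hessian)."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        q = 2 * yi - 1
        lam = q * phi(q * z) / Phi(q * z)   # d log L_i / dz
        w = lam * (lam + z)                 # -d2 log L_i / dz2
        for a in range(k):
            grad[a] += lam * xi[a]
            for b in range(k):
                info[a][b] += w * xi[a] * xi[b]
    return grad, info


X = [[1.0, x1, x2, x3] for x1, x2, x3, _ in DATA]
y = [row[3] for row in DATA]

results = {}
for name, beta in ESTIMATES.items():
    grad, info = score_and_information(beta, X, y)
    cov = invert(info)
    se = [math.sqrt(cov[i][i]) for i in range(len(beta))]
    norm = math.sqrt(sum(g * g for g in grad))
    results[name] = (norm, se)
    print(f"{name}: |gradient| = {norm:.2e}, SEs = {[round(s, 4) for s in se]}")
```

A larger gradient norm means that point is farther from the maximum. The standard errors computed there come from a different Hessian, which is the mechanism the answer describes.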
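To be 100% sure, use whichever of the R or SAS estimates reports the higher log-likelihood, or compute the log-likelihood by hand at each set of values.
If you are asked to report exactly the same values in R and SAS, just tighten the convergence criteria of both algorithms. Set a very strict tolerance (for example < 0.00000005) in both, and the two programs should report the same values.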

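(Unless, of course, your likelihood has multiple maxima, which does not seem to be the problem here. In that case the final estimates would depend on your starting values.)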
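Computing the log-likelihood by hand, as this answer suggests, takes only a few lines. The Python sketch below uses only the standard library. It evaluates the probit log-likelihood of the posted data at the SAS and R coefficient vectors from the question and reports which is higher. It assumes the columns are x1, x2, x3, y, and the helper names are hypothetical. The asker's R numbers may have come from data that differ from what was posted.

```python
import math

# Posted data as (x1, x2, x3, y). Column order is assumed.
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]
# Coefficients (intercept, x1, x2, x3) reported in the question
ESTIMATES = {
    "SAS": [-7.452320, 1.625810, 0.051729, 1.426332],
    "R": [-7.25319, 1.64888, 0.03989, 1.42490],
}


def Phi(z):
    """Standard normal CDF, via erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def probit_loglik(beta, data):
    """Sum over observations of log P(y_i | x_i) under a probit model."""
    total = 0.0
    for x1, x2, x3, y in data:
        z = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x3
        # P(y=1) = Phi(z) and P(y=0) = 1 - Phi(z) = Phi(-z)
        total += math.log(Phi(z) if y == 1 else Phi(-z))
    return total


loglik = {name: probit_loglik(beta, DATA) for name, beta in ESTIMATES.items()}
for name, value in loglik.items():
    print(f"{name}: log-likelihood = {value:.4f}")
print("higher log-likelihood:", max(loglik, key=loglik.get))
```

Whichever vector gives the higher value is the better optimum for the posted data. That is the check this answer recommends before deciding which program to trust.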


挽梦忆笙歌 2024-09-19 04:40:12

I'm an R newbie, but I have a suggestion.

Try running the probit with another R package, such as Zelig.

library(Zelig)
mymodel <- zelig(y ~ x1 + x2 + x3, model = "probit", data = mydata)
summary(mymodel)

Are the regression coefficients different in this model?


About the author

许仙没带伞
