glm.nb non-convergence and options for alternative negative binomial regression

Posted on 2025-02-02 20:55:11


I'm trying to implement a glm with a negative binomial distribution in R and have a few questions. Here are the data I'm working with - note that my predictors are all centred and scaled using 2 standard deviations.
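For reference, the 2-SD centring and scaling mentioned above (Gelman's recommendation for making coefficients of continuous predictors comparable to those of binary ones) can be done by hand; a minimal sketch, where `rescale2sd()` is a hypothetical helper name, not part of the question's code:

```r
# Centre a predictor and scale it by 2 standard deviations.
# rescale2sd() is a hypothetical helper, not from the original post.
rescale2sd <- function(x) (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))

# Applied to hypothetical raw predictor columns, it would look like:
# raw_df[paste0("pred", 1:6)] <- lapply(raw_df[paste0("pred", 1:6)], rescale2sd)
```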

df <- structure(list(offset_var = c(1.28599085563929, 0.56011798945881, 
0.910539373132146, 1.40150012655384, 1.99332290786881, 0.431352036569322, 
1.9664387276399, 0.39501204122542, 0.837952086514098, 1.49309174747163, 
0.185176379335315, 0.340780160418832, 0.23261768703236, 1.9358463846208, 
0.683228994093252, 0.766384544663353, 2.36951602293399, 0.570222733062088, 
2.035225147398, 2.66190465894079, 2.59441609615926, 2.45851192989865, 
9.10131475225225, 1.79085721952421, 1.14341199635633, 1.86464965904907, 
0.969684569635742, 1.48586083003075, 0.398395739771472, 1.57819408350658, 
0.348242838034098, 1.01779798183851, 1.56020949397414, 1.14211413937976, 
1.15472189286642, 2.17372502761175, 2.17372502761175, 1.15472189286642, 
0.91480376034087, 0.52428786649856, 0.663761139444733, 1.85426680323653, 
0.799109081286816, 1.2592920835499, 1.32010595330908, 1.9473416892704, 
1.74079702185659, 1.92731761020335), response_var = c(31, 154, 
40, 71, 3, 43, 66, 180, 62, 27, 84, 81, 66, 14, 40, 35, 48, 155, 
23, 37, 33, 88, 38, 8, 73, 29, 11, 12, 74, 70, 298, 40, 24, 35, 
0, 27, 0, 116, 83, 60, 77, 68, 56, 8, 64, 234, 57, 71), pred1 = c(0.593253968253968, 
0.212619047619048, 0.208888888888889, 0.824603174603175, 0.791031746031746, 
0.096984126984127, 0.589523809523809, 0.096984126984127, 0.783571428571429, 
0.156666666666667, 0, 0.361904761904762, 0.164126984126984, 0.723809523809524, 
0.123095238095238, 0.671587301587302, 0.708888888888889, 0.186507936507937, 
0.77984126984127, 0.0820634920634921, 0.220079365079365, 1, 0.27984126984127, 
0.835793650793651, 0.526031746031746, 0.0298412698412698, 0.641746031746032, 
0.0783333333333333, 0.156666666666667, 0.791031746031746, 0.167857142857143, 
0.641746031746032, 0.817142857142857, 0.126825396825397, 0.160396825396825, 
0.776031746031746, 0.776031746031746, 0.160396825396825, 0.130555555555556, 
0.600714285714286, 0.123095238095238, 0.70515873015873, 0.164126984126984, 
0.492460317460317, 0.0932539682539683, 0.596984126984127, 0.134285714285714, 
0.899206349206349), pred3 = c(0.779445727482678, 0.779445727482678, 
0.981524249422633, 0.981524249422633, 0.431293302540416, 0.431293302540416, 
0.85796766743649, 0.85796766743649, 0.974018475750578, 0.974018475750578, 
0.215357967667437, 0.215357967667437, 0.767898383371826, 0.767898383371826, 
0.929561200923788, 0.929561200923788, 0.319861431870669, 0.319861431870669, 
0.930138568129331, 0.930138568129331, 1, 1, 0.753464203233256, 
0.753464203233256, 0.937644341801385, 0.937644341801385, 0.438799076212472, 
0.438799076212472, 0.960161662817552, 0.960161662817552, 0.852193995381064, 
0.852193995381064, 0.803117782909932, 0.803117782909932, 0.571593533487299, 
0.571593533487299, 0.571593533487299, 0.571593533487299, 0, 0, 
0.801385681293304, 0.801385681293304, 0.656466512702079, 0.656466512702079, 
0.852771362586606, 0.852771362586606, 0.203810623556583, 0.203810623556583
), pred6 = c(0.688053824835413, 0.540952466204905, 0.628399305213738, 
0.6881894451542, 0.780506864063453, 0.596192469087064, 0.664351792193705, 
0.601630037911551, 0.725643181043039, 0.617634868802872, 0.811784880984014, 
0.804339479465123, 0.576349468957153, 0.847678636187369, 0.625702671539389, 
0.719554647550091, 0.363763777714112, 0.446477292582477, 0.474674252997289, 
0.480035185899025, 0.4937661725622, 0, 0.513379111687292, 0.903418101749096, 
0.695930304479286, 0.617335579129072, 0.759590008590574, 0.654198263624173, 
0.638351745840282, 0.758157124611394, 0.616963963666818, 0.749652691590418, 
0.626922408721354, 0.587580912470206, 0.430636110291636, 0.349235608213707, 
0.349235608213707, 0.430636110291636, 0.723455668440276, 1, 0.615287637567703, 
0.696660949537828, 0.570881476208878, 0.51486263616898, 0.599650096633513, 
0.661291018750272, 0.44995045321675, 0.317335721700076), pred2 = c(0.0640028225107189, 
0.0640028225107189, 1, 1, 0.0467149443570715, 0.0467149443570715, 
0.955928169494161, 0.955928169494161, 0.991149754525286, 0.991149754525286, 
0.636541945977623, 0.636541945977623, 0.0695282460368243, 0.0695282460368243, 
0.945044759518499, 0.945044759518499, 0.666620820800469, 0.666620820800469, 
0.865751343981534, 0.865751343981534, 0.158030700783964, 0.158030700783964, 
0, 0, 0.941696017987526, 0.941696017987526, 0.194526003575978, 
0.194526003575978, 0.974286448958601, 0.974286448958601, 0.182153599598151, 
0.182153599598151, 0.834117696305023, 0.834117696305023, 0.729828317197582, 
0.729828317197582, 0.729828317197582, 0.729828317197582, 0.477488683047594, 
0.477488683047594, 0.914726688872012, 0.914726688872012, 0.0551346373492319, 
0.0551346373492319, 0.950905057197701, 0.950905057197701, 0.653584648412039, 
0.653584648412039), pred4 = c(0.521197167238095, 0.521197167238095, 
0.675165075565519, 0.675165075565519, 0.754750741994489, 0.754750741994489, 
0.656214157700682, 0.656214157700682, 0.779025156922262, 0.779025156922262, 
0.466656174770789, 0.466656174770789, 1, 1, 0.19412522736884, 
0.19412522736884, 0, 0, 0.914127451104004, 0.914127451104004, 
0.450341983837599, 0.450341983837599, 0.200288723846985, 0.200288723846985, 
0.51864603534987, 0.51864603534987, 0.632983879774839, 0.632983879774839, 
0.219950187016786, 0.219950187016786, 0.378371968866928, 0.378371968866928, 
0.332552996854781, 0.332552996854781, 0.543573394076179, 0.543573394076179, 
0.543573394076179, 0.543573394076179, 0.110505257947263, 0.110505257947263, 
0.80446053190608, 0.80446053190608, 0.942739062050578, 0.942739062050578, 
0.769130578963835, 0.769130578963835, 0.910376608809312, 0.910376608809312
), pred5 = c(1, 1, 0.277346201087263, 0.277346201087263, 0.977777461589862, 
0.977777461589862, 0.690563309054654, 0.690563309054654, 0.439008770827643, 
0.439008770827643, 0.819995545023047, 0.819995545023047, 0.579039846892373, 
0.579039846892373, 0.365538938318013, 0.365538938318013, 0.0971685384896833, 
0.0971685384896833, 0.599718772091419, 0.599718772091419, 0.871837464542416, 
0.871837464542416, 0.0550210151699806, 0.0550210151699806, 0.740706048133834, 
0.740706048133834, 0.0826796777514591, 0.0826796777514591, 0.52020534667614, 
0.52020534667614, 0, 0, 0.660904950317592, 0.660904950317592, 
0.790027991312289, 0.790027991312289, 0.790027991312289, 0.790027991312289, 
0.206354045553633, 0.206354045553633, 0.600235537123976, 0.600235537123976, 
0.829243121962118, 0.829243121962118, 0.595320097430252, 0.595320097430252, 
0.17138558268267, 0.17138558268267)), row.names = c(NA, -48L), class = "data.frame")

First off, I tried running the model using the glm.nb function in the MASS package, but kept getting non-convergence warnings (glm.fit: algorithm did not converge) even after increasing the number of iterations beyond the default 25 (I tried 50, 100, 250, 1000, and even 5000):

> glm1 <- glm.nb(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +  pred6 + offset(log(offset_var)), data = df, control = glm.control(maxit = 5000))
There were 50 or more warnings (use warnings() to see the first 50) #these are the non-convergence warnings
> summary(glm1)

Call:
glm.nb(formula = response_var ~ pred1 * pred2 + pred1 * pred3 + 
    pred1 * pred4 + pred1 * pred5 + pred6 + offset(log(offset_var)), 
    data = df, control = glm.control(maxit = 5000), init.theta = 0.6779942761, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8944  -0.9971  -0.4541   0.2178   1.9029  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   7.3937     1.2057   6.132 8.65e-10 ***
pred1        -5.2640     2.0354  -2.586   0.0097 ** 
pred2        -0.9040     0.8356  -1.082   0.2793    
pred3        -0.6444     1.0470  -0.616   0.5382    
pred4        -1.1811     1.1321  -1.043   0.2968    
pred5        -0.2583     1.0225  -0.253   0.8006    
pred6        -0.6567     1.0222  -0.642   0.5206    
pred1:pred2   2.0800     1.5741   1.321   0.1864    
pred1:pred3   1.7663     2.0589   0.858   0.3910    
pred1:pred4   0.6586     2.1055   0.313   0.7544    
pred1:pred5   0.4524     2.0225   0.224   0.8230    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.791) family taken to be 1)

    Null deviance: 68.201  on 47  degrees of freedom
Residual deviance: 57.762  on 37  degrees of freedom
AIC: 541.63

Number of Fisher Scoring iterations: 5000


              Theta:  0.678 
          Std. Err.:  0.124 
Warning while fitting theta: alternation limit reached 

 2 x log-likelihood:  -517.634 

I know I am trying to fit a complex model given my sample size, so I am assuming this is why the model is not converging (I want to fit such a complex model because I am going to use it as the global model upon which to perform model averaging). However, I then tried alternative packages that can fit GLMs with negative binomial distributions in R. When I tried glmmTMB and brglm2, the models converged and I got the following (very similar) results:

library(glmmTMB)
library(brglm2)

> #fit model using glmmTMB
> glm2 <- glmmTMB(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +  pred6 + offset(log(offset_var)), data = df, family = nbinom2)
> summary(glm2)
 Family: nbinom2  ( log )
Formula:          response_var ~ pred1 * pred2 + pred1 * pred3 + pred1 * pred4 +      pred1 * pred5 + pred6 + offset(log(offset_var))
Data: df

     AIC      BIC   logLik deviance df.resid 
   538.8    561.3   -257.4    514.8       36 


Dispersion parameter for nbinom2 family (): 0.815 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.09772    1.36899   5.185 2.16e-07 ***
pred1       -5.42650    2.23604  -2.427   0.0152 *  
pred2       -1.10202    0.79743  -1.382   0.1670    
pred3       -0.39245    1.14313  -0.343   0.7314    
pred4       -1.33843    1.32065  -1.013   0.3108    
pred5       -0.07125    1.09954  -0.065   0.9483    
pred6       -0.17567    1.03227  -0.170   0.8649    
pred1:pred2  2.80613    1.69120   1.659   0.0971 .  
pred1:pred3  1.04996    2.04619   0.513   0.6079    
pred1:pred4  1.15544    2.37031   0.487   0.6259    
pred1:pred5 -0.10083    2.31516  -0.044   0.9653    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> #fit model using brglm2
> glm_3 <- brnb(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +  pred6 + offset(log(offset_var)), data = df, link = "log", type = "ML", transformation = "identity")
> summary(glm_3)

Call:
brnb(formula = response_var ~ pred1 * pred2 + pred1 * pred3 + pred1 * pred4 + pred1 * pred5 + pred6 + offset(log(offset_var)), data = df, link = "log", type = "ML", 
    transformation = "identity")


Coefficients (mean model with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.09773    1.18882   5.970 2.37e-09 ***
pred1       -5.42651    2.00939  -2.701  0.00692 ** 
pred2       -1.10203    0.82354  -1.338  0.18084    
pred3       -0.39243    1.03189  -0.380  0.70372    
pred4       -1.33854    1.11593  -1.199  0.23034    
pred5       -0.07116    1.00813  -0.071  0.94373    
pred6       -0.17567    1.00911  -0.174  0.86180    
pred1:pred2  2.80608    1.55187   1.808  0.07058 .  
pred1:pred3  1.05001    2.03198   0.517  0.60534    
pred1:pred4  1.15563    2.07750   0.556  0.57803    
pred1:pred5 -0.10104    1.99736  -0.051  0.95965    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter  with identity transformation function: 1.2267971)

    Null deviance: 81.222  on 47  degrees of freedom
Residual deviance: 57.153  on 37  degrees of freedom
AIC: 538.825

Type of estimator: ML (maximum likelihood)
Number of quasi-Fisher scoring iterations:32

My questions are:

  1. Why would the model not converge using glm.nb but converge using the other two packages? Should I be suspicious of the results from glmmTMB or brglm2 given the issues with convergence in using glm.nb?
  2. glmmTMB was by far the fastest so I'd like to use that package going forward. However I am wondering: given that glmmTMB is a package for generalized linear mixed modelling, is there any issue with fitting a model in glmmTMB without random effects? Or would the only tradeoff be the slightly higher standard errors for the coefficients fit using glmmTMB compared with those from brglm2?
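One diagnostic worth trying (my suggestion, not from the original post): `glm.nb()` accepts an `init.theta` argument (visible in the Call printed above), so you can seed it with the dispersion estimate from the converged glmmTMB fit and see whether the warnings disappear. If the coefficients then match, the earlier warnings were likely an artifact of the theta/IRLS alternation rather than a genuinely ill-posed model. A hedged sketch, assuming `glm2` is the glmmTMB fit from above:

```r
library(MASS)
library(glmmTMB)

# sigma() on a glmmTMB nbinom2 fit returns the estimated dispersion (theta).
theta_start <- sigma(glm2)

# Refit with glm.nb, starting theta at the glmmTMB estimate.
glm1b <- glm.nb(
  response_var ~ pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +
    pred6 + offset(log(offset_var)),
  data = df,
  init.theta = theta_start
)

# Compare the two sets of fixed-effect estimates side by side:
cbind(MASS = coef(glm1b), TMB = fixef(glm2)$cond)
```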

1 Answer

愿得七秒忆 · answered 2025-02-09 20:55:11


The parameter estimates and related statistics produced by glm.nb() should be the same as those you get from the glmmTMB() function (from the package of the same name) when using only the same fixed effects. However, I have just come across a situation where both the model coefficients and the AIC values differed between the two packages. Normally they agree.
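A quick way to check the agreement described above is to line up the coefficients and log-likelihoods from the two fits; a sketch assuming `glm1` (glm.nb) and `glm2` (glmmTMB) from the question:

```r
# Side-by-side fixed-effect estimates from the two packages.
cmp <- cbind(glm.nb = coef(glm1), glmmTMB = fixef(glm2)$cond)
print(cmp)

# Largest absolute discrepancy between the two sets of estimates.
max(abs(cmp[, "glm.nb"] - cmp[, "glmmTMB"]))

# Log-likelihoods should also be close when both fits truly converged.
c(logLik(glm1), logLik(glm2))
```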

A pretty cool thing about glmmTMB is that it allows you to consider alternative extensions of the Poisson family, such as the negative binomial type I (NB1, family=nbinom1), the mean-parameterized Conway-Maxwell-Poisson (family=compois) and the Generalized Poisson (family=genpois). You may want to have a look at these additional error structures to model your (I presume) count data.
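Since all of these families are fit by maximum likelihood, they can be compared directly on AIC; a sketch using the question's formula, with family names as documented in glmmTMB:

```r
library(glmmTMB)

form <- response_var ~ pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +
  pred6 + offset(log(offset_var))

# Fit the same mean model under several Poisson-family extensions.
fits <- list(
  nbinom2 = glmmTMB(form, data = df, family = nbinom2),
  nbinom1 = glmmTMB(form, data = df, family = nbinom1),
  compois = glmmTMB(form, data = df, family = compois),
  genpois = glmmTMB(form, data = df, family = genpois)
)

# Lower AIC indicates a better fit/complexity trade-off.
sapply(fits, AIC)
```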

Also, have you tried not using an offset, but instead including this covariate as is and then, for your predictions, choosing a constant value (e.g., the mean) within its observed range?
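The suggestion above amounts to estimating a free coefficient for log(offset_var) rather than fixing it at 1, as an offset does; a sketch, with the constant-value prediction illustrated via a modified copy of `df`:

```r
library(glmmTMB)

# Fit with log(offset_var) as an ordinary covariate: its coefficient is
# estimated from the data instead of being fixed at 1 by offset().
glm4 <- glmmTMB(
  response_var ~ pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +
    pred6 + log(offset_var),
  data = df, family = nbinom2
)

# For predictions, hold the covariate at a constant value such as its mean:
newdat <- df
newdat$offset_var <- mean(df$offset_var)
head(predict(glm4, newdata = newdat, type = "response"))
```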

Good luck.
