缺少混合模型和GGEFFECT的标准错误和信心

发布于 2025-01-20 17:38:02 字数 3590 浏览 3 评论 0原文

我正在尝试使用ggeffects :: ggpredict为我的模型制作一些效果图。我发现许多结果都缺少标准错误和置信度限制。我可以通过一些模拟数据来重现问题。似乎是专门用于观察到标准错误将预测概率接近0或1的观察结果。

我试图在链接刻度上获得预测,以诊断是否是从链接到响应的翻译问题的问题,但我不相信这是由包裹支持。

有什么想法解决这个问题吗?非常感谢。

library(tidyverse)
library(lme4)
library(ggeffects)

# number of simulated observations
n <- 1000

# simulated data with a numerical predictor x, factor predictor f, response y
# the simulated effects of x and f are somewhat weak compared to the noise, so expect high standard errors
df <- tibble(
  x = seq(-0.1, 0.1, length.out = n),
  g = floor(runif(n) * 3),
  f = letters[1 + g] %>% as.factor(),
  y = pracma::sigmoid(x + (runif(n) - 0.5) + 0.1 * (g - mean(g))),
  z = if_else(y > 0.5, "high", "low") %>% as.factor()
)

# glmer model
model <- glmer(z ~ x + (1 | f), data = df, family = binomial)
print(summary(model))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: z ~ x + (1 | f)
#>    Data: df
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   1373.0   1387.8   -683.5   1367.0      997 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.3858 -0.9928  0.7317  0.9534  1.3600 
#> 
#> Random effects:
#>  Groups Name        Variance Std.Dev.
#>  f      (Intercept) 0.0337   0.1836  
#> Number of obs: 1000, groups:  f, 3
#> 
#> Fixed effects:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.02737    0.12380   0.221    0.825    
#> x           -4.48012    1.12066  -3.998 6.39e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>   (Intr)
#> x -0.001

# missing standard errors
ggpredict(model, c("x", "f")) %>% print()
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # Predicted probabilities of z
#> 
#> # f = a
#> 
#>     x | Predicted |       95% CI
#> --------------------------------
#> -0.10 |      0.62 | [0.54, 0.69]
#>  0.00 |      0.51 |             
#>  0.10 |      0.40 |             
#> 
#> # f = b
#> 
#>     x | Predicted |       95% CI
#> --------------------------------
#> -0.10 |      0.62 | [0.56, 0.67]
#>  0.00 |      0.51 |             
#>  0.10 |      0.40 |             
#> 
#> # f = c
#> 
#>     x | Predicted |       95% CI
#> --------------------------------
#> -0.10 |      0.62 | [0.54, 0.69]
#>  0.00 |      0.51 |             
#>  0.10 |      0.40 |
ggpredict(model, c("x", "f")) %>% as_tibble() %>% print(n = 20)
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # A tibble: 9 x 6
#>       x predicted std.error conf.low conf.high group
#>   <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <fct>
#> 1  -0.1     0.617     0.167    0.537     0.691 a    
#> 2  -0.1     0.617     0.124    0.558     0.672 b    
#> 3  -0.1     0.617     0.167    0.537     0.691 c    
#> 4   0       0.507    NA       NA        NA     a    
#> 5   0       0.507    NA       NA        NA     b    
#> 6   0       0.507    NA       NA        NA     c    
#> 7   0.1     0.396    NA       NA        NA     a    
#> 8   0.1     0.396    NA       NA        NA     b    
#> 9   0.1     0.396    NA       NA        NA     c

I'm trying to use ggeffects::ggpredict to make some effects plots for my model. I find that the standard errors and confidence limits are missing for many of the results. I can reproduce the problem with some simulated data. It seems specifically for observations where the standard error puts the predicted probability close to 0 or 1.

I tried to get predictions on the link scale to diagnose if it's a problem with the translation from link to response, but I don't believe this is supported by the package.

Any ideas how to address this? Many thanks.

library(tidyverse)
library(lme4)
library(ggeffects)

# number of simulated observations
n <- 1000

# simulated data with a numerical predictor x, factor predictor f, response y
# the simulated effects of x and f are somewhat weak compared to the noise, so expect high standard errors
df <- tibble(
  x = seq(-0.1, 0.1, length.out = n),
  g = floor(runif(n) * 3),
  f = letters[1 + g] %>% as.factor(),
  y = pracma::sigmoid(x + (runif(n) - 0.5) + 0.1 * (g - mean(g))),
  z = if_else(y > 0.5, "high", "low") %>% as.factor()
)

# glmer model
model <- glmer(z ~ x + (1 | f), data = df, family = binomial)
print(summary(model))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: z ~ x + (1 | f)
#>    Data: df
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   1373.0   1387.8   -683.5   1367.0      997 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.3858 -0.9928  0.7317  0.9534  1.3600 
#> 
#> Random effects:
#>  Groups Name        Variance Std.Dev.
#>  f      (Intercept) 0.0337   0.1836  
#> Number of obs: 1000, groups:  f, 3
#> 
#> Fixed effects:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.02737    0.12380   0.221    0.825    
#> x           -4.48012    1.12066  -3.998 6.39e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>   (Intr)
#> x -0.001

# missing standard errors
ggpredict(model, c("x", "f")) %>% print()
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # Predicted probabilities of z
#> 
#> # f = a
#> 
#>     x | Predicted |       95% CI
#> --------------------------------
#> -0.10 |      0.62 | [0.54, 0.69]
#>  0.00 |      0.51 |             
#>  0.10 |      0.40 |             
#> 
#> # f = b
#> 
#>     x | Predicted |       95% CI
#> --------------------------------
#> -0.10 |      0.62 | [0.56, 0.67]
#>  0.00 |      0.51 |             
#>  0.10 |      0.40 |             
#> 
#> # f = c
#> 
#>     x | Predicted |       95% CI
#> --------------------------------
#> -0.10 |      0.62 | [0.54, 0.69]
#>  0.00 |      0.51 |             
#>  0.10 |      0.40 |
ggpredict(model, c("x", "f")) %>% as_tibble() %>% print(n = 20)
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # A tibble: 9 x 6
#>       x predicted std.error conf.low conf.high group
#>   <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <fct>
#> 1  -0.1     0.617     0.167    0.537     0.691 a    
#> 2  -0.1     0.617     0.124    0.558     0.672 b    
#> 3  -0.1     0.617     0.167    0.537     0.691 c    
#> 4   0       0.507    NA       NA        NA     a    
#> 5   0       0.507    NA       NA        NA     b    
#> 6   0       0.507    NA       NA        NA     c    
#> 7   0.1     0.396    NA       NA        NA     a    
#> 8   0.1     0.396    NA       NA        NA     b    
#> 9   0.1     0.396    NA       NA        NA     c

Created on 2022-04-12 by the reprex package (v2.0.1)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

眼泪淡了忧伤 2025-01-27 17:38:02

我认为这可能是由于单一模型拟合所致。

我深入研究了代码的核心 here ,似乎存在不匹配的地方预测协方差矩阵的维度 (3x3) 和预测值的数量 (15) 之间。

我进一步怀疑问题可能发生在此处

rows_to_keep <- as.numeric(rownames(unique(model_matrix_data[
             intersect(colnames(model_matrix_data), terms)])))

也许该函数变得混乱,因为每个组的条件模式/BLUP 都是相同(一般来说,当随机效应方差为零时,这才是正确的)...?

这似乎值得在 ggeffects 问题列表 上提出一个问题?

I think this may be due to the singular model fit.

I dug down into the guts of the code as far as here, where there appears to be a mismatch between the dimensions of the covariance matrix of the predictions (3x3) and the number of predicted values (15).

I further suspect that the problem may happen here:

rows_to_keep <- as.numeric(rownames(unique(model_matrix_data[
             intersect(colnames(model_matrix_data), terms)])))

Perhaps the function is getting confused because the conditional modes/BLUPs for every group are the same (which will only be true, generically, when the random effects variance is zero) ... ?

This seems worth opening an issue on the ggeffects issues list ?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文