Predicting/imputing the missing values of a Poisson GLM regression in R?

Posted on 2024-11-27 11:54:07

I'm trying to explore ways of imputing missing values in a data set. My data set contains counts of an occurrence (Unnatural, Natural, and their sum, Total) by Year (2001-2009), Month (1-12), Gender (M/F), and AgeGroup (4 groups).

One of the imputation techniques I'm exploring is (poisson) regression imputation.

Say my data looks like this:

    Year Month Gender AgeGroup Unnatural Natural Total
569 2006     5   Male     15up       278     820  1098
570 2006     6   Male     15up       273     851  1124
571 2006     7   Male     15up       304     933  1237
572 2006     8   Male     15up       296    1064  1360
573 2006     9   Male     15up       298     899  1197
574 2006    10   Male     15up       271     819  1090
575 2006    11   Male     15up       251     764  1015
576 2006    12   Male     15up       345     792  1137
577 2007     1 Female        0        NA      NA    NA
578 2007     2 Female        0        NA      NA    NA
579 2007     3 Female        0        NA      NA    NA
580 2007     4 Female        0        NA      NA    NA
581 2007     5 Female        0        NA      NA    NA
...

After fitting a basic GLM regression, 96 observations were deleted because their values are missing.

Is there a way/package/function in R that will use the coefficients of this GLM model to 'predict' (i.e. impute) the missing values of Total (even if it just stores them in a separate data frame; I will use Excel to merge them)? I know I can use the coefficients to predict the different hierarchical rows by hand, but this would take forever. Hopefully there's a one-step function/method?

Call:
glm(formula = Total ~ Year + Month + Gender + AgeGroup, family = poisson)

Deviance Residuals: 
      Min         1Q     Median         3Q        Max  
-13.85467   -1.13541   -0.04279    1.07133   10.33728  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)   13.3433865  1.7541626   7.607 2.81e-14 ***
Year          -0.0047630  0.0008750  -5.443 5.23e-08 ***
Month          0.0134598  0.0006671  20.178  < 2e-16 ***
GenderMale     0.2265806  0.0046320  48.916  < 2e-16 ***
AgeGroup01-4  -1.4608048  0.0224708 -65.009  < 2e-16 ***
AgeGroup05-14 -1.7247276  0.0250743 -68.785  < 2e-16 ***
AgeGroup15up   2.8062812  0.0100424 279.444  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 403283.7  on 767  degrees of freedom
Residual deviance:   4588.5  on 761  degrees of freedom
  (96 observations deleted due to missingness)
AIC: 8986.8

Number of Fisher Scoring iterations: 4

Comments (2)

何时共饮酒 2024-12-04 11:54:07

First, be very careful about the assumption that the data are missing at random. In your example, missingness appears to co-occur with Gender = Female and a particular AgeGroup. You should really test whether missingness is related to any of the predictors (or whether any predictors are themselves missing). If so, the responses could be skewed.
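A minimal sketch of such a check in R, under stated assumptions: the data frame `dat` below is simulated (only the column names come from the question), and the blanked-out block is an invented stand-in for the pattern visible in the sample rows. Cross-tabulating a missingness indicator against each predictor shows whether the gaps cluster in particular groups.

```r
# Simulated stand-in for the question's data; values are NOT the real counts.
set.seed(1)
dat <- expand.grid(
  Year     = 2001:2009,
  Month    = 1:12,
  Gender   = c("Female", "Male"),
  AgeGroup = c("0", "01-4", "05-14", "15up")
)
dat$Total <- rpois(nrow(dat), lambda = 200)

# Assumption: blank out one block of rows to mimic the pattern in the question.
dat$Total[dat$Year == 2007 & dat$Gender == "Female" & dat$AgeGroup == "0"] <- NA

# Indicator that is TRUE where the response is missing.
dat$miss <- is.na(dat$Total)

# Cross-tabulate missingness against the categorical predictors.
tab_gender <- table(dat$Gender, dat$miss)
tab_age    <- table(dat$AgeGroup, dat$miss)
print(tab_gender)
print(tab_age)

# A simple test of association between Gender and missingness.
ft <- fisher.test(tab_gender)
print(ft$p.value)
```

If the missing rows concentrate in one Gender or AgeGroup level, as they do in this simulated example, the values are unlikely to be missing completely at random.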

Second, the function you are looking for is most likely predict, which accepts a fitted glm model. See ?predict.glm for more guidance. You may want to fit a cascade of models (i.e. nested models) to address the missing values.
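As a rough illustration of the predict approach, here is a sketch on simulated data. The data frame `dat`, the simulated means, and the `Total_imputed` column are assumptions made for the example; only the model formula and column names come from the question. The rows with a missing Total are dropped when the model is fitted, then passed back through `newdata` to get fitted values.

```r
# Simulated stand-in for the question's data; values are NOT the real counts.
set.seed(42)
dat <- expand.grid(
  Year     = 2001:2009,
  Month    = 1:12,
  Gender   = c("Female", "Male"),
  AgeGroup = c("0", "01-4", "05-14", "15up")
)
mu <- exp(4 + 0.2 * (dat$Gender == "Male") + 2.8 * (dat$AgeGroup == "15up"))
dat$Total <- rpois(nrow(dat), mu)
dat$Total[dat$Year == 2007 & dat$Gender == "Female" & dat$AgeGroup == "0"] <- NA

# Same formula as in the question; rows with NA in Total are dropped by default.
fit <- glm(Total ~ Year + Month + Gender + AgeGroup, family = poisson, data = dat)

# Predict expected counts for the rows whose response is missing.
# type = "response" returns values on the count scale instead of the log scale.
missing_rows <- is.na(dat$Total)
dat$Total_imputed <- dat$Total
dat$Total_imputed[missing_rows] <- predict(
  fit,
  newdata = dat[missing_rows, ],
  type    = "response"
)

head(dat[missing_rows, c("Year", "Month", "Gender", "AgeGroup", "Total_imputed")])
```

The predicted values are model means, so they are generally not integers and carry none of the residual variation of real counts; round them or draw from `rpois()` with those means if integer-valued imputations are needed.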

小鸟爱天空丶 2024-12-04 11:54:07

The mice package provides a function of the same name that allows each missing value to be predicted using a regression scheme based on the other values. It can cope with predictors also being missing because it uses an iterative MCMC algorithm.

I don't think Poisson regression is an option there, but if all of your counts are as large as those in the example, normal regression should offer a reasonable approximation.
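A sketch of what that could look like, assuming the mice package is installed. The data are simulated, and the choice of `method = "norm"` (Bayesian linear regression) is just one way to read "normal regression" here; `m`, `seed`, and the object names are picked for illustration only.

```r
library(mice)

# Simulated stand-in for the question's data; values are NOT the real counts.
set.seed(7)
dat <- expand.grid(
  Year     = 2001:2009,
  Month    = 1:12,
  Gender   = c("Female", "Male"),
  AgeGroup = c("0", "01-4", "05-14", "15up")
)
dat$Total <- rpois(nrow(dat), lambda = 200)
dat$Total[dat$Year == 2007 & dat$Gender == "Female" & dat$AgeGroup == "0"] <- NA

# Create 5 imputed data sets; only Total has missing values, so only it is imputed.
imp <- mice(dat, m = 5, method = "norm", seed = 123, printFlag = FALSE)

# Extract the first completed data set.
completed <- mice::complete(imp, 1)
summary(completed$Total)
```

Values imputed this way are continuous and can in principle fall below zero, so they may need rounding or truncation before being treated as counts.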
