Pooled average marginal effects from survey-weighted and multiply imputed data

Posted on 2025-02-10 11:18:24


I am working with survey data and their associated weights, in addition to missing data that I imputed using mice(). The model I'm eventually running contains complex interactions between variables for which I want the average marginal effect.
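To make the target quantity concrete, here is a minimal base-R sketch of an average marginal effect (AME) in a model with an interaction. It is an illustration only, not the survey/mice workflow: it uses the complete cases of the example data, and a weighted lm(), which should give the same point estimates as a weighted svyglm() but not design-based standard errors. The names d, unit_effects and ame_x1 are made up for this sketch.

```r
# Complete cases (observed x2) from the example data in this post
d <- data.frame(
  y      = c(0, 0, 1, 2, 3, 1, 12),
  weight = c(7213, 1331, 1231, 1235, 2131, 7548, 2348),
  x1     = c(1.14, -0.34, -1.2, 0.67, 1.24, 0.25, -0.3),
  x2     = c(12, 10, 12, 11, 8, 9, 9)
)

# Weighted linear model with an interaction (point estimates only)
fit <- lm(y ~ x1 * x2, data = d, weights = weight)
b <- coef(fit)

# With y ~ x1 * x2, the marginal effect of x1 for each unit is
# b_x1 + b_x1:x2 * x2, so it varies with that unit's x2
unit_effects <- b[["x1"]] + b[["x1:x2"]] * d$x2

# The AME is the (here: weighted) average of the unit-level effects
ame_x1 <- weighted.mean(unit_effects, d$weight)
ame_x1
```

Because the unit-level effect depends on x2, the AME differs from the raw x1 coefficient, which is why a dedicated tool such as margins() is convenient once the model gets more complex.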

This task seems trivial in Stata, but I'd rather stay in R since that's what I know best. It seems easy to retrieve the AMEs for each imputed dataset separately and average the estimates. However, I need to use pool() (from mice) to make sure I'm getting the correct standard errors.

Here is a reproducible example:

library(tidyverse)
library(survey)
library(mice)
library(margins)

df <- tibble(y = c(0, 5, 0, 4, 0, 1, 2, 3, 1, 12), region = c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3), 
             weight = c(7213, 2142, 1331, 4342, 9843, 1231, 1235, 2131, 7548, 2348), 
             x1 = c(1.14, 2.42, -0.34, 0.12, -0.9, -1.2, 0.67, 1.24, 0.25, -0.3),
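             x2 = c(12, NA, 10, NA, NA, 12, 11, 8, 9, 9))

# Assumed setup (not shown in the original post): impute the data, then
# build one survey design across all imputed data sets
imputed_df <- mice(df, m = 2, seed = 123)
surv_obj <- svydesign(ids = ~1, weights = ~weight,
                      data = mitools::imputationList(complete(imputed_df, "all")))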

Using margins() on a simple svyglm (without multiple imputation) works without a hitch. Running svyglm on each imputation using with() and pooling the results also works well.

m <- with(surv_obj, svyglm(y ~ x1 * x2))
pool(m)

However, wrapping margins() in with() returns the error "Error in .svycheck(design) : argument "design" is missing, with no default":

with(surv_obj, margins(svyglm(y ~ x1 * x2), design = surv_obj))

If I specify the design in the svyglm call, I get "Error in UseMethod("svyglm", design) : no applicable method for 'svyglm' applied to an object of class "svyimputationList""

with(surv_obj, margins(svyglm(y ~ x1 * x2, design = surv_obj), design = surv_obj))

If I drop the survey layer and simply run margins() on each imputed data set and then pool, I get a warning: "Warning in get.dfcom(object, dfcom) : Infinite sample size assumed."

m1 <- with(imputed_df, margins(lm(y ~ x1 * x2)))
pool(m1)

This worries me given that pool() may use sample size in its calculations.
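For context, here is a hedged base-R sketch of Rubin's rules with the Barnard-Rubin degrees-of-freedom adjustment, which is the kind of calculation pool() and pool.scalar() perform. It is not mice's actual code; pool_rubin() and the input numbers are hypothetical. Under these formulas, the complete-data degrees of freedom (dfcom) enter only the pooled degrees of freedom, not the pooled estimate or its standard error.

```r
# Pool m estimates and their standard errors with Rubin's rules.
# dfcom = complete-data residual degrees of freedom; Inf mimics the
# "infinite sample size" fallback mentioned in the warning.
pool_rubin <- function(est, se, dfcom = Inf) {
  m      <- length(est)
  qbar   <- mean(est)                 # pooled point estimate
  ubar   <- mean(se^2)                # within-imputation variance
  b      <- var(est)                  # between-imputation variance
  total  <- ubar + (1 + 1 / m) * b    # total variance
  lambda <- (1 + 1 / m) * b / total   # share of variance due to missingness
  df_old <- (m - 1) / lambda^2        # classic large-sample df
  if (is.finite(dfcom)) {
    # Barnard-Rubin small-sample adjustment
    df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
    df <- df_old * df_obs / (df_old + df_obs)
  } else {
    df <- df_old
  }
  c(estimate = qbar, se = sqrt(total), df = df)
}

# Hypothetical AMEs and standard errors from m = 3 imputed data sets
est <- c(0.50, 0.62, 0.56)
se  <- c(0.20, 0.22, 0.21)

res_inf   <- pool_rubin(est, se)             # infinite dfcom
res_small <- pool_rubin(est, se, dfcom = 8)  # small complete-data df
rbind(res_inf, res_small)
```

In this sketch the estimate and standard error are identical in both calls; only the degrees of freedom, and therefore the confidence intervals and p-values, change. That suggests the warning matters mostly for small samples.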

Does anyone know either (a) how to use with(), margins() and pool() together to retrieve the pooled average marginal effects, or (b) which elements of the margins() output I should pass to pool() (or pool.scalar()) to achieve the desired result?


逆夏时光 2025-02-17 11:18:24


Update following Vincent's comment

I wanted to update this post following Vincent's comment about the related marginaleffects package, which ended up fixing my issue. Hopefully this will be helpful to others stuck on similar problems.

I implemented the code from the vignette linked in Vincent's comment, adding a few steps that allow for survey weighting and modeling. It's worth noting that svydesign() will drop any observations with missing values on the clustering/weighting variables, so marginaleffects() can't predict values back onto the original "dat" data and will throw an error. Pooling my actual data still gives the "Infinite sample size assumed" warning, which (as noted) should be fine, but I'm still looking into fixes.

library(tidyverse)
library(survey)
library(mice)
library(marginaleffects)

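# Fit the survey-weighted model on one imputed data set and return its
# marginal effects
fit_reg <- function(dat) {
  
    # Note: svydesign() has no `cluster` argument (clusters are given via
    # `ids`), so `cluster = ~ region` appears to be silently ignored here
    svy <- svydesign(ids = ~ 1, cluster = ~ region, weight = ~weight, data = dat)
    mod <- svyglm(y ~ x1 + x2*factor(x3), design = svy)
    # Newer marginaleffects releases rename marginaleffects() to slopes()
    out <- marginaleffects(mod, newdata = dat)
    
    # Custom class so that pool() dispatches to tidy.custom() below
    class(out) <- c("custom", class(out))
    return(out)
}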

tidy.custom <- function(x, ...) {
    out <- marginaleffects:::tidy.marginaleffects(x, ...)
    out$term <- paste(out$term, out$contrast)
    return(out)
}

df <- tibble(y = c(0, 5, 0, 4, 0, 1, 2, 3, 1, 12), region = c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3), 
             weight = c(7213, 2142, 1331, 4342, 9843, 1231, 1235, 2131, 7548, 2348), 
             x1 = c(1.14, 2.42, -0.34, 0.12, -0.9, -1.2, 0.67, 1.24, 0.25, -0.3),
             x2 = c(12, NA, 10, NA, NA, 12, 11, 8, 9, 9),
             x3 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

imputed_df <- mice(df, m = 2, seed = 123)

dat_mice <- complete(imputed_df, "all")
mod_imputation <- lapply(dat_mice, fit_reg)
mod_imputation <- pool(mod_imputation)

summary(mod_imputation)
