Merging imputed level-2 data (mice) with non-imputed level-1 data for a multilevel analysis with brms

Posted 2025-01-09 22:04:39


I'm using the R mice package to impute randomly missing questionnaire item values for a few participants. The questionnaire's sum score is later used, with brms, as a level-2 predictor of reaction times in a multilevel model of a task with multiple trials per participant (level 1).

I have tried two approaches to create a mids object that contains all the data and can later be passed to brm_multiple, but neither has worked so far:

1.) I kept the data frames separate and imputed the item values in the questionnaire data frame only. I then created a long-format data frame containing the original data and all imputations (using complete()) and calculated each participant's sum score in each imputation (using rowSums()). Afterwards, I joined this long data frame with the level-1 reaction-time data (using full_join()) and tried to convert the result into a mids object (as.mids()). This failed because, after the join, each .id occurs multiple times (once per trial).

2.) I joined the data frames before imputation and tried to impute only the level-2 questionnaire by extending mice with miceadds. In the predictor matrix I defined only the item scores as predictors, set 2lonly.function as the method, specified the appropriate imputation function, and used the participant ID as the cluster variable. This resulted in the error `Error in edit.setup(data, setup, ...) : mice detected constant and/or collinear variables. No predictors were left after their removal.`
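For approach 1, the following minimal base-R sketch shows where the duplicated `.id` values come from. The toy data (two participants, three blocks, two trials each) are made up for illustration; they mimic the layout of `complete(..., action = "long", include = TRUE)`, with one row per participant per imputation, and `merge(..., all = TRUE)` stands in for `dplyr::full_join()`. That `as.mids()` fails on this because it uses the `.id` values of the `.imp == 0` block as row names is an assumption about `mice` internals.

```r
# Toy stand-in for complete(imp, action = "long", include = TRUE):
# .imp = 0 holds the original data (with NA), .imp = 1 and 2 hold imputations
level2_long <- data.frame(
  .imp        = rep(0:2, each = 2),
  .id         = rep(1:2, times = 3),
  participant = factor(rep(c("1", "2"), times = 3)),
  sumscore    = c(40, NA, 40, 35, 40, 37)
)

# Toy trial-level data: two reaction times per participant
level1 <- data.frame(
  participant = factor(c("1", "1", "2", "2")),
  RT          = c(416, 389, 339, 368)
)

# Full join by participant: every level-2 row is repeated once per trial
joined <- merge(level2_long, level1, by = "participant", all = TRUE)
joined <- joined[order(joined$.imp, joined$participant), ]

# Within the original-data block, .id no longer identifies a unique row
ids_orig <- joined$.id[joined$.imp == 0]
print(ids_orig)
anyDuplicated(ids_orig) > 0
```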

Has anyone run into similar issues and managed to solve them?

--- Edit: here is a reproducible example for approach 1 (my preferred one):

```r
library(mice)

# Fake questionnaire data (one row per participant, level 2):

data1 <- structure(list(participant = structure(1:20, .Label = c("1", 
                                                                 "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                                                                 "14", "15", "16", "17", "18", "19", "20"), class = "factor"), 
                        scale1 = c(20.5176893097081, 17.1907529978866, NA, NA, 23.0900118234823, 
                                   16.825451016666, 17.9720180052918, 28.4363035263208, 26.0191098441877, 
                                   26.1444447937135, NA, 25.091133563164, 10.3353758051478, 
                                   18.0322232007671, 14.1767794585022, 20.9102922916395, 20.6239907650613, 
                                   17.661597152285, 18.3255223659322, 18.9958533053766), 
                        scale2 = c(23.8446274459682, 
                                   NA, 13.3562256053306, 8.52823315494693, 18.3034641524201, 
                                   17.1100738924451, 20.0295218831116, 15.6986473122548, 14.9647149797442, 
                                   32.1875950434602, 25.255823725488, NA, 15.2625337013248, 
                                   17.6354282904461, 5.86783073951034, NA, 16.3987924521716, 
                                   11.3574747700045, 18.3557569542574, 18.741406021827)), 
                   row.names = c(NA, 
                                 -20L), class = "data.frame")


# Fake reaction-time data (10 trials per participant, level 1):

data2 <- structure(list(participant = structure(c(1L, 1L, 1L, 1L, 1L, 
                                                  1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
                                                  3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
                                                  4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
                                                  6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                                                  7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 
                                                  9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
                                                  10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 
                                                  12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 
                                                  13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 
                                                  14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 
                                                  16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 
                                                  17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 
                                                  18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 
                                                  20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L), 
                                                .Label = c("1", 
                                                           "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                                                           "14", "15", "16", "17", "18", "19", "20"), class = "factor"), 
                        RT = c(416, 389, 383, 411, 354, 404, 354, 433, 411, 408, 
                               339, 368, 474, 407, 411, 366, 401, 427, 415, 376, 398, 393, 
                               391, 483, 466, 427, 372, 380, 360, 383, 374, 412, 412, 394, 
                               403, 387, 427, 383, 362, 402, 397, 445, 393, 407, 450, 381, 
                               395, 428, 423, 423, 435, 404, 405, 426, 392, 408, 383, 371, 
                               409, 422, 386, 412, 420, 353, 429, 350, 395, 428, 428, 437, 
                               423, 475, 444, 369, 360, 429, 365, 379, 391, 446, 405, 360, 
                               354, 399, 428, 403, 432, 392, 394, 448, 474, 411, 398, 373, 
                               415, 333, 401, 395, 403, 429, 344, 426, 391, 394, 456, 371, 
                               339, 409, 373, 389, 384, 408, 436, 359, 394, 440, 415, 418, 
                               401, 379, 330, 452, 388, 388, 315, 389, 399, 403, 344, 441, 
                               404, 409, 357, 369, 385, 385, 452, 370, 436, 371, 403, 459, 
                               466, 408, 451, 393, 355, 362, 418, 440, 360, 377, 400, 390, 
                               369, 414, 390, 368, 381, 387, 386, 415, 387, 374, 442, 405, 
                               441, 395, 420, 431, 435, 438, 420, 412, 391, 408, 409, 413, 
                               371, 447, 392, 385, 421, 377, 419, 437, 401, 392, 431, 491, 
                               412, 399, 446, 408, 369, 387, 372, 428, 389, 401)), 
                   row.names = c(NA, 
                                 -200L), class = "data.frame")



# Impute the missing questionnaire values (level-2 data)
imputed <- mice(data1)

# Long data frame with the original data (.imp = 0), all imputations,
# and each participant's sum score in each imputation
data1_imputed <- complete(imputed, action = "long", include = TRUE)
data1_imputed$sumscore <- rowSums(data1_imputed[c("scale1", "scale2")])

# Merge the imputed level-2 data with the level-1 trial data
data_all <- dplyr::full_join(data1_imputed, data2, by = "participant")

# Try to create a mids object from the merged data - NOT WORKING:
# after the join, each .id occurs once per trial
merged_imputed <- as.mids(data_all)
```
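A possible workaround for approach 1 is sketched below under two assumptions: that `as.mids()` only needs rows sorted by `.imp`, equally sized imputation blocks, and a `.id` that is unique within each block; and that `brms::brm_multiple()` accepts a plain list of data frames as `data`, which its documentation lists as an alternative to a `mids` object. The helper names `renumber_ids()` and `imputation_list()` are hypothetical, and the toy `data_all` only mimics the shape of the joined data from the example above.

```r
# Hypothetical helper. Assumption: as.mids() needs a unique .id within each .imp block
renumber_ids <- function(long) {
  long <- long[order(long$.imp), ]                          # stable sort by .imp
  long$.id <- ave(long$.imp, long$.imp, FUN = seq_along)    # 1..n within each block
  rownames(long) <- NULL
  long
}

# Hypothetical helper: one completed data frame per imputation (.imp > 0)
imputation_list <- function(long) {
  imputed <- long[long$.imp > 0, ]
  lapply(split(imputed, imputed$.imp), function(d) {
    d$.imp <- NULL
    d$.id <- NULL
    rownames(d) <- NULL
    d
  })
}

# Toy data shaped like data_all: original block (.imp = 0) plus m = 2 imputations,
# two participants with two trials each, .id duplicated by the join
data_all <- data.frame(
  .imp        = rep(0:2, each = 4),
  .id         = rep(c(1, 1, 2, 2), times = 3),
  participant = factor(rep(c("1", "1", "2", "2"), times = 3)),
  sumscore    = c(40, 40, NA, NA, 40, 40, 35, 35, 40, 40, 37, 37),
  RT          = rep(c(416, 389, 339, 368), times = 3)
)

fixed <- renumber_ids(data_all)
print(fixed$.id[fixed$.imp == 0])

imp_list <- imputation_list(data_all)
length(imp_list)
```

On the real data this would amount to something like `mice::as.mids(renumber_ids(data_all))`, or passing `imputation_list(data_all)` as `data` to `brms::brm_multiple()` with a formula such as `RT ~ sumscore + (1 | participant)`, which is only an illustrative guess at the intended random-intercept model. Renumbering relies on every imputation block listing the trials in the same order, which `full_join()` should give here because `data1_imputed` lists the participants in the same order in every block.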
