R: Mixed models - how to predict a variable using previous values of the same variable

Posted on 2025-01-15 08:43:46

I am struggling with multilevel models and have prepared a reproducible example to make my problem clear.

Let's say I would like to predict the height of children after 12 months of follow-up, i.e. their height at month == 12, using their previous height values as well as their previous weight values, with a data frame such as this one:

df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 month = c(1, 3, 6, 12, 1, 6, 12, 1, 6, 8, 12),
                 weight = c(14, 15, 17, 18, 21, 21, 22, 8, 8, 9, 10),
                 height = c(100, 102, 103, 104, 122, 123, 125, 82, 86, 88, 90))
        
   ID month weight height
1   1     1     14    100
2   1     3     15    102
3   1     6     17    103
4   1    12     18    104
5   2     1     21    122
6   2     6     21    123
7   2    12     22    125
8   3     1      8     82
9   3     6      8     86
10  3     8      9     88
11  3    12     10     90

My plan was to use the following model (obviously I have many more than 3 patients, and more rows per patient). Because the height values are correlated within each patient, I wanted to add a random intercept (1|ID), but also a random slope, which is why I added (month|ID) (in several examples of predicting student scores, I saw that the "occasion" or "test day" was added as a random slope). So I used the following code.

library(tidymodels)
library(multilevelmod)
library(lme4)

# Model specification
mixed_model_spec <- linear_reg() %>%
  set_engine("lmer") %>%
  set_args(na.action = na.exclude, control = lmerControl(optimizer = "bobyqa"))

# Fit the model
mixed_model_fit <-
  mixed_model_spec %>%
  fit(height ~ weight + month + (month | ID),
      data = df)

My first problem is that if I add "weight" (with its multiple values per ID) as a variable, I get the following message: "boundary (singular) fit: see help('isSingular')" (even in my large dataset), whereas if I keep only variables with one value per patient (e.g. sex) I do not have this problem.
Can anyone explain why?
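
To see what the message refers to, the fit can be inspected directly. The following is a minimal sketch, assuming the model is fitted with lme4::lmer() itself rather than through the parsnip wrapper, and using the toy data frame from above. isSingular() and VarCorr() are lme4 functions; the object name fit is only illustrative.

```r
library(lme4)

# Toy data from the question
df <- data.frame(
  ID     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  month  = c(1, 3, 6, 12, 1, 6, 12, 1, 6, 8, 12),
  weight = c(14, 15, 17, 18, 21, 21, 22, 8, 8, 9, 10),
  height = c(100, 102, 103, 104, 122, 123, 125, 82, 86, 88, 90)
)

# Same formula as in the question, fitted directly with lme4
# (assumption: no parsnip wrapper, so the lmer object is available as-is)
fit <- lmer(height ~ weight + month + (month | ID),
            data = df,
            control = lmerControl(optimizer = "bobyqa"))

# TRUE if at least one variance component sits on the boundary of the
# parameter space (a variance of about 0 or a correlation of about +/-1)
isSingular(fit)

# Estimated random-effect standard deviations and correlations
VarCorr(fit)
```

Looking at which entry of VarCorr() is close to zero, or which correlation is close to -1 or 1, shows which part of the random-effects structure the data cannot support.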

My second problem is that by training a similar model, I can predict for new children the height values at nearly all months (I get a predicted value at month 1, month X, ..., month 12), which I can compare to the real values collected in my test set.
However, what I am interested in is predicting the value at month 12 while integrating the previous values of each patient in the test set. In other words, I do not want the model to predict the whole set of values from scratch (more precisely, only from the patient data used for training), but also from the previous values of the new patient that are already available at month 1, month 4, month 6, etc. How can I write my code to obtain such a prediction?
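
The behaviour described above can be reproduced on the toy data. This is only a sketch under stated assumptions: the model is fitted with lme4::lmer() directly, child 3 is held out as the "new" patient, and the random slope is dropped to (1 | ID) so that the fit on two children stays simple. The names train, new_child, pred_new and pred_pop are illustrative. For an ID that was not in the training data, predict() with allow.new.levels = TRUE sets the random effects to zero, so the earlier measurements of the new child do not influence the month-12 prediction.

```r
library(lme4)

# Toy data from the question
df <- data.frame(
  ID     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  month  = c(1, 3, 6, 12, 1, 6, 12, 1, 6, 8, 12),
  weight = c(14, 15, 17, 18, 21, 21, 22, 8, 8, 9, 10),
  height = c(100, 102, 103, 104, 122, 123, 125, 82, 86, 88, 90)
)

# Illustrative split: children 1 and 2 for training, child 3 as new patient
train     <- df[df$ID != 3, ]
new_child <- df[df$ID == 3 & df$month == 12, ]

# Assumption: random intercept only, to keep the two-child toy fit simple
fit <- lmer(height ~ weight + month + (1 | ID), data = train)

# Prediction for the unseen child: the new level gets a random effect of zero
pred_new <- predict(fit, newdata = new_child, allow.new.levels = TRUE)

# Population-level prediction (random effects ignored altogether)
pred_pop <- predict(fit, newdata = new_child, re.form = NA)

# The two agree: the child's own earlier heights play no role here
all.equal(unname(pred_new), unname(pred_pop))
```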

Thanks a lot for your help!

Answer posted on 2025-01-22 08:43:46

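My first problem is that if I add "weight" (with its multiple values per ID) as a variable, I get the following message: "boundary (singular) fit: see help('isSingular')" (even in my large dataset), whereas if I keep only variables with one value per patient (e.g. sex) I do not have this problem. Can anyone explain why?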

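This happens when the random-effects structure is too complex to be supported by the data. Beyond that, it is usually not possible to determine exactly why it occurs in some situations and not in others. Basically, the model is overfitted. Some things you can try are: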