nlme 错误“组的公式无效”尽管指定了随机效应

发布于 2024-12-23 09:45:44 字数 4440 浏览 2 评论 0原文

我对此进行了一些搜索，但我发现的邮件列表帖子与未在 nlme 中指定随机效果的人相关，而我已经这样做了。我还拥有 Pinheiro 和 Bates 撰写的《S 和 S-Plus 中的混合效应模型》一书，但无法从书中解决我的问题。

我仍在进行营养数据分析，现在已转向实际数据。这些数据来自一项人口调查，并采用重复测量设计，因为每个受访者都有两次 24 小时内对该营养素的摄入量回忆。

我已经成功地将 lme4 模型拟合到我的数据中，现在我试图找出如果我使用非线性方法会发生什么。我的数据快照如下：

head(Male.Data)
   RespondentID Age SampleWeight  IntakeDay IntakeAmt AgeFactor BoxCoxXY
2        100020  12    0.4952835 Day1Intake 12145.852     9to13 15.61196
7        100419  14    0.3632839 Day1Intake  9591.953    14to18 15.01444
8        100459  11    0.4952835 Day1Intake  7838.713     9to13 14.51458
12       101138  15    1.3258785 Day1Intake 11113.266    14to18 15.38541
14       101214   6    2.1198688 Day1Intake  7150.133      4to8 14.29022
18       101389   5    2.1198688 Day1Intake  5091.528      4to8 13.47928

有关数据的摘要信息为：

str(Male.Data)
'data.frame':   4498 obs. of  7 variables:
$ RespondentID: Factor w/ 4487 levels "100013","100020",..: 2 7 8 12 14 18 19 20 21 22 ...
$ Age         : int  12 14 11 15 6 5 10 2 2 9 ...
$ SampleWeight: num  0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay   : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt   : num  12146 9592 7839 11113 7150 ...
$ AgeFactor   : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY    : num  15.6 15 14.5 15.4 14.3 ...

使用lme4包，我成功地拟合了线性混合效应模型（随机效应来自受试者和>IntakeDay 是与 BoxCoxXY 相关的重复测量因子，它是 IntakeAmt 的变换）：

Male.lme1 <- lmer(BoxCoxXY ~ AgeFactor + IntakeDay + (1|RespondentID),
        data = Male.Data, 
        weights = SampleWeight)

我一直在尝试使用 nlme code> 包来查看拟合非线性模型比较两者，但我无法让我的语法工作。我最初的问题是，我的数据似乎没有相关的 SelfStart 模型，因此我使用 geeglm 生成起始值（系数保存到名为 Male.nlme.start 的数据框中））。但现在我收到错误：

Error in getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),  : 
Invalid formula for groups

我无法弄清楚我做错了什么，我使用的 nlme 语法是：

Male.nlme1 <- nlme(BoxCoxXY ~ AgeFactor + IntakeDay + RespondentID , data = Male.Data, 
fixed = AgeFactor + IntakeDay ~ 1, 
random = RespondentID ~ 1,
start=c(Male.nlme.start[,"Estimate"]))

我已经尝试过使用和不使用 RespondentID 的分析code> 包含在整体模型规范中，这似乎没有影响。

我尝试坚持使用非线性方法的原因是 SAS 中的原始分析使用了非线性方法。虽然我的残差等从 lme 分析中看起来不错，但我很好奇非线性方法会产生什么影响。

如果有帮助的话，上次分析尝试的 traceback() 结果（其中包括 RespondentID）是：

5: stop("Invalid formula for groups")
4: getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), 
   sep = "|"))))
3: getGroups(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), 
   sep = "|"))))
2: nlme.formula(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, 
   fixed = AgeFactor + IntakeDay ~ 1, random = RespondentID ~ 
       1, start = c(Male.nlme.start[, "Estimate"]))
1: nlme(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, fixed = AgeFactor + 
   IntakeDay ~ 1, random = RespondentID ~ 1, start = c(Male.nlme.start[, 
   "Estimate"]))

任何人都可以建议我哪里出了问题吗？我开始怀疑是否 (1) RespondentID 的因素级别太多，无法在 nlme 中工作，或者 (2) 该方法仅在我提供RespondentID 的启动参数，这对于我拥有的数据来说似乎毫无意义，因为这是我的主题标识符。

更新：为了回答 Ben，SAS nlmixed 模型是固定效应的通用对数似然函数：

ll1 <- log(1/sqrt(2*pi*Scale))
ll2 <- as.data.frame(-(BoxCoxXY - Intercept + AgeFactor + IntakeDay + u2)^2)/(2*Scale)+(Lambda.Value-1)*log(IntakeAmt)
ll <- ll1 + ll2
model IntakeAmt ~ general(ll)

其中：

Scale = 来自 geeglm 的离散值

Lambda.Value = 与早期 boxcox() 输出的最大对数似然相关的 lambda 值，该值用于将 IntakeAmt 转换为 BoxCoxXY 通过公式Male.Data$BoxCoxXY <- (Male.Data$IntakeAmt^Lambda.Value-1)/Lambda.Value

SAS代码中的random语句是：

random u1 u2 ~ normal([0,0][&vu1,COV_U1U2,&vu2]) subject=RespondentID

所以有是模型中的两个误差项，它们都作为随机效应进行拟合。第二个方括号表示按行顺序列出的随机效应方差矩阵的下三角，并使用 SAS 语法中的 SAS 宏变量指定。

我得到的模型摘要是正常的一行概述，显示协变量矩阵 (BX) 加上误差分量，因此这里没有太多帮助。

第二次更新：我意识到我没有删除与女性受试者相关的 RespondentID 级别，因为在按性别将 RespondentID 分解为单独的数据框进行分析之前，我在整个数据框中分解了 RespondentID。在删除 RespondentID 的未使用因子水平后，我重复了 nlme 分析，但出现了相同的错误。 lmer 结果是相同的 - 很高兴知道这一点。 :)

原文

I have done some searching for this, but the mailing list posts I have found are associated with the person not specifying a random effect in nlme whereas I have done this. I also own the book Mixed Effect Models in S and S-Plus by Pinheiro and Bates, but can't work out my problem from the book.

I'm still working on my nutrient data analysis, and have now shifted onto real data. The data come from a population survey, and feature a repeated measures design as each respondent has two 24-hour intake recalls for the nutrient.

I have successfully fit a lme4 model to my data, and now I am trying to find out what happens if I use a nonlinear method instead. A snapshot of my data is below:

head(Male.Data)
   RespondentID Age SampleWeight  IntakeDay IntakeAmt AgeFactor BoxCoxXY
2        100020  12    0.4952835 Day1Intake 12145.852     9to13 15.61196
7        100419  14    0.3632839 Day1Intake  9591.953    14to18 15.01444
8        100459  11    0.4952835 Day1Intake  7838.713     9to13 14.51458
12       101138  15    1.3258785 Day1Intake 11113.266    14to18 15.38541
14       101214   6    2.1198688 Day1Intake  7150.133      4to8 14.29022
18       101389   5    2.1198688 Day1Intake  5091.528      4to8 13.47928

And the summary information about the data is:

str(Male.Data)
'data.frame':   4498 obs. of  7 variables:
$ RespondentID: Factor w/ 4487 levels "100013","100020",..: 2 7 8 12 14 18 19 20 21 22 ...
$ Age         : int  12 14 11 15 6 5 10 2 2 9 ...
$ SampleWeight: num  0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay   : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt   : num  12146 9592 7839 11113 7150 ...
$ AgeFactor   : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY    : num  15.6 15 14.5 15.4 14.3 ...

Using the lme4 package, I have successfully fit a linear mixed effects model using (the random effect is from the subjects and IntakeDay is the repeated measure factor associated with BoxCoxXY, which is a transform of IntakeAmt):

Male.lme1 <- lmer(BoxCoxXY ~ AgeFactor + IntakeDay + (1|RespondentID),
        data = Male.Data, 
        weights = SampleWeight)

I have been trying to use the nlme package to look at fitting a nonlinear model to compare the two, but I cannot get my syntax to work. My initial problem was that there does not seem to be a relevant SelfStart model for my data, so I used geeglm to generate starting values (coefficients saved to a data frame called Male.nlme.start). But now I just get the error:

Error in getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),  : 
Invalid formula for groups

I can't work out what I am doing wrong, the nlme syntax I am using is:

Male.nlme1 <- nlme(BoxCoxXY ~ AgeFactor + IntakeDay + RespondentID , data = Male.Data, 
fixed = AgeFactor + IntakeDay ~ 1, 
random = RespondentID ~ 1,
start=c(Male.nlme.start[,"Estimate"]))

I have tried the analysis both with and without RespondentID being included in the overall model specification, and this seems to have no impact.

The reason I am trying to persevere with the nonlinear method is that the original analysis in SAS used a nonlinear approach. While my residuals etc look acceptably good from the lme analysis, I am curious to see what impact a nonlinear approach would have.

In case it is helpful, the traceback() results from the last analysis attempt, which includes RespondentID is:

5: stop("Invalid formula for groups")
4: getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), 
   sep = "|"))))
3: getGroups(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), 
   sep = "|"))))
2: nlme.formula(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, 
   fixed = AgeFactor + IntakeDay ~ 1, random = RespondentID ~ 
       1, start = c(Male.nlme.start[, "Estimate"]))
1: nlme(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, fixed = AgeFactor + 
   IntakeDay ~ 1, random = RespondentID ~ 1, start = c(Male.nlme.start[, 
   "Estimate"]))

Can anyone suggest where I have gone wrong? I'm starting to wonder if either (1) there are too many factor levels for RespondentID to work in nlme or (2) the method will only work if I supply a start parameter for RespondentID, which seems nonsensical with the data I have as this is my subject identifier.

Update: to answer Ben, the SAS nlmixed model is a general log likelihood function for the fixed effects:

ll1 <- log(1/sqrt(2*pi*Scale))
ll2 <- as.data.frame(-(BoxCoxXY - Intercept + AgeFactor + IntakeDay + u2)^2)/(2*Scale)+(Lambda.Value-1)*log(IntakeAmt)
ll <- ll1 + ll2
model IntakeAmt ~ general(ll)

where:

Scale = dispersion value from geeglm and

Lambda.Value = lambda value associated with the maximum log likelihood output from an earlier boxcox() which was used to transform IntakeAmt to BoxCoxXY through the formula Male.Data$BoxCoxXY <- (Male.Data$IntakeAmt^Lambda.Value-1)/Lambda.Value

The random statement in the SAS code is:

random u1 u2 ~ normal([0,0][&vu1,COV_U1U2,&vu2]) subject=RespondentID

so there are two error terms in the model and they are both being fit as random effects. The second square bracket represents the lower triangle of the random-effects variance matrix listed in row order, and is specified using SAS macro variables in the SAS syntax.

The summary of the model that I have been given is the normal one-line overview that shows matrix of covariates (BX) plus an error component, so it's not a lot of help here.

Second update: I realised that I had not removed the RespondentID levels associated with the female subjects as I factorised RespondentID over the entire data frame before I did the split into separate data frames, by gender, for analysis. I have repeated the nlme analysis after removing unused factor levels for RespondentID and I get the same error. The lmer results are the same - which is good to know. :)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鱼窥荷 2024-12-30 09:45:45

我刚刚偶然发现了这个悬而未决的问题。如果有人仍然对潜在的解决方案感兴趣：

首先，我不是专家。然而，对于一种“非线性”线性回归，我喜欢使用 GAM，例如来自 mgcv 或 gamm4 包。

伍德，SN (2017)。广义加法模型：R. chapman 和 Hall/CRC 的介绍。

然后，您可以执行类似的操作：

library(mgcv)
mygam <- gamm(BoxCoxXY ~ s(AgeFactor) + s(IntakeDay),
             random = list(RespondentID ~ 1),
             data = Male.Data,
             correlation = corAR1(IntakeDay))

请注意，应考虑重复测量（相关观察），例如，在像我在本示例中包含的相关结构中。此外，平滑（s()）仅适用于连续数据，不适用于随机/离散数据。您可能必须相应地重组数据或包含平滑因子交互（例如：s（连续，by =因子））。
但是，对于您的明确示例，如果这些数据可能的话，您可能必须调整/重新考虑您的数据结构。

但也许这个提示可以帮助您走上正轨？！

I just stumbled over this open question. In case someone is still interested in a potential solution:

First of all, I'm am not an expert. However, for a kind of a "non"-linear regression, I like to use GAMs, e.g., from the mgcv or gamm4 package.

Wood, S. N. (2017). Generalized additive models: an introduction with R. chapman and hall/CRC.

You could then do something comparable like:

library(mgcv)
mygam <- gamm(BoxCoxXY ~ s(AgeFactor) + s(IntakeDay),
             random = list(RespondentID ~ 1),
             data = Male.Data,
             correlation = corAR1(IntakeDay))

Note that a repeated measure (dependent observation) should be considered, e.g., in a correlation structure like I included in this example. Further, the smoothes (s()) only work for continuous data, not for stochastic/discrete. You may have to restructure your data accordingly or include smooth-factor interactions (e.g.: s(continuous, by = factor)).
However, for your explicit example, you probably have to adjust/rethink your data structure, if it is possible with these data at all.

But maybe this hint helps you to get on the right track?!

回复收藏 0 原文

~没有更多了~