nlme error "Invalid formula for groups" despite a random effect being specified
I have done some searching for this, but the mailing list posts I have found concern people who did not specify a random effect in nlme, whereas I have done this. I also own the book Mixed-Effects Models in S and S-PLUS by Pinheiro and Bates, but can't work out my problem from the book.
I'm still working on my nutrient data analysis, and have now shifted onto real data. The data come from a population survey, and feature a repeated measures design as each respondent has two 24-hour intake recalls for the nutrient.
I have successfully fit a lme4 model to my data, and now I am trying to find out what happens if I use a nonlinear method instead. A snapshot of my data is below:
head(Male.Data)
   RespondentID Age SampleWeight  IntakeDay IntakeAmt AgeFactor BoxCoxXY
2        100020  12    0.4952835 Day1Intake 12145.852     9to13 15.61196
7        100419  14    0.3632839 Day1Intake  9591.953    14to18 15.01444
8        100459  11    0.4952835 Day1Intake  7838.713     9to13 14.51458
12       101138  15    1.3258785 Day1Intake 11113.266    14to18 15.38541
14       101214   6    2.1198688 Day1Intake  7150.133      4to8 14.29022
18       101389   5    2.1198688 Day1Intake  5091.528      4to8 13.47928
And the summary information about the data is:
str(Male.Data)
'data.frame': 4498 obs. of 7 variables:
$ RespondentID: Factor w/ 4487 levels "100013","100020",..: 2 7 8 12 14 18 19 20 21 22 ...
$ Age : int 12 14 11 15 6 5 10 2 2 9 ...
$ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt : num 12146 9592 7839 11113 7150 ...
$ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ...
Using the lme4 package, I have successfully fit a linear mixed effects model (the random effect is from the subjects, and IntakeDay is the repeated-measure factor associated with BoxCoxXY, which is a transform of IntakeAmt):
Male.lme1 <- lmer(BoxCoxXY ~ AgeFactor + IntakeDay + (1|RespondentID),
data = Male.Data,
weights = SampleWeight)
I have been trying to use the nlme package to fit a nonlinear model so that I can compare the two, but I cannot get my syntax to work. My initial problem was that there does not seem to be a relevant selfStart model for my data, so I used geeglm to generate starting values (coefficients saved to a data frame called Male.nlme.start). But now I just get the error:
Error in getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), :
Invalid formula for groups
I can't work out what I am doing wrong. The nlme syntax I am using is:
Male.nlme1 <- nlme(BoxCoxXY ~ AgeFactor + IntakeDay + RespondentID , data = Male.Data,
fixed = AgeFactor + IntakeDay ~ 1,
random = RespondentID ~ 1,
start=c(Male.nlme.start[,"Estimate"]))
I have tried the analysis both with and without RespondentID in the overall model specification, and this seems to have no impact.
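For comparison, here is a minimal sketch of the grouping syntax that nlme accepts, run on the Orange dataset that ships with R rather than on Male.Data. It rests on two points: the left-hand side of random (and of fixed) names parameters of the nonlinear model, and the grouping factor goes after a | (or in a separate groups argument). With a plain data frame and no grouping factor in random, nlme has nothing to group by. The logistic model SSlogis, the parameter names and the start values are illustrative assumptions and say nothing about the right model for the intake data.

```r
library(nlme)

# Drop the groupedData class so the grouping must come from `random`,
# as it would for an ordinary data frame such as Male.Data.
orange_df <- as.data.frame(Orange)
stopifnot(!inherits(orange_df, "groupedData"))

# Asym, xmid and scal are parameters of the nonlinear function, not
# columns of the data. Tree is the subject identifier.
fit <- nlme(circumference ~ SSlogis(age, Asym, xmid, scal),
            data   = orange_df,
            fixed  = Asym + xmid + scal ~ 1,
            random = Asym ~ 1 | Tree,          # parameter ~ 1 | grouping factor
            start  = c(Asym = 200, xmid = 725, scal = 350))  # rough guesses

fixef(fit)               # population-level parameter estimates
nlevels(getGroups(fit))  # number of groups nlme actually used
```

In this form the subject identifier never appears on the right-hand side of the model formula and needs no start value of its own.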
The reason I am trying to persevere with the nonlinear method is that the original analysis in SAS used a nonlinear approach. While my residuals etc. look acceptably good from the lmer analysis, I am curious to see what impact a nonlinear approach would have.
In case it is helpful, the traceback() results from the last analysis attempt, which includes RespondentID, are:
5: stop("Invalid formula for groups")
4: getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),
sep = "|"))))
3: getGroups(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),
sep = "|"))))
2: nlme.formula(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data,
fixed = AgeFactor + IntakeDay ~ 1, random = RespondentID ~
1, start = c(Male.nlme.start[, "Estimate"]))
1: nlme(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, fixed = AgeFactor +
IntakeDay ~ 1, random = RespondentID ~ 1, start = c(Male.nlme.start[,
"Estimate"]))
Can anyone suggest where I have gone wrong? I'm starting to wonder if either (1) there are too many factor levels for RespondentID to work in nlme, or (2) the method will only work if I supply a start parameter for RespondentID, which seems nonsensical with the data I have, as this is my subject identifier.
Update: to answer Ben, the SAS nlmixed model is a general log likelihood function for the fixed effects:
ll1 <- log(1/sqrt(2*pi*Scale))
ll2 <- as.data.frame(-(BoxCoxXY - Intercept + AgeFactor + IntakeDay + u2)^2)/(2*Scale)+(Lambda.Value-1)*log(IntakeAmt)
ll <- ll1 + ll2
model IntakeAmt ~ general(ll)
where:
Scale = dispersion value from geeglm, and
Lambda.Value = lambda value associated with the maximum log likelihood output from an earlier boxcox(), which was used to transform IntakeAmt to BoxCoxXY through the formula
Male.Data$BoxCoxXY <- (Male.Data$IntakeAmt^Lambda.Value-1)/Lambda.Value
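The transformation step described above can be sketched as follows, assuming MASS::boxcox() is the boxcox() in question and using simulated intake values in place of Male.Data (the data frame toy and the gamma parameters are made up for illustration). The log branch covers the limiting case lambda = 0, where the power formula would divide by zero.

```r
library(MASS)

set.seed(42)
# Hypothetical positive, right-skewed intakes standing in for IntakeAmt.
toy <- data.frame(IntakeAmt = rgamma(500, shape = 4, rate = 0.0005))

# Profile log-likelihood over the default lambda grid (-2 to 2 in steps of 0.1).
bc <- boxcox(IntakeAmt ~ 1, data = toy, plotit = FALSE)
Lambda.Value <- bc$x[which.max(bc$y)]

# Box-Cox transform, with log() as the lambda = 0 limit.
toy$BoxCoxXY <- if (abs(Lambda.Value) < 1e-8) {
  log(toy$IntakeAmt)
} else {
  (toy$IntakeAmt^Lambda.Value - 1) / Lambda.Value
}

Lambda.Value
summary(toy$BoxCoxXY)
```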
The random statement in the SAS code is:
random u1 u2 ~ normal([0,0][&vu1,COV_U1U2,&vu2]) subject=RespondentID
so there are two error terms in the model and they are both being fit as random effects. The second square bracket represents the lower triangle of the random-effects variance matrix listed in row order, and is specified using SAS macro variables in the SAS syntax.
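To make the "lower triangle in row order" layout concrete, here is a small sketch that rebuilds the full 2 x 2 variance matrix of (u1, u2) from three values. The numbers are hypothetical placeholders for the SAS macro variables &vu1, COV_U1U2 and &vu2 and are not taken from the analysis.

```r
# Hypothetical values standing in for &vu1, COV_U1U2 and &vu2.
vu1      <- 0.30
cov_u1u2 <- 0.05
vu2      <- 0.20

G <- matrix(0, 2, 2, dimnames = list(c("u1", "u2"), c("u1", "u2")))

# The lower triangle read row by row has the same ordering as the upper
# triangle read column by column, so fill the upper triangle and mirror it.
G[upper.tri(G, diag = TRUE)] <- c(vu1, cov_u1u2, vu2)
G[lower.tri(G)] <- t(G)[lower.tri(G)]

G
all(eigen(G)$values > 0)  # a valid variance matrix must be positive definite
```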
The summary of the model that I have been given is the normal one-line overview that shows a matrix of covariates (BX) plus an error component, so it's not a lot of help here.
Second update: I realised that I had not removed the RespondentID levels associated with the female subjects, as I factorised RespondentID over the entire data frame before I did the split into separate data frames, by gender, for analysis. I have repeated the nlme analysis after removing unused factor levels for RespondentID and I get the same error. The lmer results are the same - which is good to know. :)
Comments (1)
I just stumbled over this open question. In case someone is still interested in a potential solution:
First of all, I am not an expert. However, for a kind of "non"-linear regression, I like to use GAMs, e.g., from the mgcv or gamm4 package. You could then do something comparable like:
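A minimal sketch of what such a call might look like, assuming mgcv::gamm(), a smooth of the continuous Age variable, and a compound-symmetry correlation within each respondent to reflect the two recall days. The data are simulated (toy is hypothetical) and the model terms are illustrative guesses, so this is not necessarily the specification intended here.

```r
library(mgcv)  # also loads nlme, which provides corCompSymm()

set.seed(1)
n_subj <- 60
# Hypothetical stand-in for Male.Data: two recall days per respondent.
toy <- data.frame(
  RespondentID = factor(rep(seq_len(n_subj), each = 2)),
  IntakeDay    = factor(rep(c("Day1Intake", "Day2Intake"), times = n_subj)),
  Age          = rep(sample(2:18, n_subj, replace = TRUE), each = 2)
)
subj_effect <- rnorm(n_subj, sd = 0.5)
toy$BoxCoxXY <- 13 + 2 * log(toy$Age) +
  subj_effect[as.integer(toy$RespondentID)] +
  rnorm(nrow(toy), sd = 0.4)

# Smooth in Age, parametric day effect, and correlated errors within respondent.
fit <- gamm(BoxCoxXY ~ s(Age, k = 5) + IntakeDay,
            correlation = corCompSymm(form = ~ 1 | RespondentID),
            data = toy)

summary(fit$gam)  # smooth and parametric terms
fit$lme           # underlying lme fit, including the correlation estimate
```

With only two observations per respondent, a compound-symmetry correlation and a random intercept describe essentially the same dependence, so the sketch uses one of them and not both.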
Note that a repeated measure (dependent observation) should be accounted for, e.g., in a correlation structure like I included in this example. Further, the smooths (s()) only work for continuous data, not for stochastic/discrete data. You may have to restructure your data accordingly or include smooth-factor interactions (e.g., s(continuous, by = factor)). However, for your explicit example, you probably have to adjust/rethink your data structure, if it is possible with these data at all.
But maybe this hint helps you to get on the right track?!