如何在 R 中拟合主题为随机的随机效应模型？

发布于 2024-08-04 10:48:00 字数 1983 浏览 5 评论 0原文

给定以下形式的数据，

myDat = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score", 
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA, 
-24L))

我想将分数建模为主题、条件和时间的函数。每个（人类）受试者的分数被测量了三次，由变量时间表示，所以我重复测量。

如何在 R 中构建随机效应模型，并将主题效应随机拟合？

附录：有人问我如何生成这些数据。您猜对了，由于白天很长，所以数据是假的。分数是时间加上随机噪声，并且处于条件 1 中会为分数增加一分。作为典型的心理设置，它很有启发性。你有一项任务，要求人们通过练习（时间）和药物（条件==1）来提高分数，从而提高分数。

出于本次讨论的目的，这里有一些更现实的数据。现在，模拟参与者有一个随机的“技能”级别，该级别会添加到他们的分数中。此外，因子现在是字符串。

myDat = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E", 
"F", "G", "H"), class = "factor"), Condition = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"), 
    Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM", 
    "2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject", 
"Condition", "Time"), class = "data.frame", row.names = c(NA, 
-24L))

看看它：

library(ggplot2)
qplot(Time, Score, data = myDat, geom = "line", group = Subject, colour = factor(Condition))

原文

Given data of the following form

myDat = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score", 
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA, 
-24L))

I would like to model Score as a function of Subject, Condition and Time. Each (human) Subject's score was measured three times, indicated by the variable Time, so I have repeated measures.

How can I build in R a random effects model with Subject effects fitted as random?

ADDENDUM: It's been asked how I generated these data. You guessed it, the data are fake as the day is long. Score is time plus random noise and being in Condition 1 adds a point to Score. It's instructive as a typical Psych setup. You have a task where people's score gets better with practice (time) and a drug (condition==1) that enhances score.

Here are some more realistic data for the purposes of this discussion. Now simulated participants have a random "skill" level that is added to their scores. Also, the factors are now strings.

myDat = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E", 
"F", "G", "H"), class = "factor"), Condition = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"), 
    Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM", 
    "2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject", 
"Condition", "Time"), class = "data.frame", row.names = c(NA, 
-24L))

See it:

library(ggplot2)
qplot(Time, Score, data = myDat, geom = "line", group = Subject, colour = factor(Condition))

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浸婚纱 2024-08-11 10:48:00

使用nlme库...

回答您提出的问题，您可以使用以下代码创建随机截距混合效应模型：

> library(nlme)
> m1 <- lme(Score ~ Condition + Time + Condition*Time,
+ data = myDat, random = ~ 1 | Subject)
> summary(m1)
Linear mixed-effects model fit by REML
 Data: myDat 
       AIC      BIC    logLik
  31.69207 37.66646 -9.846036

Random effects:
 Formula: ~1 | Subject
         (Intercept)  Residual
StdDev: 5.214638e-06 0.3151035

Fixed effects: Score ~ Condition + Time + Condition * Time 
                   Value Std.Error DF  t-value p-value
(Intercept)    0.6208333 0.2406643 14 2.579666  0.0218
Condition      0.7841667 0.3403507  6 2.303996  0.0608
Time           0.9900000 0.1114059 14 8.886423  0.0000
Condition:Time 0.0637500 0.1575517 14 0.404629  0.6919
 Correlation: 
               (Intr) Condtn Time  
Condition      -0.707              
Time           -0.926  0.655       
Condition:Time  0.655 -0.926 -0.707

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.5748794 -0.6704147  0.2069426  0.7467785  1.5153752 

Number of Observations: 24
Number of Groups: 8

截距方差基本上为0，表明没有受试者内效应，因此该模型不能很好地捕获时间之间的关系。随机截距模型很少是重复测量设计所需的模型类型。随机截距模型假设所有时间点之间的相关性相等。即时间 1 和时间 2 之间的相关性与时间 1 和时间 3 之间的相关性相同。在正常情况下（也许不是那些生成假数据的情况），我们预计后者会小于前者。自回归结构通常是更好的方法。

> m2<-gls(Score ~ Condition + Time + Condition*Time,
+ data = myDat, correlation = corAR1(form = ~ Time | Subject))
> summary(m2)
Generalized least squares fit by REML
  Model: Score ~ Condition + Time + Condition * Time 
  Data: myDat 
       AIC      BIC    logLik
  25.45446 31.42886 -6.727232

Correlation Structure: AR(1)
 Formula: ~Time | Subject 
 Parameter estimate(s):
       Phi 
-0.5957973 

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)    0.6045402 0.1762743  3.429543  0.0027
Condition      0.8058448 0.2492895  3.232566  0.0042
Time           0.9900000 0.0845312 11.711652  0.0000
Condition:Time 0.0637500 0.1195452  0.533271  0.5997

 Correlation: 
               (Intr) Condtn Time  
Condition      -0.707              
Time           -0.959  0.678       
Condition:Time  0.678 -0.959 -0.707

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.6850557 -0.6730898  0.2373639  0.8269703  1.5858942 

Residual standard error: 0.2976964 
Degrees of freedom: 24 total; 20 residual

您的数据显示时间点之间的相关性为 -.596，这看起来很奇怪。通常，时间点之间至少应该存在正相关。这些数据是如何生成的？

附录：

根据您的新数据，我们知道数据生成过程相当于随机截距模型（尽管这对于纵向研究来说并不是最现实的。可视化显示时间的影响似乎相当线性，所以我们应该我们可以放心地将其视为数字变量。

> library(nlme)
> m1 <- lme(Score ~ Condition + as.numeric(Time) + Condition*as.numeric(Time),
+ data = myDat, random = ~ 1 | Subject)
> summary(m1)
Linear mixed-effects model fit by REML
 Data: myDat 
       AIC      BIC    logLik
  38.15055 44.12494 -13.07527

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:   0.2457355 0.3173421

Fixed effects: Score ~ Condition + as.numeric(Time) + Condition * as.numeric(Time) 
                                  Value Std.Error DF   t-value p-value
(Intercept)                    1.142500 0.2717382 14  4.204415  0.0009
ConditionYes                   1.748333 0.3842958  6  4.549447  0.0039
as.numeric(Time)               0.575000 0.1121974 14  5.124898  0.0002
ConditionYes:as.numeric(Time) -0.197500 0.1586710 14 -1.244714  0.2337
 Correlation: 
                              (Intr) CndtnY as.(T)
ConditionYes                  -0.707              
as.numeric(Time)              -0.826  0.584       
ConditionYes:as.numeric(Time)  0.584 -0.826 -0.707

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.44560367 -0.65018585  0.01864079  0.52930925  1.40824838 

Number of Observations: 24
Number of Groups: 8

我们看到显着的条件效应，表明“是”条件往往具有更高的分数（约 1.7），以及显着的时间效应，表明两组的支持率随着时间的推移而上升。从图中，我们发现两组之间没有时间的差异效应（相互作用），即斜率相同。

using the nlme library...

Answering your stated question, you can create a random intecept mixed effect model using the following code:

> library(nlme)
> m1 <- lme(Score ~ Condition + Time + Condition*Time,
+ data = myDat, random = ~ 1 | Subject)
> summary(m1)
Linear mixed-effects model fit by REML
 Data: myDat 
       AIC      BIC    logLik
  31.69207 37.66646 -9.846036

Random effects:
 Formula: ~1 | Subject
         (Intercept)  Residual
StdDev: 5.214638e-06 0.3151035

Fixed effects: Score ~ Condition + Time + Condition * Time 
                   Value Std.Error DF  t-value p-value
(Intercept)    0.6208333 0.2406643 14 2.579666  0.0218
Condition      0.7841667 0.3403507  6 2.303996  0.0608
Time           0.9900000 0.1114059 14 8.886423  0.0000
Condition:Time 0.0637500 0.1575517 14 0.404629  0.6919
 Correlation: 
               (Intr) Condtn Time  
Condition      -0.707              
Time           -0.926  0.655       
Condition:Time  0.655 -0.926 -0.707

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.5748794 -0.6704147  0.2069426  0.7467785  1.5153752 

Number of Observations: 24
Number of Groups: 8

The intercept variance is basically 0, indicating no within subject effect, so this model is not capturing the between time relationship well. A random intercept model is rarely the type of model you want for a repeated measures design. A random intercept model assumes that the correlations between all time points are equal. i.e. the correlation between time 1 and time 2 is the same as between time 1 and time 3. Under normal circumstances (perhaps not those generating your fake data) we would expect the later to be less than the former. An auto regressive structure is usually a better way to go.

> m2<-gls(Score ~ Condition + Time + Condition*Time,
+ data = myDat, correlation = corAR1(form = ~ Time | Subject))
> summary(m2)
Generalized least squares fit by REML
  Model: Score ~ Condition + Time + Condition * Time 
  Data: myDat 
       AIC      BIC    logLik
  25.45446 31.42886 -6.727232

Correlation Structure: AR(1)
 Formula: ~Time | Subject 
 Parameter estimate(s):
       Phi 
-0.5957973 

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)    0.6045402 0.1762743  3.429543  0.0027
Condition      0.8058448 0.2492895  3.232566  0.0042
Time           0.9900000 0.0845312 11.711652  0.0000
Condition:Time 0.0637500 0.1195452  0.533271  0.5997

 Correlation: 
               (Intr) Condtn Time  
Condition      -0.707              
Time           -0.959  0.678       
Condition:Time  0.678 -0.959 -0.707

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.6850557 -0.6730898  0.2373639  0.8269703  1.5858942 

Residual standard error: 0.2976964 
Degrees of freedom: 24 total; 20 residual

Your data is showing a -.596 between time point correlation, which seems odd. normally there should, at a minimum be a positive correlation between time points. How was this data generated?

addendum:

With your new data we know that the data generating process is equivalent to a random intercept model (though that is not the most realistic for a longitudinal study. The visualization shows that the effect of time seems to be fairly linear, so we should feel comfortable treating it as a numeric variable.

> library(nlme)
> m1 <- lme(Score ~ Condition + as.numeric(Time) + Condition*as.numeric(Time),
+ data = myDat, random = ~ 1 | Subject)
> summary(m1)
Linear mixed-effects model fit by REML
 Data: myDat 
       AIC      BIC    logLik
  38.15055 44.12494 -13.07527

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:   0.2457355 0.3173421

Fixed effects: Score ~ Condition + as.numeric(Time) + Condition * as.numeric(Time) 
                                  Value Std.Error DF   t-value p-value
(Intercept)                    1.142500 0.2717382 14  4.204415  0.0009
ConditionYes                   1.748333 0.3842958  6  4.549447  0.0039
as.numeric(Time)               0.575000 0.1121974 14  5.124898  0.0002
ConditionYes:as.numeric(Time) -0.197500 0.1586710 14 -1.244714  0.2337
 Correlation: 
                              (Intr) CndtnY as.(T)
ConditionYes                  -0.707              
as.numeric(Time)              -0.826  0.584       
ConditionYes:as.numeric(Time)  0.584 -0.826 -0.707

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.44560367 -0.65018585  0.01864079  0.52930925  1.40824838 

Number of Observations: 24
Number of Groups: 8

We see a significant Condition effect, indicating that the 'yes' condition tends to have higher scores (by about 1.7), and a significant time effect, indicating that both groups go up over time. Supporting the plot, we find no differential effect of time between the two groups (the interaction). i.e. the slopes are the same.

回复收藏 0 原文

彩扇题诗 2024-08-11 10:48:00

这不是您问题的答案，但您可能会发现这种数据可视化内容提供了丰富的信息。

library(ggplot2)
qplot(Time, Score, data = myDat, geom = "line", 
  group = Subject, colour = factor(Condition))

数据可视化

It's not an answer to your question, but you might find this visualisation of your data informative.

library(ggplot2)
qplot(Time, Score, data = myDat, geom = "line", 
  group = Subject, colour = factor(Condition))

Data visulation

回复收藏 0 原文

眼趣 2024-08-11 10:48:00

（使用lme4库）
这适合您的随机主题效应，也适合您的随机效应分组的变量。在此模型中，随机效应是随受试者变化的截距。

m <- lmer( Score ~ Condition + Time + (1|Subject), data=myDat )

要查看随机效应，您可以使用

ranef(m)

Ian Fellows 提到的那样，您的数据也可能具有随机条件和时间分量。您可以使用另一个模型进行测试。在下面的条件中，时间和截距允许随对象随机变化。它还评估它们的相关性。

mi <- lmer( Score ~ Condition + Time + (Condition + Time|Subject), data=myDat )

并尝试

summary(mi)
ranef(mi)

您还可以在不与截距相关的情况下测试这一点，通过条件和时间之间的相互作用以及许多其他模型来查看哪个最适合您的数据和/或您的理论。您的问题有点模糊，但这几个命令应该可以帮助您入门。

请注意，主题是您的分组因素，因此它是您将其他效果随机放置的内容。这也不是您明确适合作为预测变量的东西。

(using lme4 library)
This fits your subject effect as random and also the variable that your random effects are grouped under. In this model the random effect is the intercept varying by subject.

m <- lmer( Score ~ Condition + Time + (1|Subject), data=myDat )

To see the random effects you can just use

ranef(m)

As Ian Fellows mentioned, your data may also have random Condition and Time components. You can test that with another model. In the one below Condition, Time, and the intercept are allowed to vary randomly by subject. It also evaluates their correlations.

mi <- lmer( Score ~ Condition + Time + (Condition + Time|Subject), data=myDat )

and try

summary(mi)
ranef(mi)

You could also test for this without correlations with the intercept, with interactions between Condition and Time, and numerous other models to see which best fits your data and / or your theory. Your question is a bit vague but these few commands should get you started.

Note that Subject is your grouping factor so it's what you fit other effects as random under. It's not something you then explicitly fit as a predictor as well.

回复收藏 0 原文

~没有更多了~