如何在 R 中拟合主题为随机的随机效应模型?
给定以下形式的数据,
myDat = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13,
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07,
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L,
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score",
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
我想将分数建模为主题、条件和时间的函数。每个(人类)受试者的分数被测量了三次,由变量时间表示,所以我重复测量。
如何在 R 中构建随机效应模型,并将主题效应随机拟合?
附录:有人问我如何生成这些数据。您猜对了,由于白天很长,所以数据是假的。分数是时间加上随机噪声,并且处于条件 1 中会为分数增加一分。作为典型的心理设置,它很有启发性。你有一项任务,要求人们通过练习(时间)和药物(条件==1)来提高分数,从而提高分数。
出于本次讨论的目的,这里有一些更现实的数据。现在,模拟参与者有一个随机的“技能”级别,该级别会添加到他们的分数中。此外,因子现在是字符串。
myDat = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41,
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01,
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E",
"F", "G", "H"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"),
Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM",
"2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject",
"Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
看看它:
library(ggplot2)
qplot(Time, Score, data = myDat, geom = "line", group = Subject, colour = factor(Condition))
Given data of the following form
myDat = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13,
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07,
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L,
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score",
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
I would like to model Score as a function of Subject, Condition and Time. Each (human) Subject's score was measured three times, indicated by the variable Time, so I have repeated measures.
How can I build in R a random effects model with Subject effects fitted as random?
ADDENDUM: It's been asked how I generated these data. You guessed it, the data are fake as the day is long. Score is time plus random noise and being in Condition 1 adds a point to Score. It's instructive as a typical Psych setup. You have a task where people's score gets better with practice (time) and a drug (condition==1) that enhances score.
Here are some more realistic data for the purposes of this discussion. Now simulated participants have a random "skill" level that is added to their scores. Also, the factors are now strings.
myDat = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41,
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01,
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E",
"F", "G", "H"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"),
Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM",
"2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject",
"Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
See it:
library(ggplot2)
qplot(Time, Score, data = myDat, geom = "line", group = Subject, colour = factor(Condition))
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
使用nlme库...
回答您提出的问题,您可以使用以下代码创建随机截距混合效应模型:
截距方差基本上为0,表明没有受试者内效应,因此该模型不能很好地捕获时间之间的关系。随机截距模型很少是重复测量设计所需的模型类型。随机截距模型假设所有时间点之间的相关性相等。即时间 1 和时间 2 之间的相关性与时间 1 和时间 3 之间的相关性相同。在正常情况下(也许不是那些生成假数据的情况),我们预计后者会小于前者。自回归结构通常是更好的方法。
您的数据显示时间点之间的相关性为 -.596,这看起来很奇怪。通常,时间点之间至少应该存在正相关。这些数据是如何生成的?
附录:
根据您的新数据,我们知道数据生成过程相当于随机截距模型(尽管这对于纵向研究来说并不是最现实的。可视化显示时间的影响似乎相当线性,所以我们应该我们可以放心地将其视为数字变量。
我们看到显着的条件效应,表明“是”条件往往具有更高的分数(约 1.7),以及显着的时间效应,表明两组的支持率随着时间的推移而上升。从图中,我们发现两组之间没有时间的差异效应(相互作用),即斜率相同。
using the nlme library...
Answering your stated question, you can create a random intecept mixed effect model using the following code:
The intercept variance is basically 0, indicating no within subject effect, so this model is not capturing the between time relationship well. A random intercept model is rarely the type of model you want for a repeated measures design. A random intercept model assumes that the correlations between all time points are equal. i.e. the correlation between time 1 and time 2 is the same as between time 1 and time 3. Under normal circumstances (perhaps not those generating your fake data) we would expect the later to be less than the former. An auto regressive structure is usually a better way to go.
Your data is showing a -.596 between time point correlation, which seems odd. normally there should, at a minimum be a positive correlation between time points. How was this data generated?
addendum:
With your new data we know that the data generating process is equivalent to a random intercept model (though that is not the most realistic for a longitudinal study. The visualization shows that the effect of time seems to be fairly linear, so we should feel comfortable treating it as a numeric variable.
We see a significant Condition effect, indicating that the 'yes' condition tends to have higher scores (by about 1.7), and a significant time effect, indicating that both groups go up over time. Supporting the plot, we find no differential effect of time between the two groups (the interaction). i.e. the slopes are the same.
这不是您问题的答案,但您可能会发现这种数据可视化内容提供了丰富的信息。
It's not an answer to your question, but you might find this visualisation of your data informative.
(使用lme4库)
这适合您的随机主题效应,也适合您的随机效应分组的变量。在此模型中,随机效应是随受试者变化的截距。
要查看随机效应,您可以使用
Ian Fellows 提到的那样,您的数据也可能具有随机条件和时间分量。您可以使用另一个模型进行测试。在下面的条件中,时间和截距允许随对象随机变化。它还评估它们的相关性。
并尝试
您还可以在不与截距相关的情况下测试这一点,通过条件和时间之间的相互作用以及许多其他模型来查看哪个最适合您的数据和/或您的理论。您的问题有点模糊,但这几个命令应该可以帮助您入门。
请注意,主题是您的分组因素,因此它是您将其他效果随机放置的内容。这也不是您明确适合作为预测变量的东西。
(using lme4 library)
This fits your subject effect as random and also the variable that your random effects are grouped under. In this model the random effect is the intercept varying by subject.
To see the random effects you can just use
As Ian Fellows mentioned, your data may also have random Condition and Time components. You can test that with another model. In the one below Condition, Time, and the intercept are allowed to vary randomly by subject. It also evaluates their correlations.
and try
You could also test for this without correlations with the intercept, with interactions between Condition and Time, and numerous other models to see which best fits your data and / or your theory. Your question is a bit vague but these few commands should get you started.
Note that Subject is your grouping factor so it's what you fit other effects as random under. It's not something you then explicitly fit as a predictor as well.