Model simplification (two-way ANOVA)
I am using ANOVA to analyse results from an experiment to see whether there are any effects of my explanatory variables (Heating and Dungfauna) on my response variable (Biomass). I started by looking at the main effects and interaction:
full.model <- lm(log(Biomass) ~ Heating*Dungfauna, data= df)
anova(full.model)
I understand that model simplification is needed: removing non-significant interactions or effects to eventually reach the simplest model that still explains the results. I tried two ways of removing the interaction. However, when I manually remove the interaction (Heating*Dungfauna -> Heating+Dungfauna), the new ANOVA gives a different output from the one I get with this model simplification 'shortcut':
new.model <- update(full.model, .~. -Dungfauna:Heating)
anova(new.model)
Which way is the appropriate way to remove the interaction and simplify the model?
In both cases the data are log-transformed:
lm(log(CC_noAcari_EmergencePatSoil)~ Dungfauna*Heating, data= biomass)
ANOVA output from manually changing Heating*Dungfauna to Heating+Dungfauna:
Response: log(CC_noAcari_EmergencePatSoil)
          Df Sum Sq Mean Sq F value    Pr(>F)
Heating    2  4.806   2.403  5.1799   0.01012 *
Dungfauna  1 37.734  37.734 81.3432 4.378e-11 ***
Residuals 39 18.091   0.464
ANOVA output from using simplification 'shortcut':
Response: log(CC_noAcari_EmergencePatSoil)
          Df Sum Sq Mean Sq F value    Pr(>F)
Dungfauna  1 41.790  41.790 90.0872 1.098e-11 ***
Heating    2  0.750   0.375  0.8079    0.4531
Residuals 39 18.091   0.464
Comments (1)
R's anova and aov functions compute Type I, or "sequential", sums of squares, so the order in which the predictors are specified matters. A model specified as y ~ A + B tests A on its own (ignoring B) and then B after accounting for A, whereas y ~ B + A tests B on its own and then A after accounting for B. Notice that your first model specifies Dungfauna*Heating, while your comparison model uses Heating+Dungfauna. Because update() keeps the remaining terms in their original order, the 'shortcut' fits Dungfauna+Heating: the same model (the Residuals rows in your two outputs are identical), with the terms tested in a different sequence.
Consider this simple example using the "mtcars" data set. Here I specify two additive models (no interactions). Both models specify the same predictors, but in different orders:
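A minimal sketch of such a comparison, assuming mpg as the response and wt and hp as the two predictors (an illustrative choice of columns from mtcars, not necessarily the ones originally used):

```r
# Two additive models with the same predictors in opposite order.
# mpg, wt and hp are illustrative choices from the built-in mtcars data.
m1 <- lm(mpg ~ wt + hp, data = mtcars)
m2 <- lm(mpg ~ hp + wt, data = mtcars)

# Type I (sequential) tables: each term is adjusted only for the terms
# listed above it, so the wt and hp rows differ between the two tables.
a1 <- anova(m1)
a2 <- anova(m2)
print(a1)
print(a2)

# The residual row is identical because both formulas describe the same fit.
print(all.equal(a1["Residuals", "Sum Sq"], a2["Residuals", "Sum Sq"]))

# update() keeps the remaining terms in their original order.
full    <- lm(mpg ~ wt * hp, data = mtcars)
reduced <- update(full, . ~ . - wt:hp)
print(formula(reduced))
```

Applied to the question, this is why Heating has a sum of squares of 4.806 when it is listed first and 0.750 when it is listed after Dungfauna, while the Residuals row stays at 18.091 in both tables.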
You could specify Type II or Type III sums of squares using car::Anova, which gives the same result for both models:
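A sketch of how that might look, again assuming the illustrative mtcars models with wt and hp. The car package has to be installed separately, so the call is guarded here; the drop1() lines are a base-R cross-check that I am adding, which for a model without interactions tests each term given all the others:

```r
# Same two additive models, predictors in opposite order (illustrative columns).
m1 <- lm(mpg ~ wt + hp, data = mtcars)
m2 <- lm(mpg ~ hp + wt, data = mtcars)

# Type II tests from the car package, if it is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(m1, type = "II"))
  print(car::Anova(m2, type = "II"))
}

# Base-R cross-check: drop each term in turn from the full additive model.
d1 <- drop1(m1, test = "F")
d2 <- drop1(m2, test = "F")
print(d1)
print(d2)

# Each term's sum of squares no longer depends on the order in the formula.
print(all.equal(d1["wt", "Sum of Sq"], d2["wt", "Sum of Sq"]))
print(all.equal(d1["hp", "Sum of Sq"], d2["hp", "Sum of Sq"]))
```

Only the order of the rows changes between the two tables; the sums of squares, F values and p-values for each term are the same.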
summary also provides equivalent (and consistent) metrics regardless of the order of predictors, though it's not quite a formal ANOVA table:
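For instance, with the same illustrative mtcars models (wt and hp are assumed predictors, chosen only for demonstration):

```r
# Same two additive models, predictors in opposite order (illustrative columns).
m1 <- lm(mpg ~ wt + hp, data = mtcars)
m2 <- lm(mpg ~ hp + wt, data = mtcars)

# Coefficient tables: estimate, standard error, t value and p-value per term.
s1 <- summary(m1)$coefficients
s2 <- summary(m2)$coefficients
print(s1)
print(s2)

# Once rows are matched by name, the two tables agree.
print(all.equal(s1[rownames(s2), ], s2))
```

Each t-test here is for one coefficient given all the other predictors. For a factor with more than two levels, such as Heating (2 degrees of freedom in the question), summary reports one row per non-reference level rather than a single test for the whole factor, which is why it is not a drop-in replacement for an ANOVA table.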