Model simplification (two-way ANOVA)

Posted 2025-01-20 01:41:14

I am using ANOVA to analyse results from an experiment to see whether there are any effects of my explanatory variables (Heating and Dungfauna) on my response variable (Biomass). I started by looking at the main effects and interaction:

full.model <- lm(log(Biomass) ~ Heating*Dungfauna, data= df)
anova(full.model)

I understand that it is necessary to simplify the model by removing non-significant interactions or effects, to eventually reach the simplest model that still explains the results. I tried two ways of removing the interaction. However, when I manually remove the interaction (Heating*Dungfauna -> Heating+Dungfauna), the new ANOVA gives a different output from the one I get with this model simplification 'shortcut':

new.model <- update(full.model, .~. -Dungfauna:Heating)
anova(new.model)

Which way is the appropriate way to remove the interaction and simplify the model?

In both cases the response is log-transformed:

lm(log(CC_noAcari_EmergencePatSoil)~ Dungfauna*Heating, data= biomass)

ANOVA output from manually changing Heating*Dungfauna to Heating+Dungfauna:

Response: log(CC_noAcari_EmergencePatSoil)

          Df Sum Sq Mean Sq F value    Pr(>F)    
Heating    2  4.806   2.403  5.1799   0.01012 *  
Dungfauna  1 37.734  37.734 81.3432 4.378e-11 ***
Residuals 39 18.091   0.464

ANOVA output from using simplification 'shortcut':

Response: log(CC_noAcari_EmergencePatSoil)
          Df Sum Sq Mean Sq F value    Pr(>F)   
Dungfauna  1 41.790  41.790 90.0872 1.098e-11 ***
Heating    2  0.750   0.375  0.8079    0.4531    
Residuals 39 18.091   0.464

悲念泪 2025-01-27 01:41:14

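R's anova and aov functions compute Type I, or "sequential", sums of squares, so the order in which the predictors are specified matters. Each term is tested after adjusting only for the terms listed before it: with y ~ A + B, A is tested on its own (ignoring B) and B is tested after accounting for A, whereas y ~ B + A tests B on its own and A after accounting for B. Notice that your first model specifies Dungfauna*Heating, while your comparison model uses Heating+Dungfauna. Both fit the same additive model once the interaction is dropped (the Residuals line is identical in your two tables); only the order of the terms, and therefore the sequential tests, differs.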

Consider this simple example using the "mtcars" data set. Here I specify two additive models (no interactions). Both models specify the same predictors, but in different orders:

add.model <- lm(log(mpg) ~ vs + cyl, data = mtcars)
anova(add.model)

          Df  Sum Sq Mean Sq F value    Pr(>F)    
vs         1 1.22434 1.22434  48.272 1.229e-07 ***
cyl        1 0.78887 0.78887  31.103 5.112e-06 ***
Residuals 29 0.73553 0.02536         

add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)
anova(add.model2)

          Df  Sum Sq Mean Sq F value    Pr(>F)    
cyl        1 2.00795 2.00795 79.1680 8.712e-10 ***
vs         1 0.00526 0.00526  0.2073    0.6523    
Residuals 29 0.73553 0.02536
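
To see where these sequential sums of squares come from, the sketch below rebuilds the first table from a chain of nested models. The names m0, m1, m2, type1 and steps are introduced here for illustration only; each term row of anova(m2) should correspond to one step in the chain.

```r
# Nested chain: intercept only, then + vs, then + cyl (same order as add.model)
m0 <- lm(log(mpg) ~ 1, data = mtcars)
m1 <- lm(log(mpg) ~ vs, data = mtcars)
m2 <- lm(log(mpg) ~ vs + cyl, data = mtcars)

# Sequential (Type I) table for the largest model
type1 <- anova(m2)

# Model-comparison table: each row tests what the newly added term contributes
steps <- anova(m0, m1, m2)

# Both routes give the same sums of squares for vs (first) and cyl (second)
cbind(
  sequential = type1[c("vs", "cyl"), "Sum Sq"],
  nested     = steps[2:3, "Sum of Sq"]
)
```

Reversing the chain (cyl first, then vs) would reproduce the second table in the same way.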

You could specify Type II or Type III sums of squares using car::Anova:

car::Anova(add.model, type = 2)
car::Anova(add.model2, type = 2)

This gives the same result for both models:

           Sum Sq Df F value    Pr(>F)    
vs        0.00526  1  0.2073    0.6523    
cyl       0.78887  1 31.1029 5.112e-06 ***
Residuals 0.73553 29
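
If the car package is not available, base R's drop1() offers a similar order-independent view for an additive model: it tests each term as if it were the last one added. This is a sketch on the same mtcars models; d1 and d2 are illustrative names.

```r
add.model  <- lm(log(mpg) ~ vs + cyl, data = mtcars)
add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)

# Drop each term in turn from the additive model and F-test the loss of fit
d1 <- drop1(add.model,  test = "F")
d2 <- drop1(add.model2, test = "F")

# Compare the two tables term by term; the order in the formula no longer matters
cbind(
  vs_first  = d1[c("vs", "cyl"), "Sum of Sq"],
  cyl_first = d2[c("vs", "cyl"), "Sum of Sq"]
)
```

For a model that still contains an interaction, drop1() only offers the interaction for removal, since the main effects are marginal to it. That makes it a convenient tool for stepwise model simplification.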

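summary() also provides equivalent, order-independent tests, though not as a formal ANOVA table. For a single-degree-of-freedom term the squared t value matches the F value from car::Anova (for vs, (-0.455)^2 ≈ 0.207); for a factor with more than two levels, such as Heating, summary() reports one t test per coefficient instead of one test for the whole term.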

summary(add.model)

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.92108    0.20714  18.930  < 2e-16 ***
vs          -0.04414    0.09696  -0.455    0.652    
cyl         -0.15261    0.02736  -5.577 5.11e-06 ***
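
Coming back to the original question, the sketch below suggests that update() and retyping the formula lead to the same fitted model, and that only the term order differs. It uses mtcars as a stand-in for the biomass data; full, short and manual are illustrative names, not objects from the question.

```r
# Full model with interaction, written with cyl first
full <- lm(log(mpg) ~ cyl * vs, data = mtcars)

# "Shortcut": remove the interaction; the remaining terms keep their original order
short <- update(full, . ~ . - cyl:vs)

# Manual rewrite, typed with the predictors in the opposite order
manual <- lm(log(mpg) ~ vs + cyl, data = mtcars)

# Term order in each model
attr(terms(short), "term.labels")
attr(terms(manual), "term.labels")

# Same fit: matching residual sum of squares and fitted values
all.equal(deviance(short), deviance(manual))
all.equal(fitted(short), fitted(manual))

# Different sequential tables, purely because of the term order
anova(short)
anova(manual)
```

Either route is therefore a valid way to drop the interaction. What matters is keeping the term order consistent between models, or using an order-independent test, when comparing the ANOVA tables.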
