Model simplification (two-way ANOVA)

Posted on 2025-01-20 01:41:14

I am using ANOVA to analyse results from an experiment to see whether there are any effects of my explanatory variables (Heating and Dungfauna) on my response variable (Biomass). I started by looking at the main effects and interaction:

full.model <- lm(log(Biomass) ~ Heating*Dungfauna, data= df)
anova(full.model)

I understand that model simplification is needed: removing non-significant interactions or effects until I reach the simplest model that still explains the results. I tried two ways of removing the interaction. However, when I manually remove the interaction (Heating*Dungfauna -> Heating+Dungfauna), the new ANOVA gives different output from this model simplification 'shortcut':

new.model <- update(full.model, .~. -Dungfauna:Heating)
anova(new.model)

Which way is the appropriate way to remove the interaction and simplify the model?

In both cases the response is log-transformed:

lm(log(CC_noAcari_EmergencePatSoil)~ Dungfauna*Heating, data= biomass)

ANOVA output from manually changing Heating*Dungfauna to Heating+Dungfauna:

Response: log(CC_noAcari_EmergencePatSoil)

          Df Sum Sq Mean Sq F value    Pr(>F)    
Heating    2  4.806   2.403  5.1799   0.01012 *  
Dungfauna  1 37.734  37.734 81.3432 4.378e-11 ***
Residuals 39 18.091   0.464

ANOVA output from using simplification 'shortcut':

Response: log(CC_noAcari_EmergencePatSoil)
          Df Sum Sq Mean Sq F value    Pr(>F)   
Dungfauna  1 41.790  41.790 90.0872 1.098e-11 ***
Heating    2  0.750   0.375  0.8079    0.4531    
Residuals 39 18.091   0.464                  

Comments (1)

悲念泪 2025-01-27 01:41:14

R's anova and aov functions compute Type I or "sequential" sums of squares, so the order in which the predictors are specified matters. A model specified as y ~ A + B tests A on its own (ignoring B) and then B conditioned on A, whereas y ~ B + A tests B on its own and then A conditioned on B. With unbalanced data the two tables differ, even though the fitted model and the residuals are the same. Notice that your first model specifies Dungfauna*Heating, while your comparison model uses Heating+Dungfauna.
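To connect this to the question, here is a minimal sketch on made-up data, since the original `biomass` data frame is not available. The data frame `sim`, its cell counts, and the simulated effect size are assumptions chosen only to give an unbalanced design with 43 rows. It suggests that `update()` keeps the term order of the full model, so the 'shortcut' fits Dungfauna + Heating, while the manual model was written as Heating + Dungfauna.

```r
set.seed(1)

# Hypothetical unbalanced design: 3 heating levels x 2 fauna levels, 43 rows in total
cells <- expand.grid(Heating = c("control", "warm", "hot"),
                     Dungfauna = c("absent", "present"))
n_per_cell <- c(9, 7, 5, 6, 8, 8)
sim <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]

# Simulated response (assumption): fauna raises log biomass, heating does nothing
sim$Biomass <- exp(rnorm(nrow(sim), mean = 1 + (sim$Dungfauna == "present"), sd = 0.7))

full.model <- lm(log(Biomass) ~ Dungfauna * Heating, data = sim)

# 'Shortcut': drop the interaction; the remaining terms keep their original order
shortcut <- update(full.model, . ~ . - Dungfauna:Heating)

# Manual versions in both orders
manual_same_order <- lm(log(Biomass) ~ Dungfauna + Heating, data = sim)
manual_swapped    <- lm(log(Biomass) ~ Heating + Dungfauna, data = sim)

formula(shortcut)           # shows which order update() produced
anova(shortcut)             # same table as manual_same_order
anova(manual_swapped)       # same residual line, different rows for the two factors
```

All three additive fits describe the same model. Only the sequential ANOVA table changes with the order, so neither way of removing the interaction is wrong.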

Consider this simple example using the "mtcars" data set. Here I specify two additive models (no interactions). Both models specify the same predictors, but in different orders:

add.model <- lm(log(mpg) ~ vs + cyl, data = mtcars)
anova(add.model)

          Df  Sum Sq Mean Sq F value    Pr(>F)    
vs         1 1.22434 1.22434  48.272 1.229e-07 ***
cyl        1 0.78887 0.78887  31.103 5.112e-06 ***
Residuals 29 0.73553 0.02536         

add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)
anova(add.model2)

          Df  Sum Sq Mean Sq F value    Pr(>F)    
cyl        1 2.00795 2.00795 79.1680 8.712e-10 ***
vs         1 0.00526 0.00526  0.2073    0.6523    
Residuals 29 0.73553 0.02536 
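One way to see where the sequential rows come from is to rebuild them from nested fits. The sketch below uses the same mtcars example; the object names `m0`, `m1`, `m2` are only illustrative. Each Type I sum of squares is the drop in residual sum of squares when that term is added to the terms listed before it.

```r
# Nested models, adding one term at a time in the order vs, then cyl
m0 <- lm(log(mpg) ~ 1,        data = mtcars)
m1 <- lm(log(mpg) ~ vs,       data = mtcars)
m2 <- lm(log(mpg) ~ vs + cyl, data = mtcars)

seq_tab <- anova(m2)

# deviance() of an lm fit is its residual sum of squares
ss_vs           <- deviance(m0) - deviance(m1)  # vs, ignoring cyl
ss_cyl_given_vs <- deviance(m1) - deviance(m2)  # cyl, after vs

all.equal(seq_tab["vs",  "Sum Sq"], ss_vs)
all.equal(seq_tab["cyl", "Sum Sq"], ss_cyl_given_vs)
```

Swapping the order of the nested fits (cyl first, then vs) reproduces the second table in the same way.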

You could specify Type II or Type III sums of squares using car::Anova:

car::Anova(add.model, type = 2)
car::Anova(add.model2, type = 2)

This gives the same result for both models:

           Sum Sq Df F value    Pr(>F)    
vs        0.00526  1  0.2073    0.6523    
cyl       0.78887  1 31.1029 5.112e-06 ***
Residuals 0.73553 29         
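If installing car is not convenient, base R's `drop1()` is one possible alternative. This is a sketch rather than a full replacement: for an additive model like this one it tests each term after all the others, so the result should not depend on the order of the predictors.

```r
add.model  <- lm(log(mpg) ~ vs + cyl, data = mtcars)
add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)

# Each row compares the full model with the model lacking that single term
d1 <- drop1(add.model,  test = "F")
d2 <- drop1(add.model2, test = "F")

d1
d2

# Same sums of squares for each term, whichever order was used
all.equal(d1["vs",  "Sum of Sq"], d2["vs",  "Sum of Sq"])
all.equal(d1["cyl", "Sum of Sq"], d2["cyl", "Sum of Sq"])
```

In a model that still contains an interaction, `drop1()` only offers to drop the interaction, which matches the usual simplification order of testing the highest-order term first.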

summary also provides equivalent (and consistent) metrics regardless of the order of predictors, though it's not quite a formal ANOVA table:

summary(add.model)

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.92108    0.20714  18.930  < 2e-16 ***
vs          -0.04414    0.09696  -0.455    0.652    
cyl         -0.15261    0.02736  -5.577 5.11e-06 ***
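The link between the coefficient table and the ANOVA tables can be checked directly. In this sketch, the squared t value of a one-degree-of-freedom term is compared with the F value that term gets when it is listed last in a sequential table. A factor with more than two levels, such as Heating in the question, gets several t rows in `summary()`, so this one-to-one match only applies to single-df terms.

```r
add.model  <- lm(log(mpg) ~ vs + cyl, data = mtcars)
add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)

t_vals <- coef(summary(add.model))[, "t value"]

# The last term in a sequential table is adjusted for everything before it
f_cyl_last <- anova(add.model)["cyl", "F value"]
f_vs_last  <- anova(add.model2)["vs", "F value"]

all.equal(unname(t_vals["cyl"])^2, f_cyl_last)
all.equal(unname(t_vals["vs"])^2,  f_vs_last)
```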