R: Calculating the mean square for every subgroup in a subset

Posted on 2025-01-12 03:00:38

How do I calculate the mean square for each of 2019_Preston_STD, 2019_Preston_V1, 2019_Preston_V2, etc., first using the Value column and then the adjmth1 and adjmth3 columns?

structure(list(IDX = c("2019_Preston_STD", "2019_Preston_V1", 
"2019_Preston_V2", "2019_Preston_V3", "2019_Preston_W1", "2019_Preston_W2"
), Value = c(3L, 2L, 3L, 2L, 3L, 5L), adjmth1 = c(2.87777777777778, 
1.85555555555556, 2.01111111111111, 1.77777777777778, 3.62222222222222, 
4.45555555555556), adjmth3 = c(2.9328763348507, 2.08651828334684, 
2.80282946626847, 2.15028039284054, 2.68766916156347, 4.51425274916654
), adjmth13 = c(2.81065411262847, 1.82585524933201, 1.81394057737959, 
1.40785681078568, 3.30989138378569, 4.7301083495049)), row.names = 29:34, class = "data.frame")


Comments (1)

有木有妳兜一样 2025-01-19 03:00:38

This task can be done in many ways, as shown in the link that @r2evans pointed out. My favorite is dplyr with summarize(across()), because its syntax is easy to understand and easy to apply to many columns. It also prints the resulting numbers in a readable format.

For example, from the iris data I want the arithmetic mean of Sepal.Length, Petal.Length, and Petal.Width for each species: setosa, versicolor, and virginica. Here is the head of the data:

head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

And here is how to get the mean for each species:

library(dplyr)  # provides %>%, group_by(), summarize() and across()

iris %>% group_by(Species) %>% 
         summarize(across(c(Sepal.Length, Petal.Length, Petal.Width), mean))
# A tibble: 3 x 4
# Species    Sepal.Length Petal.Length Petal.Width
# <fct>             <dbl>        <dbl>       <dbl>
# 1 setosa             5.01         1.46       0.246
# 2 versicolor         5.94         4.26       1.33 
# 3 virginica          6.59         5.55       2.03 

As for your task, you first need to define a function for the mean square (its definition varies slightly between references). Then you apply it to your data frame using summarize(across()).

For example, you define the mean square function as follows:

meansq <- function(x) sum((x-mean(x))^2)/(length(x)-1)

Note: This definition requires that length(x) is not 1. With a single value the expression becomes 0/0, which R evaluates to NaN.
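As a quick check of the definition above, here is a minimal sketch in base R that runs meansq on the Value column from the question. With this particular definition (sum of squared deviations divided by n - 1) the result coincides with the sample variance, so var() is used here only as a cross-check. Other definitions of "mean square" would not match it.

```r
# Mean square as defined in the answer: squared deviations over (n - 1)
meansq <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# The Value column from the question's sample data
x <- c(3, 2, 3, 2, 3, 5)

meansq(x)                        # 1.2
isTRUE(all.equal(meansq(x), var(x)))  # TRUE: same as the sample variance

# A single observation gives 0/0
meansq(3)                        # NaN
```

If a group can contain only one row, it may be worth deciding up front whether NaN is an acceptable result for that group or whether such groups should be filtered out first.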

You can apply it to your data frame newdata as follows:

newdata %>% group_by(IDX) %>% 
            summarize(across(c(Value, adjmth1, adjmth3), meansq))
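One thing to watch with the six rows posted in the question: each IDX appears exactly once, so every per-IDX group has length 1 and the per-group mean square is NaN. The sketch below illustrates this with base R's aggregate() (swapped in for dplyr here only to keep the example dependency-free) and also shows a pooled calculation over all six rows. Whether the pooled value is what the question intends is an assumption. The full data set presumably has several rows per IDX, in which case the grouped version gives real numbers.

```r
meansq <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# The six-row sample from the question (adjmth13 omitted for brevity)
newdata <- data.frame(
  IDX = c("2019_Preston_STD", "2019_Preston_V1", "2019_Preston_V2",
          "2019_Preston_V3", "2019_Preston_W1", "2019_Preston_W2"),
  Value = c(3L, 2L, 3L, 2L, 3L, 5L),
  adjmth1 = c(2.87777777777778, 1.85555555555556, 2.01111111111111,
              1.77777777777778, 3.62222222222222, 4.45555555555556),
  adjmth3 = c(2.9328763348507, 2.08651828334684, 2.80282946626847,
              2.15028039284054, 2.68766916156347, 4.51425274916654)
)

# Per-IDX mean square: every group has one row, so every result is NaN
per_group <- aggregate(cbind(Value, adjmth1, adjmth3) ~ IDX,
                       data = newdata, FUN = meansq)
print(per_group)

# Pooled over all six rows (assumption: this may be what is wanted
# when each IDX occurs only once)
pooled <- sapply(newdata[c("Value", "adjmth1", "adjmth3")], meansq)
print(pooled)
```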