Existing function to combine standard deviations in R?

Posted 2025-01-03 20:22:19 · 308 words · 1 view · 0 comments

I have 4 populations with known means and standard deviations. I would like to know the grand mean and grand sd. The grand mean is simple to calculate, and R even has a handy utility function for it, weighted.mean(). Does a similar function exist for combining standard deviations?

The calculation is not complicated, but an existing function would make my code cleaner and easier to understand.

Bonus question: what tools do you use to search for functions like this? I know it must be out there, but I've done a lot of searching and can't find it. Thanks!

Comments (3)

萧瑟寒风 2025-01-10 20:22:19

I don't know of a specific package or function name but it seems easy to roll your own function from Wikipedia's page. Assuming no overlap in the populations:

## N: vector of sizes
## M: vector of means
## S: vector of standard deviations

grand.mean <- function(M, N) {weighted.mean(M, N)}
grand.sd   <- function(S, M, N) {sqrt(weighted.mean(S^2 + M^2, N) -
                                      weighted.mean(M, N)^2)}
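
As a quick sanity check, the two helpers above can be compared against a direct computation on pooled data. This is only a sketch under stated assumptions: the simulated groups and the pop.sd helper are made up for illustration and are not part of the original answer. The formula combines population standard deviations (dividing by n), so the check uses a population SD rather than R's sd(), which divides by n - 1.

```r
# Helpers from the answer above, repeated so the snippet is self-contained
grand.mean <- function(M, N) weighted.mean(M, N)
grand.sd   <- function(S, M, N) {
  sqrt(weighted.mean(S^2 + M^2, N) - weighted.mean(M, N)^2)
}

# Hypothetical helper: population SD (divides by n, unlike sd())
pop.sd <- function(x) sqrt(mean((x - mean(x))^2))

# Simulated, non-overlapping groups (illustrative data only)
set.seed(42)
groups <- list(rnorm(10, 5, 1), rnorm(25, 7, 2),
               rnorm(40, 6, 3), rnorm(15, 4, 1.5))

N <- lengths(groups)          # vector of sizes
M <- sapply(groups, mean)     # vector of means
S <- sapply(groups, pop.sd)   # vector of population standard deviations

pooled <- unlist(groups)

# Compare the combined statistics with the directly computed ones
print(all.equal(grand.mean(M, N), mean(pooled)))
print(all.equal(grand.sd(S, M, N), pop.sd(pooled)))
```

Both comparisons should agree up to floating-point tolerance. If the inputs are sample standard deviations from sd(), the (n - 1) version of the formula is the appropriate one instead.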

鸢与 2025-01-10 20:22:19

Are the populations non-overlapping?

library(fishmethods)
combinevar

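For instance, the example on Wikipedia would work like this (note that combinevar expects variances as its second argument, so the standard deviations are squared):

xbar <- c(70, 65)
s <- c(3, 2)
n <- c(1, 1)
combinevar(xbar, s^2, n)

The standard deviation is then sqrt(combinevar(xbar, s^2, n)[2]). With n = c(1, 1) the (n - 1) weights are zero, so the subgroup variances drop out of the result; pass the actual group sizes in practice.

If you don't want to download the library, the function looks like this: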

combinevar <- 
function (xbar = NULL, s_squared = NULL, n = NULL) 
{
    if (length(xbar) != length(s_squared) | length(xbar) != length(n) | 
        length(s_squared) != length(n)) 
        stop("Vector lengths are different.")
    sum_of_squares <- sum((n - 1) * s_squared + n * xbar^2)
    grand_mean <- sum(n * xbar)/sum(n)
    combined_var <- (sum_of_squares - sum(n) * grand_mean^2)/(sum(n) - 
        1)
    return(c(grand_mean, combined_var))
}
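
To see that this formula reproduces the sample variance of the pooled data, here is a small check in base R. It is a sketch, not the package's code: combine_var is a hypothetical compact restatement of the function quoted above (so the snippet runs without fishmethods), and the two simulated groups are illustrative only.

```r
# Hypothetical compact restatement of the combinevar formula quoted above
combine_var <- function(xbar, s_squared, n) {
  grand_mean <- sum(n * xbar) / sum(n)
  # (n - 1) * s^2 + n * xbar^2 equals each group's sum of squared values
  sum_of_squares <- sum((n - 1) * s_squared + n * xbar^2)
  combined_var <- (sum_of_squares - sum(n) * grand_mean^2) / (sum(n) - 1)
  c(mean = grand_mean, var = combined_var)
}

# Two simulated, non-overlapping groups (illustrative data only)
set.seed(1)
groups <- list(rnorm(12, 70, 3), rnorm(20, 65, 2))

res <- combine_var(xbar      = sapply(groups, mean),
                   s_squared = sapply(groups, var),   # sample variances
                   n         = lengths(groups))

# Compare with the sample variance computed on the pooled data
print(all.equal(unname(res["var"]), var(unlist(groups))))

# Combined sample standard deviation
combined_sd <- sqrt(unname(res["var"]))
print(combined_sd)
```

The comparison should agree up to floating-point tolerance, since (n - 1) * s^2 + n * xbar^2 recovers each group's raw sum of squares.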

天生の放荡 2025-01-10 20:22:19

Use the sample.decomp function in the utilities package

Statistical problems of this kind have now been automated in the sample.decomp function in the utilities package. This function can compute pooled sample moments from subgroup moments, or compute missing subgroup moments from the other subgroup moments and pooled moments. It works for decompositions up to fourth order, that is, decompositions of sample size, sample mean, sample variance/standard deviation, sample skewness, and sample kurtosis.


How to use the function: Here we give an example where we use the function to compute the sample moments of a pooled sample composed of four subgroups. To do this, we first generate a mock dataset DATA containing four subgroups with unequal sizes, and we pool these as the single dataset POOL. The moments of the subgroups and the pooled sample can be obtained using the moments function in the same package.

#Create some subgroups of mock data and a pooled dataset
set.seed(1)
N    <- c(28, 44, 51, 102)
SUB1 <- rnorm(N[1])
SUB2 <- rnorm(N[2])
SUB3 <- rnorm(N[3])
SUB4 <- rnorm(N[4])
DATA <- list(SUB1 = SUB1, SUB2 = SUB2, SUB3 = SUB3, SUB4 = SUB4)
POOL <- c(SUB1, SUB2, SUB3, SUB4)

#Show sample statistics for the subgroups
library(utilities)
moments(DATA)

       n sample.mean sample.var sample.skew sample.kurt NAs
SUB1  28  0.09049834  0.9013829  -0.7648008    3.174128   0
SUB2  44  0.18637936  0.8246700   0.3653918    3.112901   0
SUB3  51  0.05986594  0.6856030   0.3076281    2.306243   0
SUB4 102 -0.05135660  1.0526184   0.3348429    2.741974   0

#Show sample statistics for the pooled sample
moments(POOL)

       n sample.mean sample.var sample.skew sample.kurt NAs
POOL 225  0.03799749  0.9030244   0.1705622    2.828833   0

Now that we have a set of moments for the subgroups, we can use the sample.decomp function to obtain the pooled sample moments from the subgroup sample moments. As input to this function you can either use the moments output for the subgroups or supply the sample sizes and sample moments separately as vectors (here we do the latter). As you can see, this gives the same sample moments for the pooled sample as direct computation from the underlying data.

#Compute sample statistics for subgroups
library(utilities)
MEAN   <- c(mean(SUB1), mean(SUB2), mean(SUB3), mean(SUB4))
VAR    <- c( var(SUB1),  var(SUB2),  var(SUB3),  var(SUB4))

#Compute sample decomposition
sample.decomp(n = N, sample.mean = MEAN, sample.var  = VAR, names = names(DATA))

             n sample.mean sample.var
SUB1        28  0.09049834  0.9013829
SUB2        44  0.18637936  0.8246700
SUB3        51  0.05986594  0.6856030
SUB4       102 -0.05135660  1.0526184
--pooled-- 225  0.03799749  0.9030244

As you can see, the sample.decomp function allows computation of the pooled sample variance. You can read about this function in the package documentation.
