Is there an existing function in R for combining standard deviations?

Posted on 2025-01-03 20:22:19

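I have 4 populations with known means and standard deviations. I would like to know the grand mean and the grand standard deviation. The grand mean is obviously simple to calculate, and R has a handy utility function for it, weighted.mean(). Does a similar function exist for combining standard deviations?

The calculation is not complicated, but an existing function would make my code cleaner and easier to understand.

Bonus question: what tools do you use to search for functions like this? I know it must be out there, but I have done a lot of searching and cannot find it. Thanks!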

萧瑟寒风 2025-01-10 20:22:19

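I don't know of a specific package or function name, but it seems easy to roll your own function from the formula on Wikipedia's page. Assuming the populations do not overlap and S holds population standard deviations (computed with divisor n rather than n - 1):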

## N: vector of sizes
## M: vector of means
## S: vector of standard deviations

grand.mean <- function(M, N) {weighted.mean(M, N)}
grand.sd   <- function(S, M, N) {sqrt(weighted.mean(S^2 + M^2, N) -
                                      weighted.mean(M, N)^2)}
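
As a quick sanity check, the two functions can be compared against a direct computation on pooled data. This is only a sketch: the simulated groups and the helper pop.sd are assumptions introduced for illustration, and the comparison uses the population standard deviation (divisor n), which is what the formula above combines. R's built-in sd() divides by n - 1 and would not match exactly.

```r
## Copied from the answer above so this block runs on its own
grand.mean <- function(M, N) weighted.mean(M, N)
grand.sd   <- function(S, M, N) {
  sqrt(weighted.mean(S^2 + M^2, N) - weighted.mean(M, N)^2)
}

## Hypothetical helper: population SD (divides by n, unlike sd())
pop.sd <- function(x) sqrt(mean((x - mean(x))^2))

## Simulated, non-overlapping groups (illustrative data only)
set.seed(42)
groups <- list(rnorm(30, mean = 10, sd = 2),
               rnorm(50, mean = 12, sd = 3),
               rnorm(20, mean =  9, sd = 1))

N <- sapply(groups, length)  # group sizes
M <- sapply(groups, mean)    # group means
S <- sapply(groups, pop.sd)  # group population SDs

pooled <- unlist(groups)

## Both checks pass silently if the combined values match the pooled data
stopifnot(isTRUE(all.equal(grand.mean(M, N), mean(pooled))))
stopifnot(isTRUE(all.equal(grand.sd(S, M, N), pop.sd(pooled))))
```

If the group standard deviations come from sd() instead, the sample-variance formula used by combinevar in the next answer is the appropriate one.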

鸢与 2025-01-10 20:22:19

Are the populations non-overlapping? If so, the fishmethods package provides combinevar:

library(fishmethods)
combinevar

For instance, the example on Wikipedia would work like this:

xbar <- c(70, 65)
s    <- c(3, 2)
n    <- c(1, 1)
combinevar(xbar, s, n)

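The combined standard deviation is then sqrt(combinevar(xbar,s,n)[2]). Note that the function's second argument is named s_squared, so it expects variances rather than standard deviations.

If you don't want to download the library, the function looks like this: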

combinevar <- 
function (xbar = NULL, s_squared = NULL, n = NULL) 
{
    if (length(xbar) != length(s_squared) | length(xbar) != length(n) | 
        length(s_squared) != length(n)) 
        stop("Vector lengths are different.")
    sum_of_squares <- sum((n - 1) * s_squared + n * xbar^2)
    grand_mean <- sum(n * xbar)/sum(n)
    combined_var <- (sum_of_squares - sum(n) * grand_mean^2)/(sum(n) - 
        1)
    return(c(grand_mean, combined_var))
}
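
To see that the function reproduces what var() gives on the pooled data, here is a minimal sketch. The simulated samples a and b are assumptions made up for the check, and combinevar is re-declared (without the length checks) only so the block runs without fishmethods installed.

```r
## Copy of the function above, minus the argument checks, so the block is self-contained
combinevar <- function(xbar, s_squared, n) {
  sum_of_squares <- sum((n - 1) * s_squared + n * xbar^2)
  grand_mean     <- sum(n * xbar) / sum(n)
  combined_var   <- (sum_of_squares - sum(n) * grand_mean^2) / (sum(n) - 1)
  c(grand_mean, combined_var)
}

## Two illustrative, non-overlapping samples
set.seed(1)
a <- rnorm(28)
b <- rnorm(44, mean = 1)

## Pass sample variances (var), not standard deviations
res <- combinevar(xbar      = c(mean(a), mean(b)),
                  s_squared = c(var(a), var(b)),
                  n         = c(length(a), length(b)))

pooled <- c(a, b)

## The combined mean and variance agree with direct computation on the pooled sample
stopifnot(isTRUE(all.equal(res[1], mean(pooled))))
stopifnot(isTRUE(all.equal(res[2], var(pooled))))

## Combined sample standard deviation
combined_sd <- sqrt(res[2])
```

Because the formula uses the n - 1 divisor, it matches var() and sd() on the pooled sample rather than the population versions.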

天生の放荡 2025-01-10 20:22:19

Use the `sample.decomp` function in the `utilities` package

Statistical problems of this kind have now been automated in the sample.decomp function in the utilities package. This function can compute pooled sample moments from subgroup moments, or compute missing subgroup moments from the other subgroup moments and the pooled moments. It works for decompositions up to fourth order, i.e., decompositions of sample size, sample mean, sample variance/standard deviation, sample skewness, and sample kurtosis.

How to use the function: Here we give an example where we use the function to compute the sample moments of a pooled sample composed of four subgroups. To do this, we first generate a mock dataset DATA containing four subgroups with unequal sizes, and we pool these as the single dataset POOL. The moments of the subgroups and the pooled sample can be obtained using the moments function in the same package.

#Create some subgroups of mock data and a pooled dataset
set.seed(1)
N    <- c(28, 44, 51, 102)
SUB1 <- rnorm(N[1])
SUB2 <- rnorm(N[2])
SUB3 <- rnorm(N[3])
SUB4 <- rnorm(N[4])
DATA <- list(SUB1 = SUB1, SUB2 = SUB2, SUB3 = SUB3, SUB4 = SUB4)
POOL <- c(SUB1, SUB2, SUB3, SUB4)

#Show sample statistics for the subgroups
library(utilities)
moments(DATA)

       n sample.mean sample.var sample.skew sample.kurt NAs
SUB1  28  0.09049834  0.9013829  -0.7648008    3.174128   0
SUB2  44  0.18637936  0.8246700   0.3653918    3.112901   0
SUB3  51  0.05986594  0.6856030   0.3076281    2.306243   0
SUB4 102 -0.05135660  1.0526184   0.3348429    2.741974   0

#Show sample statistics for the pooled sample
moments(POOL)

       n sample.mean sample.var sample.skew sample.kurt NAs
POOL 225  0.03799749  0.9030244   0.1705622    2.828833   0

Now that we have a set of moments for the subgroups, we can use the sample.decomp function to obtain the pooled sample moments from the subgroup sample moments. As input to this function you can either use the moments output for the subgroups or supply the sample sizes and sample moments separately as vectors (here we do the latter). As you can see, this gives the same sample moments for the pooled sample as direct computation from the underlying data.

#Compute sample statistics for subgroups
library(utilities)
MEAN   <- c(mean(SUB1), mean(SUB2), mean(SUB3), mean(SUB4))
VAR    <- c( var(SUB1),  var(SUB2),  var(SUB3),  var(SUB4))

#Compute sample decomposition
sample.decomp(n = N, sample.mean = MEAN, sample.var  = VAR, names = names(DATA))

             n sample.mean sample.var
SUB1        28  0.09049834  0.9013829
SUB2        44  0.18637936  0.8246700
SUB3        51  0.05986594  0.6856030
SUB4       102 -0.05135660  1.0526184
--pooled-- 225  0.03799749  0.9030244

As you can see, the sample.decomp function allows computation of the pooled sample variance. You can read about this function in the package documentation.
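
For readers without the package, the pooled row for the mean and variance can be reproduced in base R from the subgroup rows alone. The sketch below is not the package's implementation. It applies the usual within-group plus between-group sum-of-squares decomposition to the rounded values printed in the table above, and the variable names are illustrative.

```r
## Subgroup statistics as printed in the sample.decomp output above (rounded)
n <- c(28, 44, 51, 102)
m <- c(0.09049834, 0.18637936, 0.05986594, -0.05135660)
v <- c(0.9013829, 0.8246700, 0.6856030, 1.0526184)

## Pooled mean: size-weighted average of the subgroup means
pooled_mean <- sum(n * m) / sum(n)

## Pooled sample variance: within-group plus between-group sums of squares
within     <- sum((n - 1) * v)
between    <- sum(n * (m - pooled_mean)^2)
pooled_var <- (within + between) / (sum(n) - 1)

## The original question asks for a standard deviation
pooled_sd <- sqrt(pooled_var)

round(c(mean = pooled_mean, var = pooled_var, sd = pooled_sd), 7)
```

Up to rounding of the inputs, pooled_mean and pooled_var should agree with the --pooled-- row shown above.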
