Using sapply to compute a statistic for several columns at once
I have a dataframe as follows:
# A tibble: 6 x 4
   Placebo    High  Medium      Low
     <dbl>   <dbl>   <dbl>    <dbl>
1  0.0400  -0.04    0.0100  0.0100
2  0.04     0      -0.0100  0.04
3  0.0200  -0.1    -0.05   -0.0200
4  0.03    -0.0200  0.03   -0.00700
5 -0.00500 -0.0100  0.0200  0.0100
6  0.0300  -0.0100 NA      NA
You can get Cohen's d for two of the columns using the cohen.d() function from the effsize package:
df <- data.frame(Placebo = c(0.0400, 0.04, 0.0200, 0.03, -0.00500, 0.0300),
                 Low = c(-0.04, 0, -0.1, -0.0200, -0.0100, -0.0100),
                 Medium = c(0.0100, -0.0100, -0.05, 0.03, 0.0200, NA),
                 High = c(0.0100, 0.04, -0.0200, -0.00700, 0.0100, NA))
library(effsize)
cohen.d(as.vector(na.omit(df$Placebo)), as.vector(na.omit(df$High)))
Interestingly enough, I'm getting the following error with this code:
Error in data[, group] : incorrect number of dimensions
However, I would like to create a function that returns Cohen's d between one of the columns and each of the others.
To get Cohen's d for every column against Placebo, we would use something like:
sapply(df, function(i) cohen.d(pull(df, as.vector(na.omit(!!Placebo))), as.vector(na.omit(i))))
But I'm not sure this would work anyway.
Edit: I don't want to drop the full row, as Cohen's d can be computed for vectors of different lengths. Ideally, I would like to get the statistic with the NAs removed from each column independently.
Comments (1)
It may be better to remove the NA values on each column separately, by creating a logical index together with 'Placebo'. Or, if we want to use lapply/sapply, loop over the columns other than Placebo.
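The code that originally accompanied this answer is not preserved on this page. As a rough sketch of the sapply approach it describes, the following drops NA from each vector independently and loops over every column except the reference one. The helper names cohens_d and d_vs_reference are hypothetical, and the pooled-standard-deviation formula is written out in base R so the example runs without extra packages; it is an assumption that this matches the variant of Cohen's d the question is after.

```r
# Data as given in the question (column labels kept as in the question's code).
df <- data.frame(
  Placebo = c(0.04, 0.04, 0.02, 0.03, -0.005, 0.03),
  Low     = c(-0.04, 0, -0.1, -0.02, -0.01, -0.01),
  Medium  = c(0.01, -0.01, -0.05, 0.03, 0.02, NA),
  High    = c(0.01, 0.04, -0.02, -0.007, 0.01, NA)
)

# Hypothetical helper: Cohen's d with a pooled standard deviation.
# NAs are dropped from each vector on its own, so the two vectors
# may end up with different lengths.
cohens_d <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Hypothetical wrapper: compare the reference column with every other column.
d_vs_reference <- function(data, reference = "Placebo") {
  others <- setdiff(names(data), reference)
  sapply(data[others], function(col) cohens_d(data[[reference]], col))
}

d_vs_placebo <- d_vs_reference(df)
print(d_vs_placebo)
```

If effsize is installed, the body of the loop could instead call effsize::cohen.d(x, y)$estimate on the NA-free vectors. Writing the namespace explicitly may also help with the "data[, group]" error in the question: other packages (psych, for example) export a function named cohen.d with a different signature, and loading one of them after effsize would mask it. Whether that is what happened here is a guess.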