Correlation calculation with a threshold in R

Posted on 2024-12-07 21:35:49

I would like to compute correlations in R, but my data contain a lot of missing values. I therefore want the correlation matrix to include only correlations that were calculated from at least 10 pairs of values. How should I proceed?

Edit: please note that the correlation matrix is generated from two big matrices X and Y that have the same individuals (rows).

Comments (1)

冬天的雪花 2024-12-14 21:35:49

First, generate some example data: a 20 x 5 matrix with missing values in columns 1 and 3:

x = matrix(rnorm(100), ncol=5)
## Fill in some NAs: 13 in column 1, 9 in column 3
x[3:15,1] = NA
x[2:10,3] = NA

Next, loop over the columns of x and count, for every pair of columns, the rows in which both values are present:

## Start from the maximum possible number of pairs, nrow(x),
## for every pair of columns
m = matrix(nrow(x), ncol=ncol(x), nrow=ncol(x))
## This could be made more efficient: m is symmetric, so
## column i only needs to be compared with columns i to ncol(x)

for(i in seq_len(ncol(x))) {
    ## x[,i]==x is NA wherever column i or the other column is NA
    detect_na = is.na(x[,i]==x)
    ## Number of incomplete rows for column i against each column
    c_sums = colSums(detect_na)
    m[i,] = m[i,] - c_sums
}
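
The same pair counts can also be obtained without a loop. The sketch below is not from the original answer; it assumes that a pair is usable when both values are non-NA, and the names `ok` and `n_pairs` are illustrative. It rebuilds the example data with a seed so that it runs on its own:

```r
## Rebuild the example data (seed added only for reproducibility)
set.seed(1)
x = matrix(rnorm(100), ncol=5)
x[3:15,1] = NA
x[2:10,3] = NA

## ok[r, j] is TRUE when x[r, j] is observed
ok = !is.na(x)

## crossprod(ok) is t(ok) %*% ok: entry [i, j] counts the rows
## in which both column i and column j are observed
n_pairs = crossprod(ok)

## Loop version from the text, for comparison
m = matrix(nrow(x), ncol=ncol(x), nrow=ncol(x))
for(i in seq_len(ncol(x))) {
    m[i,] = m[i,] - colSums(is.na(x[,i]==x))
}

## Both approaches should give the same counts
all(n_pairs == m)
```

The matrix product does the counting in one step, which may matter when the matrices are large.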

The matrix m now holds the number of complete pairs for each pair of columns. Convert it to a mask in preparation for subsetting, keeping pairs with at least 10 observations (TRUE) and marking the rest NA:

m = ifelse(m >= 10, TRUE, NA)

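Next, compute the correlation for all column pairs and subset according to m. Indexing with m returns NA wherever m is NA, so pairs below the threshold come out as NA. Note that use = "complete.obs" drops every row that contains an NA in any column, so all coefficients here are computed from the same complete rows; use = "pairwise.complete.obs" would instead use, for each pair, all rows in which both columns are observed:

R> matrix(cor(x, use = "complete.obs")[m], ncol=ncol(m), nrow=nrow(m))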
     [,1]    [,2]     [,3]    [,4]    [,5]
[1,]   NA      NA       NA      NA      NA
[2,]   NA  1.0000 -0.14302 0.35902 -0.3466
[3,]   NA -0.1430  1.00000 0.03949  0.6172
[4,]   NA  0.3590  0.03949 1.00000  0.1606
[5,]   NA -0.3466  0.61720 0.16061  1.0000
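
For the case raised in the question's edit, two matrices X and Y with the same rows, the same idea can be sketched as below. This is an illustration rather than part of the original answer: the function name `cor_min_pairs` and its `min_pairs` argument are hypothetical, and it assumes pairwise deletion (`use = "pairwise.complete.obs"`) is wanted, so that each coefficient is computed from exactly the rows counted for its pair.

```r
## Hypothetical helper: correlations between the columns of X and Y,
## set to NA when fewer than min_pairs rows have both values observed
cor_min_pairs = function(X, Y = X, min_pairs = 10) {
    stopifnot(nrow(X) == nrow(Y))
    ## n[i, j] = number of rows where X[, i] and Y[, j] are both non-NA
    n = crossprod(!is.na(X), !is.na(Y))
    ## Pairwise deletion: each coefficient uses the rows counted in n
    r = suppressWarnings(cor(X, Y, use = "pairwise.complete.obs"))
    r[n < min_pairs] = NA
    r
}

## Example data with the same rows (individuals) in X and Y
set.seed(1)
X = matrix(rnorm(100), ncol=5)
Y = matrix(rnorm(60), ncol=3)
X[3:15,1] = NA   # column 1 of X keeps only 7 observed values
Y[2:10,2] = NA   # column 2 of Y keeps 11 observed values

res = cor_min_pairs(X, Y)
dim(res)         # 5 x 3: one row per column of X, one column per column of Y
is.na(res[1,])   # all TRUE: X[,1] has fewer than 10 pairs with any Y column
```

Calling `cor_min_pairs(x)` with a single matrix covers the one-matrix case from the answer, with pairwise rather than casewise deletion.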