Check that at least 2 columns in a matrix have at least 3 values... but in the same rows (for pairwise testing)

Posted 2025-01-19 06:41:23


Say I have a matrix like the following:

set.seed(123)
newmat=matrix(rnorm(25),ncol=5)
colnames(newmat)=paste0('mark',1:5)
rownames(newmat)=paste0('id',1:5)
newmat[,2]=NA
newmat[c(2,5),4]=NA
newmat[c(1,4,5),5]=NA
newmat[1,1]=NA
newmat[5,3]=NA

> newmat
          mark1 mark2     mark3      mark4      mark5
id1          NA    NA 1.2240818  1.7869131         NA
id2 -0.23017749    NA 0.3598138         NA -0.2179749
id3  1.55870831    NA 0.4007715 -1.9666172 -1.0260044
id4  0.07050839    NA 0.1106827  0.7013559         NA
id5  0.12928774    NA        NA         NA         NA

What I want is an easy way to check that there are at least 2 columns with at least 3 non-NA values each, and also that those values sit in the same rows...

In the example above, the pair of columns 1 and 3 fulfills this, as does the pair of columns 3 and 4; the pair of columns 1 and 4 does not, because they share only rows id3 and id4. That makes 3 columns in total.

How could I do this check in R? I know it would involve something like colSums(!is.na(newmat)), but I'm not sure about the rest... Thanks!
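A minimal sketch of why per-column counts are not enough on their own. It rebuilds `newmat` from the question so it runs standalone and uses base R's `complete.cases()`, which the question does not mention; the variable names `per_col`, `shared_13` and `shared_14` are illustrative.

```r
# Rebuild the example matrix from the question
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# Non-NA values per column: mark1 = 4, mark3 = 4, mark4 = 3
per_col <- colSums(!is.na(newmat))
print(per_col)

# Per-column counts ignore row alignment. complete.cases() flags the rows
# in which every selected column is non-NA, so summing it counts shared rows.
shared_13 <- sum(complete.cases(newmat[, c("mark1", "mark3")]))  # 3 (id2, id3, id4)
shared_14 <- sum(complete.cases(newmat[, c("mark1", "mark4")]))  # 2 (id3, id4)
```

Both mark1 and mark4 pass a per-column threshold of 3, yet as a pair they share only 2 complete rows, which is exactly the case the check has to catch.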

抱猫软卧 2025-01-26 06:41:23

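Here is a matrix (obtained with crossprod + is.na) that shows which pairs fulfill your objective. crossprod(!is.na(newmat)) counts, for every pair of columns, the rows in which both are non-NA; setting its diagonal to 0 drops the self-pairs: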

> `diag<-`(crossprod(!is.na(newmat)), 0) >= 3
      mark1 mark2 mark3 mark4 mark5
mark1 FALSE FALSE  TRUE FALSE FALSE
mark2 FALSE FALSE FALSE FALSE FALSE
mark3  TRUE FALSE FALSE  TRUE FALSE
mark4 FALSE FALSE  TRUE FALSE FALSE
mark5 FALSE FALSE FALSE FALSE FALSE

As we can see, the pairs (mark1, mark3) and (mark3, mark4) are the desired output.
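If you prefer the qualifying pairs as a list plus a single TRUE/FALSE answer, one possible sketch (not part of the original answer) keeps only the upper triangle so each unordered pair appears once. `shared`, `ok`, `good_pairs` and `has_pair` are illustrative names; the diagonal of `crossprod(!is.na(newmat))` equals `colSums(!is.na(newmat))`, which ties this back to the question.

```r
# Rebuild the example matrix from the question
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# shared[i, j] = number of rows where columns i and j are both non-NA;
# its diagonal is the per-column count colSums(!is.na(newmat))
shared <- crossprod(!is.na(newmat))

# Restrict to the upper triangle so each unordered pair is considered once
ok <- shared >= 3 & upper.tri(shared)
idx <- which(ok, arr.ind = TRUE)

good_pairs <- data.frame(
  a = colnames(shared)[idx[, "row"]],
  b = colnames(shared)[idx[, "col"]],
  shared_rows = shared[idx],
  stringsAsFactors = FALSE
)
print(good_pairs)  # (mark1, mark3) and (mark3, mark4), 3 shared rows each

# One yes/no answer: is there at least one usable pair of columns?
has_pair <- any(ok)
```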

铃予 2025-01-26 06:41:23

Here's one way to do it.

First, create a data frame of all the possible column pairings, excluding self-pairings:

pairs <- expand.grid(a = colnames(newmat), b = colnames(newmat))
pairs <- pairs[pairs$a != pairs$b,]
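`expand.grid()` produces ordered pairs, so each pair shows up twice, once as (a, b) and once as (b, a). If the order does not matter for your pairwise test (an assumption on my part), base R's `combn()` is a swapped-in alternative that yields each unordered pair once. A sketch; `pair_mat` and `pairs_u` are illustrative names:

```r
# Only the column names matter here, so build them directly
cols <- paste0("mark", 1:5)

# combn() returns a 2 x choose(5, 2) character matrix, one column per unordered pair
pair_mat <- combn(cols, 2)
pairs_u <- data.frame(a = pair_mat[1, ], b = pair_mat[2, ], stringsAsFactors = FALSE)

nrow(pairs_u)  # 10 pairs, versus the 20 ordered pairs left after dropping self-pairs
```

The rest of the answer works unchanged on `pairs_u`; the only difference is that each qualifying pair is reported once instead of twice.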

Now, for each row in this data frame, use the entries in columns a and b to extract the relevant columns from newmat. Count the rows in which both columns of the pair are non-NA, and store that count as a new column in pairs. This can all be done with a single apply call:

pairs$matches <- apply(pairs, 1, function(row) {
  sum(!is.na(newmat[,row[1]]) & !is.na(newmat[,row[2]]))
})
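As a sanity check linking the two answers, the `matches` column computed with `apply()` should agree with the off-diagonal entries of `crossprod(!is.na(newmat))` from the other answer. A sketch that rebuilds both and compares them; `shared` and `from_crossprod` are illustrative names, and `shared[cbind(a, b)]` uses R's character-matrix indexing to look up entries by row and column name.

```r
# Rebuild the example matrix from the question
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# Pair counts exactly as in this answer
pairs <- expand.grid(a = colnames(newmat), b = colnames(newmat))
pairs <- pairs[pairs$a != pairs$b, ]
pairs$matches <- apply(pairs, 1, function(row) {
  sum(!is.na(newmat[, row[1]]) & !is.na(newmat[, row[2]]))
})

# Same counts read off the co-occurrence matrix; expand.grid() returns
# factors by default, so convert to character before indexing by name
shared <- crossprod(!is.na(newmat))
from_crossprod <- shared[cbind(as.character(pairs$a), as.character(pairs$b))]

all(pairs$matches == from_crossprod)  # TRUE
```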

Now filter out the rows of pairs with fewer than 3 matches:

pairs <- pairs[pairs$matches > 2,]

Now pairs looks like this:

pairs
#>        a     b matches
#> 3  mark3 mark1       3
#> 11 mark1 mark3       3
#> 14 mark4 mark3       3
#> 18 mark3 mark4       3

If we unlist the first two columns, find all the unique values and sort them, we get a vector of the column names we want. We then use it to subset the matrix and drop the columns that are not part of any qualifying pair:

newmat[,sort(unique(as.character(unlist(pairs[1:2]))))]
#>           mark1     mark3      mark4
#> id1          NA 1.2240818  1.7869131
#> id2 -0.23017749 0.3598138         NA
#> id3  1.55870831 0.4007715 -1.9666172
#> id4  0.07050839 0.1106827  0.7013559
#> id5  0.12928774        NA         NA
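The same column subset can be obtained without building `pairs`, by reusing the co-occurrence matrix from the `crossprod()` answer: keep every column that has at least one partner with 3 or more shared non-NA rows. A sketch, with `shared` and `keep` as illustrative names:

```r
# Rebuild the example matrix from the question
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

shared <- crossprod(!is.na(newmat))  # shared non-NA rows for every column pair
diag(shared) <- 0                    # ignore self-pairs, as in the crossprod answer

# A column is kept if it has at least one partner with >= 3 shared rows
keep <- colSums(shared >= 3) > 0
newmat[, keep]                       # columns mark1, mark3, mark4
```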
