Check that at least 2 columns of a matrix have at least 3 values... but they must be in the same rows (for pairwise tests)
Say I have a matrix like the following:
set.seed(123)
newmat=matrix(rnorm(25),ncol=5)
colnames(newmat)=paste0('mark',1:5)
rownames(newmat)=paste0('id',1:5)
newmat[,2]=NA
newmat[c(2,5),4]=NA
newmat[c(1,4,5),5]=NA
newmat[1,1]=NA
newmat[5,3]=NA
> newmat
          mark1 mark2     mark3      mark4      mark5
id1          NA    NA 1.2240818  1.7869131         NA
id2 -0.23017749    NA 0.3598138         NA -0.2179749
id3  1.55870831    NA 0.4007715 -1.9666172 -1.0260044
id4  0.07050839    NA 0.1106827  0.7013559         NA
id5  0.12928774    NA        NA         NA         NA
The only thing I want to check here, in an easy way, is that there are at least 2 columns with at least 3 values each, and also that those columns have their values in the same rows...
In the example above, the pair of columns 1 and 3 fulfils this, as does the pair of columns 3 and 4; the pair of columns 1 and 4 does not. That makes a total of 3 columns (1, 3 and 4).
How could I do this check in R? I know it would involve something like colSums(!is.na(newmat)), but I'm not sure about the rest... Thanks!
Comments (2)
Here is a matrix (obtained by using crossprod together with is.na) that shows which pairs fulfil your objective. As we can see, the pairs (mark1, mark3) and (mark3, mark4) are the desired output.
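A minimal sketch of that approach, assuming the matrix is built as crossprod(+!is.na(newmat)) (the unary + turns the logical non-NA indicators into 0/1 integers). The names shared, idx and good_pairs are hypothetical and only used for illustration.

```r
# Rebuild the example matrix from the question (only its NA pattern matters here)
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# 0/1 indicator of non-missing values; crossprod() then counts, for every
# pair of columns, the rows in which both columns are non-NA
shared <- crossprod(+!is.na(newmat))
print(shared)

# Keep each unordered pair once (upper triangle, diagonal excluded)
# when the two columns share at least 3 complete rows
idx <- which(shared >= 3 & upper.tri(shared), arr.ind = TRUE)
good_pairs <- data.frame(a = rownames(shared)[idx[, "row"]],
                         b = colnames(shared)[idx[, "col"]],
                         stringsAsFactors = FALSE)
print(good_pairs)
```

Entry [i, j] of shared is the number of rows where columns i and j are both present, so its diagonal holds the same per-column counts that colSums(!is.na(newmat)) would give.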
Here's one way to do it.
First, create a data frame of all the possible column pairings, excluding self-pairings:
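One possible way to build that data frame, sketched here with combn(); this is an assumption, and combn() lists each unordered pair only once, so (mark3, mark1) does not appear alongside (mark1, mark3). The column names a and b follow the description above.

```r
# Example matrix from the question
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# combn() returns a 2 x choose(5, 2) matrix of distinct column names;
# transpose it so that each row is one pair, then name the columns a and b
pairs <- as.data.frame(t(combn(colnames(newmat), 2)), stringsAsFactors = FALSE)
names(pairs) <- c("a", "b")
print(pairs)
```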
Now, for each row in this data frame, use the entries in columns a and b to extract the relevant columns from newmat. Count the number of rows in which both entries of the column pair are non-NA, and store that count as a column in pairs. This can all be done with an apply call:
Now filter out the rows of pairs where there were fewer than 3 matches:
Now pairs looks like this:
If we unlist the first two columns, find all the unique values and sort them, we get a vector of the column names we want, so we use it to subset the matrix and remove the redundant columns:
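Putting the remaining steps together, a sketch under the same assumptions (pairs built with combn(), and a hypothetical count column named n) might look like this:

```r
# Example matrix from the question
set.seed(123)
newmat <- matrix(rnorm(25), ncol = 5)
colnames(newmat) <- paste0("mark", 1:5)
rownames(newmat) <- paste0("id", 1:5)
newmat[, 2] <- NA
newmat[c(2, 5), 4] <- NA
newmat[c(1, 4, 5), 5] <- NA
newmat[1, 1] <- NA
newmat[5, 3] <- NA

# All unordered pairs of distinct columns (same construction as in the first step)
pairs <- as.data.frame(t(combn(colnames(newmat), 2)), stringsAsFactors = FALSE)
names(pairs) <- c("a", "b")

# apply() passes each row as a named character vector c(a = ..., b = ...);
# count the rows where BOTH columns of the pair are non-NA
pairs$n <- apply(pairs, 1, function(p) {
  sum(!is.na(newmat[, p["a"]]) & !is.na(newmat[, p["b"]]))
})

# Drop the pairs that share fewer than 3 complete rows
pairs <- pairs[pairs$n >= 3, ]
print(pairs)

# Column names appearing in any surviving pair, used to subset the matrix
keep <- sort(unique(unlist(pairs[, c("a", "b")])))
print(newmat[, keep])
```

With the example data only the pairs (mark1, mark3) and (mark3, mark4) survive the filter, so the subset keeps mark1, mark3 and mark4, the three columns described in the question.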