Matrix %in% matrix

Posted 2024-12-12 12:21:34

Suppose I have two matrices, each with two columns and a differing number of rows. I want to check which rows (pairs) of one matrix also occur as rows of the other matrix. If these were one-dimensional, I would normally just do a %in% x to get my results, but match seems to work only on vectors.

> a
      [,1] [,2]
[1,]    1    2
[2,]    4    9
[3,]    1    6
[4,]    7    7
> x
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

I would like the result to be c(FALSE,TRUE,TRUE,FALSE).
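To see why plain %in% is not enough here, the following sketch uses the same a and x. %in% ignores the dim attribute and tests each element on its own, so the pairing of values within a row is lost. The names elementwise and wanted are only illustrative.

```r
a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol = 2, byrow = TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)

# %in% treats both matrices as plain vectors: each of the 8 values of a
# is looked up among the 10 values of x, so row structure is lost
elementwise <- a %in% x
print(elementwise)   # eight TRUEs: every value of a occurs somewhere in x

# the desired answer has one logical per row of a, not per element
wanted <- c(FALSE, TRUE, TRUE, FALSE)
length(elementwise) == nrow(a)   # FALSE
```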

Comments (5)

余生共白头 2024-12-19 12:21:34

Recreate your data:

a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol=2, byrow=TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol=2, byrow=TRUE)

Define an operator %inm%, a matrix analogue of %in%, that tests whether the vector x occurs as a row of matrix:

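`%inm%` <- function(x, matrix){
  # compare x with each row of matrix; column j of test holds matrix[j, ] == x
  test <- apply(matrix, 1, `==`, x)
  # TRUE if at least one row matches x in every position
  any(apply(test, 2, all))
}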

Apply this to your data:

apply(a, 1, `%inm%`, x)
[1] FALSE  TRUE  TRUE FALSE

To compare a single row:

a[1, ] %inm% x
[1] FALSE

a[2, ] %inm% x
[1] TRUE
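To make the two apply() calls inside %inm% concrete, this sketch traces them for the single row c(4, 9), the second row of a. It rebuilds x from the example; r, test and row_hits are illustrative names. Note that == is an exact comparison, so with floating-point data the values must be identical to match.

```r
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)
r <- c(4, 9)   # the second row of a

# test is a 2 x 5 logical matrix: column j holds x[j, ] == r
test <- apply(x, 1, `==`, r)

# all() down each column: TRUE only where every element matched
row_hits <- apply(test, 2, all)
print(row_hits)   # FALSE FALSE FALSE TRUE FALSE

# any() collapses that to the single value %inm% returns
any(row_hits)     # TRUE
```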
能否归途做我良人 2024-12-19 12:21:34

Another approach is to paste each row into a single string and compare the strings with %in%:

> paste(a[,1], a[,2], sep="$") %in% paste(x[,1], x[,2], sep="$")
[1] FALSE  TRUE  TRUE FALSE

A more general version, which works for any number of columns, is:

> apply(a, 1, paste, collapse="$") %in% apply(x, 1, paste, collapse="$")
[1] FALSE  TRUE  TRUE FALSE
想念有你 2024-12-19 12:21:34

Andrie's solution is perfectly fine. But if you have big matrices, you might want to try something based on recursion. If you work column by column, you can cut down the calculation time by discarding every row that doesn't match at the first position before checking the next column, and so on:

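fastercheck <- function(x, matrix){
  nc <- ncol(matrix)
  # r: row of x being checked; i: current column; id: rows of matrix still matching
  rec.check <- function(r, i, id){
    # among the remaining candidates, keep those whose column i equals r[i]
    id[id] <- matrix[id, i] %in% r[i]
    # move on to the next column while candidates remain
    if (i < nc && any(id)) rec.check(r, i + 1, id) else any(id)
  }
  apply(x, 1, rec.check, 1, rep(TRUE, nrow(matrix)))
}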
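The pruning idea behind rec.check can also be written as a plain loop, which may be easier to follow. This is a hypothetical iterative rewrite (row_in_matrix is not from the answer) that checks one row r against a matrix m and stops as soon as no candidate rows are left:

```r
# hypothetical iterative version of rec.check: does vector r occur as a row of m?
row_in_matrix <- function(r, m) {
  id <- rep(TRUE, nrow(m))            # start with every row of m as a candidate
  for (i in seq_len(ncol(m))) {
    id[id] <- m[id, i] %in% r[i]      # drop candidates whose column i differs
    if (!any(id)) return(FALSE)       # no candidates left: stop early
  }
  TRUE                                # some row survived every column
}

a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol = 2, byrow = TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)
res <- sapply(seq_len(nrow(a)), function(k) row_in_matrix(a[k, ], x))
print(res)   # FALSE TRUE TRUE FALSE
```

The gain comes from the same place as in the recursive version: after the first column, only the surviving candidate rows are compared, so for random data most rows are discarded early.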

The comparison:

> set.seed(100)
> x <- matrix(runif(1e6),ncol=10)
> a <- matrix(runif(300),ncol=10)
> a[c(3,7,9,15),] <- x[c(1000,48213,867,20459),]
> system.time(res1 <- apply(a, 1, `%inm%`, x))
   user  system elapsed 
  31.16    0.14   31.50 
> system.time(res2 <- fastercheck(a,x))
   user  system elapsed 
   0.37    0.00    0.38 
> identical(res1, res2)
[1] TRUE
> which(res2)
[1]  3  7  9 15

EDIT:

I checked the accepted answer (the paste approach) just for fun. It performs better than the double apply (as you get rid of the inner loop), but recursion still rules! ;-)

> system.time(apply(a, 1, paste, collapse="$") %in% 
 + apply(x, 1, paste, collapse="$"))
   user  system elapsed 
   6.40    0.01    6.41 
海拔太高太耀眼 2024-12-19 12:21:34

Here is another approach using the digest package: create a checksum for each row with a hashing algorithm (MD5 by default) and compare the checksums with %in%.

library(digest)
a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol=2, byrow=TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol=2, byrow=TRUE)
apply(a, 1, digest) %in% apply(x, 1, digest)

[1] FALSE  TRUE  TRUE FALSE
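One property of the checksum approach worth knowing: digest() hashes the serialized R object, so the storage type is part of the hash. The sketch below, assuming the digest package is installed, shows the per-row hashes and that a double row and an integer row with the same values do not match (the paste() keys, by contrast, would both be "4$9"). The variable names are illustrative.

```r
library(digest)   # the digest package used in the answer above

a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol = 2, byrow = TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)

# one MD5 hash (a 32-character hex string) per row
hashes_a <- apply(a, 1, digest)
hashes_x <- apply(x, 1, digest)
hits <- hashes_a %in% hashes_x   # FALSE TRUE TRUE FALSE

# the hash covers the serialized object, so storage type matters:
# a double row and an integer row with the same values hash differently
same_values_same_hash <- identical(digest(c(4, 9)), digest(c(4L, 9L)))  # FALSE
```

So if one matrix is integer and the other double, convert them to a common type (for example with storage.mode(m) <- "double") before hashing.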
听不够的曲调 2024-12-19 12:21:34

Coming in late to the game: I had previously written an algorithm using the "paste with delimiter" method, and then found this page. I was guessing that one of the code snippets here would be the fastest, but:

library(digest)          # needed by ramnath()
library(microbenchmark)

# using Andrie's %inm% operator exactly as above
andrie <- function(mfoo, nfoo) apply(mfoo, 1, `%inm%`, nfoo)

carl <- function(mfoo, nfoo) {
  allrows <- unlist(sapply(1:nrow(mfoo), function(j) paste(mfoo[j, ], collapse = '_')))
  allfoo  <- unlist(sapply(1:nrow(nfoo), function(j) paste(nfoo[j, ], collapse = '_')))
  thewalls <- setdiff(allrows, allfoo)
  # note: returns the rows of mfoo that are NOT in nfoo, not a logical vector
  dowalls <- mfoo[allrows %in% thewalls, ]
}

ramnath <- function(a, x) apply(a, 1, digest) %in% apply(x, 1, digest)

mfoo <- matrix(sample(1:100, 400, replace = TRUE), nrow = 100)
nfoo <- mfoo[sample(1:100, 60), ]

microbenchmark(andrie(mfoo, nfoo), carl(mfoo, nfoo), ramnath(mfoo, nfoo), times = 5)

Unit: milliseconds
                expr       min        lq    median        uq            max neval
  andrie(mfoo, nfoo) 25.564196 26.527632 27.964448 29.687344     102.802004     5
    carl(mfoo, nfoo)  1.020310  1.079323  1.096855  1.193926       1.246523     5
 ramnath(mfoo, nfoo)  8.176164  8.429318  8.539644  9.258480       9.458608     5

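So, at least in this benchmark, constructing character strings and doing a single set operation is fastest!
(PS: I checked, and all 3 algorithms agree on which rows match; carl() returns the rows of mfoo that are not in nfoo rather than a logical vector.)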
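For reference, the row keys can also be built without any apply() or sapply() loop by pasting whole columns together, which is the same string-key idea in vectorised form. The helper row_keys below is hypothetical and was not part of the benchmark above, so no timing is claimed for it:

```r
# hypothetical helper: one key per row, pasting whole columns at once
row_keys <- function(m, sep = "$") {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])  # list of column vectors
  do.call(paste, c(cols, sep = sep))                     # paste(col1, col2, ..., sep = sep)
}

a <- matrix(c(1, 2, 4, 9, 1, 6, 7, 7), ncol = 2, byrow = TRUE)
x <- matrix(c(1, 6, 2, 7, 3, 8, 4, 9, 5, 10), ncol = 2, byrow = TRUE)
keys_a <- row_keys(a)             # "1$2" "4$9" "1$6" "7$7"
hits <- keys_a %in% row_keys(x)   # FALSE TRUE TRUE FALSE
```

Because paste() is vectorised over its arguments, this makes one call per column instead of one call per row, which is why it may scale better when the matrices have many rows and few columns.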
