Finding a sequence in a matrix as efficiently as possible

Posted on 2025-01-19 13:57:57

A few requirements.

Before posting your answer, please check the following:

1) Make sure your function does not throw errors on other data: simulate several similar matrices (without setting the seed).

2) Make sure your function is faster than mine

3) Make sure your function behaves exactly like mine: test it on different simulated matrices (without setting the seed).

For example:

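for(i in 1:500){
  m <- matrix(sample(c(F,T),30,T),ncol = 3) ; colnames(m) <- paste0("x",1:ncol(m))

  res <- c(my_fun(m),your_function(m))
  print(res)
  if(sum(res)==1)  break   # exactly one TRUE: the two functions disagree
}
m   # the matrix on which they disagreed (or the last one simulated)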
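The loop above can be turned into a reusable check that also covers requirement 4 by drawing the number of rows and columns at random. This is a sketch, not code from the question: `ref_seq()` is a hypothetical silent copy of `find_seq()` (with the print() call removed) and `check_agreement()` is a hypothetical helper. It compares results with identical(), so the candidate must return a single unnamed TRUE or FALSE, and an error inside the candidate counts as a mismatch.

```r
# Hypothetical silent copy of the asker's find_seq(), used as the reference
ref_seq <- function(m) {
  colum <- 1
  res <- rep(FALSE, ncol(m))
  for (i in seq_len(nrow(m))) {
    if (m[i, colum]) {
      res[colum] <- TRUE
      colum <- colum + 1
    }
    if (colum > ncol(m)) break
  }
  all(res)
}

# Hypothetical helper: returns NULL if ref and candidate agree on n_iter random
# matrices, otherwise the first matrix on which they differ
check_agreement <- function(ref, candidate, n_iter = 500) {
  for (it in seq_len(n_iter)) {
    nr <- sample(1:20, 1)                 # random number of rows
    nc <- sample(1:5, 1)                  # random number of columns
    m <- matrix(sample(c(FALSE, TRUE), nr * nc, TRUE), ncol = nc)
    colnames(m) <- paste0("x", seq_len(nc))
    got <- tryCatch(candidate(m), error = function(e) NA)  # an error counts as a mismatch
    if (!identical(ref(m), got)) return(m)
  }
  NULL
}

# Usage: a deliberately wrong candidate that only looks at column 1
bad <- check_agreement(ref_seq, function(m) any(m[, 1]))
```

Calling `check_agreement(ref_seq, your_function)` and getting NULL means every simulated matrix agreed; otherwise the returned matrix is a counterexample to inspect.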

4) The function should work with a matrix of any number of rows and columns.

==========================================================
The function looks for a TRUE in the first column of the logical matrix. If a TRUE is found, it moves to column 2, continuing from the next row, and so on.
If the whole sequence is found, it returns TRUE; otherwise, FALSE.
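The rule is stricter than "every column contains a TRUE": each match has to be in a later row than the previous one. The sketch below illustrates this on two hand-made matrices (not data from the question); `has_seq()` is a hypothetical one-pass version of the rule that scans the rows once and advances to the next column after each match.

```r
# Hypothetical one-pass version of the rule
has_seq <- function(m) {
  j <- 1                                  # column currently being searched
  for (i in seq_len(nrow(m))) {
    if (j > ncol(m)) break                # every column already matched
    if (m[i, j]) j <- j + 1               # match: next column, from the next row
  }
  j > ncol(m)
}

# x1 is TRUE in row 2, but x2 is TRUE only in row 1 (above it)
m_above <- matrix(c(FALSE, TRUE, FALSE,
                    TRUE, FALSE, FALSE), ncol = 2)
all(colSums(m_above) > 0)   # TRUE: each column has a TRUE
has_seq(m_above)            # FALSE: x2 has no TRUE below row 2

# x1 and x2 are both TRUE in row 1 only: the same row does not count
m_same <- matrix(c(TRUE, FALSE,
                   TRUE, FALSE), ncol = 2)
has_seq(m_same)             # FALSE
```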

set.seed(15)
m <- matrix(sample(c(F,T),30,T),ncol = 3) ; colnames(m) <- paste0("x",1:ncol(m))
m
         x1    x2    x3
 [1,] FALSE  TRUE  TRUE
 [2,] FALSE FALSE FALSE
 [3,]  TRUE  TRUE  TRUE
 [4,]  TRUE  TRUE  TRUE
 [5,] FALSE FALSE FALSE
 [6,]  TRUE  TRUE FALSE
 [7,] FALSE  TRUE FALSE
 [8,] FALSE FALSE FALSE
 [9,] FALSE FALSE  TRUE
[10,] FALSE FALSE  TRUE

My slow example function:

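find_seq <- function(m){
  colum <- 1                      # column currently being searched
  res <- rep(FALSE, ncol(m))      # which columns have been matched
  for(i in seq_len(nrow(m))){     # seq_len() avoids 1:0 when m has no rows
    if(m[i, colum]){
      res[colum] <- TRUE
      print(c(row = i, col = colum))
      colum <- colum + 1          # next column, continuing from the next row
    }
    if(colum > ncol(m)) break
  }
  all(res)
}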


find_seq(m)
row col 
  3   1 
row col 
  4   2 
row col 
  9   3 
[1] TRUE

How can I make it as fast as possible?

UPD=========================

microbenchmark::microbenchmark(Jean_Claude_Arbaut_fun(m),
                               ThomasIsCoding_fun(m),
                               my_fun(m))
Unit: microseconds
                      expr    min     lq     mean  median      uq     max neval cld
 Jean_Claude_Arbaut_fun(m)  2.850  3.421  4.36179  3.9915  4.5615  27.938   100 a  
     ThomasIsCoding_fun(m) 14.824 15.965 17.92030 16.5350 17.1050 101.489   100  b 
                 my_fun(m) 23.946 24.517 25.59461 25.0880 25.6580  42.192   100   c

Answers (6)

护你周全 2025-01-26 13:57:57

Update

If speed is what you are after, you can try the following base R solution:

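TIC_fun <- function(m) {
    p <- k <- 1          # p: first row to search, k: current column
    nr <- nrow(m)
    nc <- ncol(m)
    repeat {
        if (p > nr) {
            return(FALSE)    # ran out of rows before matching every column
        }
        found <- FALSE
        for (i in p:nr) {
            if (m[i, k]) {
                # print(c(row = i, col = k))
                p <- i + 1   # the next column must match in a later row
                k <- k + 1
                found <- TRUE
                break
            }
        }
        if (!found) {
            return(FALSE)    # column k has no TRUE in rows p..nr
        }
        if (k > nc) {
            return(TRUE)     # every column matched in increasing rows
        }
    }
}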

and the benchmark gives:

Unit: microseconds
       expr    min      lq      mean  median      uq       max neval
  my_fun(m) 18.600 26.3010  41.46795 41.5510 44.3010   121.302   100
 TIC_fun(m) 10.201 14.1515 409.89394 22.6505 24.4005 38906.601   100

Previous Answer

You can try the code below

lst <- with(as.data.frame(which(m, arr.ind = TRUE)), split(row, col))
# lst <- apply(m, 2, which)

setNames(
    stack(
        setNames(
            Reduce(function(x, y) y[y > x][1],
                lst,
                init = -Inf,
                accumulate = TRUE
            )[-1],
            names(lst)
        )
    ),
    c("row", "col")
)

which gives

  row col
1   3   1
2   4   2
3   9   3
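The Reduce() idea can also give the TRUE/FALSE answer the question asks for: once some column has no TRUE below the previous match, `y[y > x][1]` yields NA and the NA propagates through the remaining steps, so `!anyNA()` on the accumulated rows is the result. One caveat: split(row, col) only creates groups for columns that contain at least one TRUE, so an all-FALSE column silently disappears from lst. The sketch below (reduce_seq is a hypothetical name) builds the list with lapply() over every column instead.

```r
# Hypothetical wrapper around the Reduce() idea returning TRUE/FALSE
reduce_seq <- function(m) {
  # TRUE row positions for every column, all-FALSE columns included
  lst <- lapply(seq_len(ncol(m)), function(j) which(m[, j]))
  # first row in each column below the previous match; NA once a column fails
  rows <- Reduce(function(x, y) y[y > x][1], lst, init = -Inf, accumulate = TRUE)[-1]
  !anyNA(unlist(rows))
}

# The example matrix from the question, typed in explicitly
a <- matrix(c(F, F, T, T, F, T, F, F, F, F,
              T, F, T, T, F, T, T, F, F, F,
              T, F, T, T, F, F, F, F, T, T), ncol = 3)
a2 <- a
a2[, 2] <- FALSE            # column x2 has no TRUE at all
reduce_seq(a)               # TRUE
reduce_seq(a2)              # FALSE
```

For comparison, on a2 the split() version drops x2 entirely and returns only two rows (3 for column 1 and 4 for column 3) with no NA.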

A more interesting implementation uses recursion (just for fun; it is not recommended because it is inefficient):

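# Note: f() reads the global matrix m and assumes the full sequence exists;
# otherwise it can return NULL or stop with an error
f <- function(k) {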
    if (k == 1) {
        return(data.frame(row = which(m[, k])[1], col = k))
    }
    s <- f(k - 1)
    for (i in (tail(s, 1)$row + 1):nrow(m)) {
        if (m[i, k]) {
            return(rbind(s, data.frame(row = i, col = k)))
        }
    }
}

which gives:

> f(ncol(m))
  row col
1   3   1
2   4   2
3   9   3
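To keep the recursive flavour but return TRUE/FALSE (including when no sequence exists), the recursion can be driven by the column index and the first admissible row, with m passed as an argument instead of being read from the global environment. This is a sketch under those assumptions (rec_seq is a hypothetical name); the recursion depth equals the number of columns, so it is still meant for illustration rather than speed.

```r
# Hypothetical recursive version returning TRUE/FALSE; m is passed explicitly
rec_seq <- function(m, k = 1, start = 1) {
  if (k > ncol(m)) return(TRUE)            # all columns matched
  if (start > nrow(m)) return(FALSE)       # no rows left to search
  hit <- match(TRUE, m[start:nrow(m), k])  # offset of the first TRUE at or after `start`
  if (is.na(hit)) return(FALSE)            # column k has no TRUE in the remaining rows
  rec_seq(m, k + 1, start + hit)           # next column, starting one row below the match
}

a <- matrix(c(F, F, T, T, F, T, F, F, F, F,
              T, F, T, T, F, T, T, F, F, F,
              T, F, T, T, F, F, F, F, T, T), ncol = 3)
rec_seq(a)   # TRUE
```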
聽兲甴掵 2025-01-26 13:57:57

If I understand the problem correctly, a single loop through the rows is enough. Here is a way to do this with Rcpp. Here I only return the TRUE/FALSE answer; if you need the indices, that is also doable.

library(Rcpp)

cppFunction('
bool hasSequence(LogicalMatrix m) {
  int nrow = m.nrow(), ncol = m.ncol();
  
  if (nrow > 0 && ncol > 0) {
    int j = 0;
    for (int i = 0; i < nrow; i++) {
      if (m(i, j)) {
        if (++j >= ncol) {
          return true;
        }
      }
    }
  }
  return false;
}')


a <- matrix(c(F, F, T, T, F, T, F, F, F, F,
              T, F, T, T, F, T, T, F, F, F,
              T, F, T, T, F, F, F, F, T, T), ncol = 3)

a
hasSequence(a)

To also get the indices, the following function returns a list with at least one element, named 'found' (TRUE or FALSE), and, if found is TRUE, a second element named 'indices':

cppFunction('
List findSequence(LogicalMatrix m) {
  int nrow = m.nrow(), ncol = m.ncol();

  IntegerVector indices(ncol);
  if (nrow > 0 && ncol > 0) {
    int j = 0;
    for (int i = 0; i < nrow; i++) {
      if (m(i, j)) {
        indices(j) = i + 1;
        if (++j >= ncol) {
          return List::create(Named("found") = true,
                              Named("indices") = indices);
        }
      }
    }
  }
  return List::create(Named("found") = false);
}')

findSequence(a)
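Without a C++ toolchain, the same single pass can be written in plain R and return the same list shape as findSequence(). The sketch below is a line-by-line port under that assumption (find_sequence_r is a hypothetical name); expect it to be slower than the compiled version, since the loop runs in the R interpreter.

```r
# Hypothetical plain-R port of findSequence()
find_sequence_r <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  indices <- integer(nc)                  # 1-based row of each match
  if (nr > 0 && nc > 0) {
    j <- 1L                               # column currently being searched
    for (i in seq_len(nr)) {
      if (m[i, j]) {
        indices[j] <- i
        j <- j + 1L
        if (j > nc) return(list(found = TRUE, indices = indices))
      }
    }
  }
  list(found = FALSE)
}

a <- matrix(c(F, F, T, T, F, T, F, F, F, F,
              T, F, T, T, F, T, T, F, F, F,
              T, F, T, T, F, F, F, F, T, T), ncol = 3)
find_sequence_r(a)   # found = TRUE, indices = 3 4 9
```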

A few links to learn about Rcpp:

You need to know at least a bit of C (preferably C++, but for basic usage you can think of Rcpp as C with some magic syntax for R data types). The first link explains the basics of Rcpp types (vectors, matrices and lists: how to allocate, use and return them). The other links are good complements.

帝王念 2025-01-26 13:57:57

If your example is representative, we can assume that nrow(m) >> ncol(m). In that case, it is more efficient to iterate over columns rather than rows:

ff = function(m)
{
  i1 = 1
  for(j in 1:ncol(m)) {
    if(i1 > nrow(m)) return(FALSE)
    i1 = match(TRUE, m[i1:nrow(m), j]) + i1
    #print(i1)
    if(is.na(i1)) return(FALSE) 
  }
  return(TRUE)
}
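The only subtle line in ff() is the offset arithmetic. match(TRUE, x) returns the position of the first TRUE in x, or NA if there is none. Because the sub-vector starts at row i1, position p corresponds to row i1 + p - 1, so i1 + p is the first row the next column may use. A trace on column x2 of the question's example matrix (typed in explicitly below):

```r
x2 <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)  # column x2
i1 <- 4                       # x1 matched in row 3, so the search starts at row 4
p <- match(TRUE, x2[i1:10])   # first TRUE in rows 4..10 is at position 1
p + i1 - 1                    # row of the match: 4
p + i1                        # next search starts at row 5
match(TRUE, c(FALSE, FALSE))  # NA when there is no TRUE, hence the is.na() check
```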
爱殇璃 2025-01-26 13:57:57

A bit ugly (because of the <<-), but it gets the job done.

tempval <- 0
lapply(split(m, col(m)), function(x) {
  value <- which(x)[which(x) > tempval][1]
  tempval <<- value
  return(value)
})

# $`1`
# [1] 3
#
# $`2`
# [1] 4
#
# $`3`
# [1] 9
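The <<- is only needed because each lapply() call gets its own environment, and it also means the global tempval has to be reset to 0 before every use. A plain for loop can keep the running row in a local variable instead. In the sketch below (first_rows is a hypothetical name) the result holds the matched row per column and stays NA from the first column that has no TRUE below the previous match, so `!anyNA(first_rows(m))` gives the TRUE/FALSE answer.

```r
# Hypothetical <<- free version returning the matched row per column
first_rows <- function(m) {
  rows <- rep(NA_integer_, ncol(m))
  prev <- 0L                              # last matched row (0 = none yet)
  for (j in seq_len(ncol(m))) {
    hits <- which(m[, j])
    nxt <- hits[hits > prev][1]           # first TRUE strictly below prev
    if (is.na(nxt)) break                 # remaining entries stay NA
    rows[j] <- nxt
    prev <- nxt
  }
  rows
}

a <- matrix(c(F, F, T, T, F, T, F, F, F, F,
              T, F, T, T, F, T, T, F, F, F,
              T, F, T, T, F, F, F, F, T, T), ncol = 3)
a2 <- a
a2[, 2] <- FALSE
first_rows(a)    # 3 4 9
first_rows(a2)   # 3 NA NA
```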
凡尘雨 2025-01-26 13:57:57

Here is a function that focuses on case handling. It's faster than all the others; hope it's right :)

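f <- \(m) {
  stopifnot(dim(m)[2] == 3L)  # this version handles exactly 3 columns
  e <- nrow(m)
  # first TRUE in column 1 (which.max() returns the first TRUE)
  x1 <- if (any(xx1 <- m[, 1])) {
    which.max(xx1)
  } else {
    NA_integer_
  }
  # first TRUE in column 2 below row x1; x1 == e would make (x1 + 1):e count down
  x2 <- if (is.na(x1) || x1 == e) {
    NA_integer_
  } else if (any(xx2 <- m[(x1 + 1):e, 2])) {
    which.max(xx2) + x1
  } else {
    NA_integer_
  }
  # first TRUE in column 3 below row x2
  x3 <- if (is.na(x2) || x2 == e) {
    NA_integer_
  } else if (any(xx3 <- m[(x2 + 1):e, 3])) {
    which.max(xx3) + x2
  } else {
    NA_integer_
  }
  !anyNA(c(x1, x2, x3))
}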

f(m)
# [1] TRUE

m2 <- m
m2[, 3] <- FALSE

f(m2)
# [1] FALSE

Data:

set.seed(15)
m <- matrix(sample(c(FALSE, TRUE), 30, TRUE), ncol=3)
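The function above is fixed to three columns (stopifnot), while requirement 4 asks for any number of columns. The same any()/which.max() case handling generalises to a loop over columns; the sketch below (f_any is a hypothetical name) stops at the first column without a TRUE below the previous match. Because (x + 1):e counts down when x equals e, it also stops explicitly once the last row has been used.

```r
# Hypothetical any-number-of-columns version of the case handling above
f_any <- function(m) {
  e <- nrow(m)
  x <- 0L                                 # row of the previous match (0 = none yet)
  for (j in seq_len(ncol(m))) {
    if (x >= e) return(FALSE)             # no rows left below the previous match
    xx <- m[(x + 1):e, j]
    if (!any(xx)) return(FALSE)           # no TRUE below the previous match
    x <- which.max(xx) + x                # which.max() gives the first TRUE
  }
  TRUE
}

a <- matrix(c(F, F, T, T, F, T, F, F, F, F,
              T, F, T, T, F, T, T, F, F, F,
              T, F, T, T, F, F, F, F, T, T), ncol = 3)
m_last <- matrix(c(FALSE, TRUE, TRUE, FALSE), ncol = 2)  # x1 TRUE only in the last row
m4 <- cbind(a, a[, 3])                                   # a 4-column example
f_any(a)        # TRUE
f_any(m_last)   # FALSE
f_any(m4)       # TRUE
```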
一城柳絮吹成雪 2025-01-26 13:57:57

With accumulate:

purrr::accumulate(apply(m, 2, which), .init = -Inf, ~ min(.y[.y > min(.x)]))[-1]

# or

purrr::accumulate(apply(m, 2, which), .init = -Inf, ~ .y[.y > .x][1])[-1]

# x1 x2 x3 
#  3  4  9 
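Both accumulate() calls (and the commented-out apply() line in the first answer) rely on apply(m, 2, which) returning a list. apply() simplifies its result when every column returns the same number of values: with exactly one TRUE per column it returns a plain vector, and with the same count n > 1 per column it returns an n x ncol matrix, which Reduce() then walks element by element instead of column by column. lapply() over the column indices always returns a list. The check below uses a small hand-made matrix (not data from the question).

```r
m_eq <- matrix(c(TRUE, TRUE, FALSE,
                 FALSE, TRUE, TRUE), ncol = 2)   # two TRUEs in each column
str(apply(m_eq, 2, which))                       # a 2 x 2 integer matrix, not a list
by_col <- lapply(seq_len(ncol(m_eq)), function(j) which(m_eq[, j]))
str(by_col)                                      # a list of 2 integer vectors
```

Passing a list built this way to the accumulate() calls above keeps exactly one step per column.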