Find a sequence in a matrix as efficiently as possible
A few requirements. Before posting your answer, please:
1) Make sure that your function does not throw errors on other data: simulate several similar matrices (with the seed turned off).
2) Make sure your function is faster than mine.
3) Make sure that your function behaves exactly the same as mine: simulate it on different matrices (with the seed turned off), for example:
for(i in 1:500){
  m <- matrix(sample(c(F,T),30,T),ncol = 3) ; colnames(m) <- paste0("x",1:ncol(m))
  res <- c(my_fun(m),your_function(m))
  print(res)
  if(sum(res)==1) break   # exactly one TRUE means the two functions disagree
}
m   # the matrix on which they disagreed (or the last one simulated)
4) The function should work with a matrix with any number of rows and columns.
==========================================================
The function looks for a TRUE in the first column of the logical matrix. If a TRUE is found, it moves to column 2 and continues from the next row, and so on.
If the whole sequence is found (one TRUE per column, in strictly increasing rows), it returns TRUE; if not, it returns FALSE.
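Here is the same condition stated formally. This is a restatement of the description above, reading "a new row" as a strictly later row, which is what the loop below implements. For a logical matrix m with k columns, the function should return TRUE exactly when:

```latex
\exists\, r_1 < r_2 < \cdots < r_k \ \text{such that}\ m_{r_j,\,j} = \mathrm{TRUE} \quad \text{for all } j = 1, \dots, k.
```

Scanning the rows once and taking the earliest usable TRUE in each column is enough. Suppose some valid r_1 < ... < r_k exists. Induction on j then shows that the greedy row g_j exists and satisfies g_j ≤ r_j, so the greedy scan cannot miss a sequence.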
set.seed(15)
m <- matrix(sample(c(F,T),30,T),ncol = 3) ; colnames(m) <- paste0("x",1:ncol(m))
m
         x1    x2    x3
 [1,] FALSE  TRUE  TRUE
 [2,] FALSE FALSE FALSE
 [3,]  TRUE  TRUE  TRUE
 [4,]  TRUE  TRUE  TRUE
 [5,] FALSE FALSE FALSE
 [6,]  TRUE  TRUE FALSE
 [7,] FALSE  TRUE FALSE
 [8,] FALSE FALSE FALSE
 [9,] FALSE FALSE  TRUE
[10,] FALSE FALSE  TRUE
My slow example function:
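find_seq <- function(m){
  colum <- 1                       # column currently being searched
  res <- rep(FALSE, ncol(m))       # res[j] becomes TRUE once column j has a match
  for(i in 1:nrow(m)){
    # each row is checked against the current column only,
    # so the matches must lie in strictly increasing rows
    if(m[i, colum] == TRUE){
      res[colum] <- TRUE
      print(c(row = i, col = colum))   # trace of each match (the output below)
      colum <- colum + 1
    }
    if(colum > ncol(m)) break      # every column has been matched
  }
  all(res)
}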
find_seq(m)
row col
  3   1
row col
  4   2
row col
  9   3
[1] TRUE
How can I make it as fast as possible?
UPD:
microbenchmark::microbenchmark(Jean_Claude_Arbaut_fun(m),
                               ThomasIsCoding_fun(m),
                               my_fun(m))
Unit: microseconds
                      expr    min     lq     mean  median      uq     max neval cld
 Jean_Claude_Arbaut_fun(m)  2.850  3.421  4.36179  3.9915  4.5615  27.938   100 a
     ThomasIsCoding_fun(m) 14.824 15.965 17.92030 16.5350 17.1050 101.489   100  b
                 my_fun(m) 23.946 24.517 25.59461 25.0880 25.6580  42.192   100   c
Update
If you are after speed, you can try the following base R solution
and you will see
Previous Answer
You can try the code below
which gives
A more interesting implementation might use recursion (just for fun; it is not recommended because of its inefficiency)
which gives
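For readers curious what a recursive version could look like, here is a minimal base R sketch. It is not the answerer's original code, which is not shown here, and the name find_seq_rec is hypothetical. The function matches column 1 at its first TRUE, then recurses on the rows strictly below that match with column 1 removed. Copying the submatrix at every level is what makes this approach inefficient.

```r
# Hypothetical recursive sketch, not the original answer's code.
find_seq_rec <- function(m) {
  if (ncol(m) == 0L) return(TRUE)        # every column has been matched
  hit <- which(m[, 1L])                  # rows with TRUE in the current first column
  if (length(hit) == 0L) return(FALSE)   # no usable TRUE left in this column
  # drop the rows up to and including the first match, and the matched column;
  # drop = FALSE keeps a matrix even when one row or column remains
  find_seq_rec(m[-seq_len(hit[1L]), -1L, drop = FALSE])
}
```

Calling find_seq_rec(m) on the matrix printed in the question follows rows 3, 4 and 9, just like the trace of find_seq.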
If I understand the problem correctly, a single loop through the rows is enough. Here is a way to do this with Rcpp. Here I only return the TRUE/FALSE answer; if you need the indices, that is also doable.
To also get the indices, the following function returns a list with at least one element (named 'found', TRUE or FALSE) and, if found = TRUE, a second element named 'indices':
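The Rcpp code itself is not reproduced here. As a rough illustration of the described return shape, here is a plain R transcription of the same single-pass logic. It is a sketch under my own assumptions, not the answerer's code, and the name find_seq_idx is hypothetical. Being interpreted R, it will not reach compiled Rcpp speed; it only shows the list layout ('found', plus 'indices' when found is TRUE).

```r
# Hypothetical base R sketch of the single-pass idea, not the answerer's Rcpp code.
find_seq_idx <- function(m) {
  col <- 1L
  idx <- integer(ncol(m))          # idx[j] will hold the row matched in column j
  for (i in seq_len(nrow(m))) {
    if (col > ncol(m)) break       # every column is already matched
    if (m[i, col]) {               # TRUE in the current column on this row
      idx[col] <- i
      col <- col + 1L              # later rows are checked against the next column
    }
  }
  if (col > ncol(m)) list(found = TRUE, indices = idx) else list(found = FALSE)
}
```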
A few links to learn about Rcpp:
You need to know at least a bit of C (preferably C++, but for basic usage you can think of Rcpp as C with some magic syntax for R data types). The first link explains the basics of Rcpp types (vectors, matrices and lists, and how to allocate, use and return them). The other links are good complements.
If your example is representative, we can assume that nrow(m) >> ncol(m). In that case, it would be more efficient to move the iteration from rows to columns:
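The answer's own code is not shown here, so what follows is only a hedged sketch of what iterating over columns might look like. The name find_seq_cols is hypothetical. The loop runs ncol(m) times, and each step uses the vectorized which() on one column. When nrow(m) >> ncol(m), this replaces many interpreted row iterations with a few vector operations.

```r
# Hypothetical sketch of iterating over columns instead of rows.
find_seq_cols <- function(m) {
  last_row <- 0L                   # row of the previous match (0 = none yet)
  for (j in seq_len(ncol(m))) {
    hits <- which(m[, j])          # all TRUE rows in column j, in increasing order
    nxt <- hits[hits > last_row]   # only rows strictly below the previous match count
    if (length(nxt) == 0L) return(FALSE)
    last_row <- nxt[1L]            # greedily take the earliest usable row
  }
  TRUE
}
```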
A bit ugly (because of the <<-), but it will get the job done.
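The code from this answer is not reproduced here. To show where a <<- typically comes in, here is a hypothetical sketch (find_seq_super is an invented name). A per-column closure inside vapply() updates a variable in the enclosing function. Because vapply() cannot stop early, the remaining columns are still visited after a failure; all() still returns the correct result.

```r
# Hypothetical sketch: the per-column closure updates `last_row` in the
# enclosing function via `<<-`, which is the "ugly" part mentioned above.
find_seq_super <- function(m) {
  last_row <- 0L                       # row of the previous match (0 = none yet)
  ok <- vapply(seq_len(ncol(m)), function(j) {
    hits <- which(m[, j])
    nxt <- hits[hits > last_row]       # rows strictly below the previous match
    if (length(nxt) == 0L) return(FALSE)
    last_row <<- nxt[1L]               # assigns in find_seq_super's environment
    TRUE
  }, logical(1))
  all(ok)                              # any unmatched column makes this FALSE
}
```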
Here is a function that focuses on case handling. It's faster than all the others; I hope it's right :)
Data:
With accumulate:
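The answer does not show which accumulate function it uses; purrr::accumulate is a common one, but that is an assumption. As a dependency-free stand-in, here is a hedged sketch built on base R's Reduce() with accumulate = TRUE (find_seq_reduce is a hypothetical name). It folds over the columns and carries the row of the last match. The accumulated vector starts at 0, then holds the matched row for each column, or Inf from the first unmatched column onward.

```r
# Hypothetical sketch using base R's Reduce() with accumulate = TRUE
# (a stand-in for an `accumulate` function; not the original answer's code).
find_seq_reduce <- function(m) {
  hits <- lapply(seq_len(ncol(m)), function(j) which(m[, j]))  # TRUE rows per column
  step <- function(last_row, h) {
    nxt <- h[h > last_row]                  # rows strictly below the previous match
    if (length(nxt) > 0L) nxt[1L] else Inf  # Inf marks a broken sequence
  }
  # init = 0 (no match yet); the result holds 0 followed by one entry per column
  rows <- Reduce(step, hits, 0, accumulate = TRUE)
  is.finite(rows[[length(rows)]])
}
```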