From for loop to apply

Posted on 2024-10-27 11:00:29

I am new to R,
so I am not sure how to use apply.
I would like to speed up my function by using apply:

for(i in 1: ncol(exp)){
 for (j in 1: length(fe)){
  tmp =TRUE
  id = strsplit(colnames(exp)[i],"\\.")
  if(id == fe[j]){
   tmp = FALSE
  }
  if(tmp ==TRUE){
   only = cbind(only,c(names(exp)[i],exp[,i]) )
  }
 }
}

How can I use the apply function to do the above?

EDIT:

Thank you so much for the very good explanation, and sorry for my poor description. You guessed everything right, except that I wanted to delete the columns that match fe.

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)

fe<-LETTERS[1:2]

Then the result should contain only the column whose name starts with 'C'. Everything else should be deleted.

1   C.z 
2    11 
3    12   
4    13   
5    14 
6    15  
7    16  
8    17  
9    18   
10   19  
11   20   

Answers (2)

短暂陪伴 2024-11-03 11:00:29

EDIT: If you only want to delete the columns whose name prefix (the part before the dot) appears in fe, you can simply do:

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]

id <- sapply(strsplit(names(Exp),"\\."),
    function(i)!i[1] %in% fe)
Exp[id]

This code does exactly what your (updated) for loop does, only a lot more efficiently. You don't have to loop through fe, because the %in% function is vectorized.
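For illustration, here is a sketch of the same filter without strsplit(), assuming the prefix is everything before the first dot. sub() removes the first dot and everything after it, and a single vectorized %in% call then tests all prefixes against fe. The names `prefix` and `keep` are just for illustration.

```r
Exp <- data.frame(A.x = 1:10, B.y = 10:1, C.z = 11:20, A.z = 20:11)
fe <- LETTERS[1:2]

# Remove the first "." and everything after it, leaving each name's prefix
prefix <- sub("\\..*$", "", names(Exp))
prefix
# [1] "A" "B" "C" "A"

# %in% is vectorized: one call tests every prefix against fe
keep <- !prefix %in% fe
Exp[keep]
```

Because `%in%` binds more tightly than `!`, `!prefix %in% fe` means `!(prefix %in% fe)`.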

If the name can appear anywhere between the dots, use:

id <- sapply(strsplit(names(Exp),"\\."),
    function(i)sum(i %in% fe)==0)
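A sketch of that variant applied end to end. The data frame `Exp2` and its column names are hypothetical, chosen so that letters from fe also appear after the dot. It uses vapply() instead of sapply() so the result is guaranteed to be a logical vector.

```r
# Hypothetical data: "B" and "A" appear after the dot in two of the names
Exp2 <- data.frame(A.x = 1:3, q.B = 4:6, C.z = 7:9, z.A = 10:12)
fe <- LETTERS[1:2]

parts <- strsplit(names(Exp2), ".", fixed = TRUE)  # list of name pieces
# Keep a column only if none of its pieces is in fe
keep <- vapply(parts, function(p) !any(p %in% fe), logical(1))
keep
# [1] FALSE FALSE  TRUE FALSE
Exp2[keep]
```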

Your code does some very funny things, and I have no clue what exactly you're trying to do. For one, strsplit gives a list, so id == fe[j] will always return FALSE unless fe[j] is a list itself, and I doubt it is. So I'd correct your code as

id = strsplit(colnames(Exp)[i],"\\.")[[1]][1]

if you want to compare with everything before the dot, or to

id = unlist(strsplit(colnames(Exp)[i],"\\.")) 

if you want to compare with every part of the string. In that case, you should also use %in% instead of ==.
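To see why the original comparison misbehaves, look at what strsplit() returns for a single name and what each of the two corrections extracts from it:

```r
parts <- strsplit("A.x", "\\.")
class(parts)            # "list": one element per input string
parts[[1]][1]           # "A": the part before the first dot
unlist(parts)           # "A" "x": every piece of the name

# With several pieces, test membership with %in% rather than ==
"A" %in% unlist(parts)  # TRUE
```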

Second, what you get is a character matrix in which each column is repeated once for every element of fe that it does not match. If all elements of fe are unique and every column prefix occurs in fe, you could just as well do:

only <- rbind(names(Exp), Exp)
only <- do.call(cbind, lapply(only, function(x)
       matrix(rep(x, length(fe) - 1), nrow = nrow(Exp) + 1)
))

Assuming that the logic in your code does make sense (you didn't supply any sample data, so this is impossible to know), the optimized version is:

mat <- rbind(names(Exp),Exp)

do.call(cbind,
    lapply(mat, function(x){
        n <- sum(!fe %in% strsplit(x[1],"\\.")[[1]][1])
        matrix(rep(x,n),nrow=nrow(mat))
}))

Note that if you are interested in whether fe[j] appears anywhere in the name, you can change the code to:

do.call(cbind,
    lapply(mat, function(x){
        n <- sum(!fe %in% unlist(strsplit(x[1],"\\.")))
        matrix(rep(x,n),nrow=nrow(mat))
}))

If this doesn't return what you want, then your code doesn't do that either. I checked with the following sample data, and all versions give the same result:

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe <- LETTERS[1:4]
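One way to run that kind of check yourself is the sketch below. It runs a corrected version of the original loop, with `only` initialised to NULL and `id` taken as the prefix before the dot as suggested above. It then compares the result with a vectorized alternative that counts, for each column, the elements of fe that differ from its prefix and repeats column indices with rep(). The rep()-based indexing is a swapped-in technique, not the do.call/lapply code from this answer, and the names `only_loop`, `only_vec` and `n` are hypothetical.

```r
Exp <- data.frame(A.x = 1:10, B.y = 10:1, C.z = 11:20, A.z = 20:11)
fe <- LETTERS[1:4]

# Corrected loop: cbind the column once per element of fe that differs from its prefix
only_loop <- NULL
for (i in seq_len(ncol(Exp))) {
  id <- strsplit(colnames(Exp)[i], "\\.")[[1]][1]
  for (j in seq_along(fe)) {
    if (id != fe[j]) only_loop <- cbind(only_loop, c(names(Exp)[i], Exp[, i]))
  }
}

# Vectorized: header row on top, then repeat each column n times in one indexing step
mat <- rbind(names(Exp), as.matrix(Exp))                 # 11 x 4 character matrix
prefix <- sub("\\..*$", "", names(Exp))
n <- vapply(prefix, function(p) sum(fe != p), integer(1))
only_vec <- mat[, rep(seq_len(ncol(mat)), times = n), drop = FALSE]

identical(unname(only_loop), unname(only_vec))
# TRUE
```

unname() is used because the vectorized result keeps the column names of `Exp`, while the loop result has none.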

小嗷兮 2024-11-03 11:00:29

The apply() family of functions are convenience functions. They will not necessarily be faster than a well-written for loop or vectorized functions. For example:

set.seed(21)
x <- matrix(rnorm(1e6),5e5,2)

system.time({
  yLoop <- x[,1]*0  # preallocate result
  for(i in 1:NROW(yLoop)) yLoop[i] <- mean(x[i,])
})
#    user  system elapsed 
#   13.39    0.00   13.39 
system.time(yApply <- apply(x, 1, mean))
#    user  system elapsed 
#   16.19    0.28   16.51
system.time(yRowMean <- rowMeans(x))
#    user  system elapsed 
#    0.02    0.00    0.02
identical(yLoop, yApply)       # identical() compares exactly two objects
# TRUE
all.equal(yApply, yRowMean)
# TRUE

The reason your code is so slow is that, as Gavin pointed out, you're growing your array on every loop iteration. Preallocate the entire array before the loop and you will see a significant speedup.
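A minimal sketch of the difference, using hypothetical sizes and function names. The first function grows a matrix with cbind(), as the question's loop does, so every iteration copies everything built so far. The second allocates the full matrix once and then fills one column at a time, which typically avoids copying the whole matrix on each step. Timings depend on the machine, so none are shown here.

```r
# Grow column by column, as in the question: each cbind() copies `out`
grow <- function(n_rows, n_cols) {
  out <- NULL
  for (i in seq_len(n_cols)) out <- cbind(out, seq_len(n_rows) + i)
  out
}

# Preallocate once, then fill each column
prealloc <- function(n_rows, n_cols) {
  out <- matrix(0L, nrow = n_rows, ncol = n_cols)
  for (i in seq_len(n_cols)) out[, i] <- seq_len(n_rows) + i
  out
}

system.time(a <- grow(500, 1000))
system.time(b <- prealloc(500, 1000))
identical(unname(a), b)   # same values either way
```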
