From for loop to apply

Posted on 2024-10-27 11:00:29

I am new to R, so I am not sure how to use apply.
I would like to speed up my function by using apply:

for(i in 1: ncol(exp)){
 for (j in 1: length(fe)){
  tmp =TRUE
  id = strsplit(colnames(exp)[i],"\\.")
  if(id == fe[j]){
   tmp = FALSE
  }
  if(tmp ==TRUE){
   only = cbind(only,c(names(exp)[i],exp[,i]) )
  }
 }
}

How can I use the apply function to do the above?

EDIT:

Thank you so much for the very good explanation, and sorry for my bad description. You guessed everything right, but what I wanted was to delete the columns that match an entry in fe.

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)

fe<-LETTERS[1:2]

Then the result should contain only the columns with 'C' in their names. Everything else should be deleted.

短暂陪伴 2024-11-03 11:00:29

EDIT: If you only want to delete the columns whose names appear in fe, you can simply do:

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]

id <- sapply(strsplit(names(Exp),"\\."),
    function(i)!i[1] %in% fe)
Exp[id]

This code does exactly what your (updated) for-loop does, only a lot more efficiently. You don't have to loop through fe, because the %in% operator is vectorized.
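As a small illustration of the vectorized approach, the sketch below reaches the same result without strsplit, by stripping everything from the first dot onward with sub(). The names prefix, keep and result are introduced here only for illustration and are not part of the original code.

```r
Exp <- data.frame(A.x = 1:10, B.y = 10:1, C.z = 11:20, A.z = 20:11)
fe <- LETTERS[1:2]

# Text before the first dot of every column name, computed in one call
prefix <- sub("\\..*$", "", names(Exp))

# %in% tests all prefixes against fe at once; negate to keep the non-matches
keep <- !prefix %in% fe

result <- Exp[keep]
names(result)
# "C.z"
```

Because both sub() and %in% work on whole vectors, no explicit loop over columns or over fe is needed.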

If the name can appear anywhere between the dots, use:

id <- sapply(strsplit(names(Exp),"\\."),
    function(i)sum(i %in% fe)==0)
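To make the difference from the first variant visible, here is a minimal sketch with made-up column names in which the letters from fe appear after the first dot. The data frame Exp2 and the names parts and keep2 are hypothetical; vapply() is used instead of sapply() only so that the return type is checked.

```r
# Hypothetical data: "A" and "B" appear in positions other than the first
Exp2 <- data.frame(x.A = 1:3, C.z = 4:6, y.B.w = 7:9)
fe <- LETTERS[1:2]

parts <- strsplit(names(Exp2), "\\.")

# Drop a column if any of its dot-separated parts is listed in fe
keep2 <- !vapply(parts, function(p) any(p %in% fe), logical(1))

names(Exp2[keep2])
# "C.z"
```

With the first variant (testing only i[1]), all three columns would be kept here, since none of the names starts with "A" or "B".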

Your code does some very funny things, and I have no clue what exactly you're trying to do. For one, strsplit returns a list, so id == fe[j] will not give the comparison you intend unless you extract the character vector first. So I'd correct your code to

id = strsplit(colnames(Exp)[i],"\\.")[[1]][1]

in case you want to compare with everything that is before the dot, or to

id = unlist(strsplit(colnames(Exp)[i],"\\."))

if you want to compare with everything in the string. In that case, you should use %in% instead of == as well.
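A short sketch of what strsplit actually returns may help here. The variable name parts is only illustrative.

```r
parts <- strsplit("A.x", "\\.")

class(parts)
# "list"

# The character vector sits inside the list, one element per input string
parts[[1]]
# "A" "x"

# Text before the first dot
parts[[1]][1]
# "A"

# All pieces, tested against a set of values with %in%
unlist(parts) %in% c("A", "B")
# TRUE FALSE
```

The list wrapper exists because strsplit accepts a whole vector of strings and returns one set of pieces per string.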

Second, what you get is a character matrix in which each column is repeated several times. If all elements of fe are unique, you could just as well do:

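mat <- rbind(names(Exp),Exp)
# assumes every column prefix matches exactly one element of fe,
# so each column is repeated length(fe)-1 times
only <- do.call(cbind,lapply(mat,function(x) 
       matrix(rep(x,length(fe)-1),nrow=nrow(Exp)+1)
))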

Assuming that the logic in your code does make sense (as you didn't supply any sample data, this is impossible to know), the optimized version is:

mat <- rbind(names(Exp),Exp)

do.call(cbind,
    lapply(mat, function(x){
        n <- sum(!fe %in% strsplit(x[1],"\\.")[[1]][1])
        matrix(rep(x,n),nrow=nrow(mat))
}))

Note that, if you are interested in whether fe[j] appears anywhere in the name, you can change the code to:

do.call(cbind,
    lapply(mat, function(x){
        n <- sum(!fe %in% unlist(strsplit(x[1],"\\.")))
        matrix(rep(x,n),nrow=nrow(mat))
}))

If this doesn't return what you want, then your code doesn't do that either. I checked with the following sample data, and all versions give the same result:

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe <- LETTERS[1:4]
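One way such a check could look is sketched below. It assumes the loop is first corrected as suggested above (extracting the text before the dot with [[1]][1]) and that only starts out as NULL, which the question does not show. The name viaLapply is introduced here for illustration.

```r
Exp <- data.frame(A.x = 1:10, B.y = 10:1, C.z = 11:20, A.z = 20:11)
fe <- LETTERS[1:4]

# Corrected version of the original double loop (assumed starting value: NULL)
only <- NULL
for (i in seq_len(ncol(Exp))) {
  id <- strsplit(colnames(Exp)[i], "\\.")[[1]][1]
  for (j in seq_along(fe)) {
    if (id != fe[j]) {
      only <- cbind(only, c(names(Exp)[i], Exp[, i]))
    }
  }
}

# lapply version: repeat each column once per non-matching element of fe
mat <- rbind(names(Exp), Exp)
viaLapply <- do.call(cbind, lapply(mat, function(x) {
  n <- sum(!fe %in% strsplit(x[1], "\\.")[[1]][1])
  matrix(rep(x, n), nrow = nrow(mat))
}))

dim(only)
dim(viaLapply)
all(only == viaLapply)
```

Both results are character matrices with one header row plus ten data rows, and every column appears once for each element of fe that it does not match.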

小嗷兮 2024-11-03 11:00:29

The apply() family of functions are convenience functions. They will not necessarily be faster than a well-written for loop or vectorized functions. For example:

set.seed(21)
x <- matrix(rnorm(1e6),5e5,2)

system.time({
  yLoop <- x[,1]*0  # preallocate result
  for(i in 1:NROW(yLoop)) yLoop[i] <- mean(x[i,])
})
#    user  system elapsed 
#   13.39    0.00   13.39 
system.time(yApply <- apply(x, 1, mean))
#    user  system elapsed 
#   16.19    0.28   16.51
system.time(yRowMean <- rowMeans(x))
#    user  system elapsed 
#    0.02    0.00    0.02
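# identical() compares only two objects, so check the results pairwise
identical(yLoop,yApply)
# TRUE
isTRUE(all.equal(yLoop,yRowMean))
# TRUE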

The reason your code is so slow is that, as Gavin pointed out, you're growing your array on every loop iteration. Preallocate the entire array before the loop and you will see a significant speedup.
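The contrast between growing and preallocating can be sketched as follows. The matrix m and the names grown and pre are made up for illustration, and the size is kept small so the example runs quickly; no timings are claimed here, since they depend on the machine and R version.

```r
set.seed(21)
n <- 2000
m <- matrix(rnorm(10 * n), nrow = 10)

# Growing: cbind() copies the whole result on every iteration
grown <- NULL
for (i in seq_len(n)) {
  grown <- cbind(grown, m[, i] * 2)
}

# Preallocating: the result is created once and filled in place
pre <- matrix(NA_real_, nrow = nrow(m), ncol = n)
for (i in seq_len(n)) {
  pre[, i] <- m[, i] * 2
}

identical(grown, pre)
# TRUE
```

Wrapping each loop in system.time() shows the difference on your own machine; the gap widens as n grows, because the growing version copies more data on each pass.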
