Using lapply/sapply to recalculate every point in a data frame

Posted on 2024-10-29 23:33:07

I wrote my own function, named batcheffect, to recalculate all values in a data frame.
The function only needs the data frame as input. First, the mean is calculated inside the function; then the calculation is made for each point in the data frame and a new data frame is created.

batcheffect <- function (experiment){    
   corr<-list()    
   matrixexp<-as.matrix(experiment)    
   expmean <-mean(matrixexp)

   for (i in 1:length(matrixexp)){    
      correction <- (matrixexp[i]-overallmean - expmean)+overallmean    
      corr[[i]]<- matrix(correction)
   }
   return(unlist(corr)) 
}

For a large data frame, the loop inside the function is slow, so I want to use sapply or lapply to speed up the process. Does anyone have a suggestion?

Thanks

UPDATE:
For example, I have a data frame like this:
df<- data.frame(A=1:10,B=10:1,C=11:20,C1=21:30,B1=31:40,A2=41:50)

To calculate the mean of all values in the data frame, the data frame is converted to a matrix:
df1<-as.matrix(df)
overallmean<-mean(df1)

The first goal is to make subsets of the data by column name. This generates three groups: the A columns, the B columns and the C columns. The subsets are defined by the following code:

"selectexperiments" <- function (partialname, data) 
{
result <- data[,grep(partialname, colnames(data))]
return(result)
}
A<-selectexperiments('A', df)
B<-selectexperiments('B', df)
C<-selectexperiments('C', df)

The three groups are now created. For each value in, e.g., group A, I want to calculate the following sum:
(value - overallmean - meanofthegroup) + overallmean.
Therefore I created this batcheffect function.

"batcheffect" <- function (group)
{
corr<-list()
matrixexp<-as.matrix(group)
expmean <-mean(matrixexp) #mean of the group
for (i in 1:length(matrixexp)){ 
correction <- (matrixexp[i]-overallmean - expmean)+overallmean
corr[[i]]<- matrix(correction)
}
return(unlist(corr))
}

Abatch<-batcheffect(A)

The result is OK now, but I would like it returned as a data frame. Also, for my own data this function is really slow, so I thought there might be a way to speed it up, such as sapply or something similar.

流殇 2024-11-05 23:33:07

Your function is rather odd. It can be simplified to:

batcheffect <- function (experiment){
    matrixexp<-as.matrix(experiment)
    expmean <-mean(matrixexp)
    c(matrixexp - expmean)
}

and will give exactly the same result. Simple algebra shows that

(matrixexp[i]-overallmean - expmean)+overallmean

is exactly equal to

matrixexp[i]- expmean

Because arithmetic in R is vectorized, the loop is not necessary: the subtraction is applied to every element at once. The function returns a vector (hence the c() call).
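As a small illustration of the point above, the following sketch compares the element-by-element loop with the vectorized expression on a tiny matrix. The matrix `m` and the value chosen for `overallmean` are arbitrary assumptions made only for this example.

```r
# Toy input; any numeric matrix or data frame would do
m <- matrix(1:6, nrow = 2)
overallmean <- 100          # arbitrary: it cancels out in the formula
expmean <- mean(m)          # 3.5

# Loop version, following the original formula
looped <- numeric(length(m))
for (i in seq_along(m)) {
  looped[i] <- (m[i] - overallmean - expmean) + overallmean
}

# Vectorized version: one subtraction over the whole matrix
vectorized <- c(m - expmean)

all.equal(looped, vectorized)   # TRUE
```

Preallocating `looped` with `numeric(length(m))` also avoids growing a list one element at a time, which is part of what makes the original loop slow.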

Using unlist(), you can further simplify to:

batcheffect2 <- function(experiment){
  x <- unlist(experiment, use.names = FALSE)
  x - mean(x)
}

which again returns exactly the same result. Are you sure this is what you had in mind?

EDIT :

Given your comments, I am adding the test code here. I renamed your original function to old.batcheffect(). As you can see, on a sample data frame (and after initializing the mysterious overallmean), all three functions give identical results:

> Df <- data.frame(A1=1:10,B1=10:1,C1=11:20)
> overallmean <- runif(1)
> X1 <- old.batcheffect(Df)
> X2 <- batcheffect(Df)
> X3 <- batcheffect2(Df)

> all.equal(X1,X2)
[1] TRUE
> all.equal(X2,X3)
[1] TRUE

EDIT2 :

To make batcheffect return a data frame like the original, you need just one line of code:

batcheffect <- function(x) x - mean(unlist(x))
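A quick check of what this one-liner returns, using a hand-built group `A` that mirrors the A and A2 columns from the question (the object names here are only for illustration):

```r
batcheffect <- function(x) x - mean(unlist(x))

# Group A from the question's example: columns A and A2
A <- data.frame(A = 1:10, A2 = 41:50)

Abatch <- batcheffect(A)

is.data.frame(Abatch)   # TRUE: arithmetic on a data frame keeps its shape
dim(Abatch)             # 10 rows, 2 columns
head(Abatch, 3)
```

Subtracting a single number from a data frame applies it to every cell and keeps the column names, so no conversion back to a data frame is needed.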

You can now process the complete original data frame within one function:

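summaryBatch <- function(data,groups){
    # select the columns of each group; drop=FALSE keeps a one-column group as a data frame
    tmp <- lapply(groups,function(x){
        data[,grep(x,names(data)),drop=FALSE]
    })
    # subtract each group's own mean from all of its values
    out <- lapply(tmp,function(x){
        x - mean(unlist(x))
    })
    do.call(cbind,out)
}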

Then :

summaryBatch(df,c("A","B","C"))

returns a data frame with all columns, in which the group mean has been subtracted from every value of each column. As said before, you can add and subsequently remove overallmean, but that makes no difference at all.
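For comparison, here is an alternative sketch that avoids the per-group lapply and keeps the columns in their original order. It is not the approach used above: it assumes that a column's group is its name with any trailing digits removed (so A and A2 both belong to group "A"), which happens to hold for the example data but may not hold for real column names.

```r
df <- data.frame(A = 1:10, B = 10:1, C = 11:20,
                 C1 = 21:30, B1 = 31:40, A2 = 41:50)

# Assumption: group label = column name without trailing digits
grp <- sub("[0-9]+$", "", names(df))

# Mean of all values in each group's columns
grp_mean <- sapply(split(seq_along(df), grp),
                   function(idx) mean(unlist(df[idx])))

# Subtract the matching group mean from every column in one step
centered <- as.data.frame(sweep(as.matrix(df), 2, grp_mean[grp]))

head(centered, 3)
```

`sweep()` subtracts by default, so each column has the mean of its own group removed, and the result has the same column names and order as `df`.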
