Applying multiple functions to each row of a data frame

Posted 2024-12-01 04:13:24

Every time I think I understand working with vectors, what appears to be a simple problem turns my head inside out. Lots of reading and trying different examples hasn't helped on this occasion. Please spoon-feed me here...

I want to apply two custom functions to each row of a data frame and add the results as two new columns. Here is my sample code:

# Required packages:
library(plyr)

FindMFE <- function(x) {
    MFE <- max(x, na.rm = TRUE) 
    MFE <- ifelse(is.infinite(MFE ) | (MFE  < 0), 0, MFE)
    return(MFE)
}

FindMAE <- function(x) {
    MAE <- min(x, na.rm = TRUE) 
    MAE <- ifelse(is.infinite(MAE) | (MAE> 0), 0, MAE)
    return(MAE)
}

FindMAEandMFE <- function(x){
        # I know this next line is wrong...
    z <- apply(x, 1, FindMFE, FindMFE)
        return(z)
}

df1 <- data.frame(Bar1=c(1,2,3,-3,-2,-1),Bar2=c(3,1,3,-2,-3,-1))

df1 = transform(df1, 
    FindMAEandMFE(df1)  
)

#DF1 should end up with the following data...
#Bar1   Bar2    MFE MAE
#1      3       3   0
#2      1       2   0
#3      3       3   0
#-3     -2      0   -3
#-2     -3      0   -3
#-1     -1      0   -1

It would be great to get an answer using the plyr library and another using a more base-R-like approach. Both will aid my understanding. Of course, please point out where I'm going wrong if it's obvious. ;-)

Now back to the help files for me!

Edit: I would like a multivariate solution as column names may change and expand over time. It also allows re-use of the code in future.

Comments (4)

万劫不复 2024-12-08 04:13:24

I think you are overcomplicating this. What is wrong with two separate apply() calls? There is, however, a far better way to do what you are doing here that involves no looping/apply calls. I'll deal with these separately, but the second solution is preferable as it is truly vectorised.

Two apply calls version

First, two separate apply() calls using only base R functions:

df1 <- data.frame(Bar1=c(1,2,3,-3,-2,-1),Bar2=c(3,1,3,-2,-3,-1))
df1 <- transform(df1, MFE = apply(df1, 1, FindMFE), MAE = apply(df1, 1, FindMAE))
df1

Which gives:

> df1
  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1

OK, looping over the rows of df1 twice is perhaps a little inefficient, but even for big problems you've already spent more time thinking about how to do this cleverly in a single pass than you will save by doing it that way.

Using vectorised functions pmax() and pmin()

So a better way of doing this is to note the pmax() and pmin() functions and realise that they can do what each of the apply(df1, 1, FindFOO) calls was doing. For example:

> (tmp <- with(df1, pmax(0, Bar1, Bar2, na.rm = TRUE)))
[1] 3 2 3 0 0 0

would be MFE from your question. This is simple to work with if you always have two columns and they are always Bar1 and Bar2, or the first two columns of df1. But it is not very general: what if you want to compute this over many columns? pmax(df1[, 1:2], na.rm = TRUE) won't do what we want, because pmax() receives the data frame as a single argument and has nothing to compare it against:

> pmax(df1[, 1:2], na.rm = TRUE)
  Bar1 Bar2
1    1    3
2    2    1
3    3    3
4   -3   -2
5   -2   -3
6   -1   -1

The trick to getting a general solution using pmax() and pmin() is to use do.call() to arrange the calls to those two functions for us. Updating your functions to use this idea we have:

FindMFE2 <- function(x) {
   MFE <- do.call(pmax, c(as.list(x), 0, na.rm = TRUE))
   MFE[is.infinite(MFE)] <- 0
   MFE
}

FindMAE2 <- function(x) {
   MAE <- do.call(pmin, c(as.list(x), 0, na.rm = TRUE))
   MAE[is.infinite(MAE)] <- 0
   MAE
}

which give:

> transform(df1, MFE = FindMFE2(df1), MAE = FindMAE2(df1))
  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1

and not an apply() in sight. If you want to do this in a single step, this is now much easier to wrap:

FindMAEandMFE2 <- function(x){
    cbind(MFE = FindMFE2(x), MAE = FindMAE2(x))
}

which can be used as:

> cbind(df1, FindMAEandMFE2(df1))
  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1
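
To see how the do.call() route generalises, here is a minimal sketch that checks the pmax()-based function against the row-wise apply() version from the question. The data frame df2, its Bar3 column and the NA value are made up for illustration and are not part of the original example.

```r
# FindMFE as in the question, FindMFE2 as in this answer.
FindMFE <- function(x) {
    MFE <- max(x, na.rm = TRUE)
    ifelse(is.infinite(MFE) | (MFE < 0), 0, MFE)
}

FindMFE2 <- function(x) {
    MFE <- do.call(pmax, c(as.list(x), 0, na.rm = TRUE))
    MFE[is.infinite(MFE)] <- 0
    MFE
}

# Hypothetical data: a third column and a missing value.
df2 <- data.frame(Bar1 = c(1, -3, NA),
                  Bar2 = c(3, -2, -1),
                  Bar3 = c(2, -5, -4))

by_row  <- unname(apply(df2, 1, FindMFE))  # one FindMFE call per row
by_pmax <- FindMFE2(df2)                   # one vectorised call

# do.call() effectively builds this call for us:
explicit <- with(df2, pmax(Bar1, Bar2, Bar3, 0, na.rm = TRUE))

by_pmax   # 3 0 0
```

Because the column list is built from whatever data frame is passed in, the same function keeps working when columns are renamed or added.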
南渊 2024-12-08 04:13:24

I show three alternative one-liners:

  • Using the each function of plyr
  • Using the plyr each function with base R
  • Using the vectorised pmin and pmax functions

Solution 1: plyr and each

The plyr package defines the each function that does what you want. From ?each: Aggregate multiple functions into a single function. This means you can solve your problem using a one-liner:

library(plyr)
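# MFE is the row maximum floored at 0, MAE the row minimum capped at 0,
# matching the definitions in the question
adply(df1, 1, each(MFE=function(x)max(x, 0), MAE=function(x)min(x, 0)))

  Bar1 Bar2 MFE MAE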
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1

Solution 2: each and base R

You can, of course, use each with base functions. Here is how you can use it with apply - just note that you have to transpose the results before adding to your original data.frame.

library(plyr)
data.frame(df1, 
  t(apply(df1, 1, each(MFE=function(x)max(x, 0), MAE=function(x)min(x, 0)))))

  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1
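
For readers without plyr installed, the idea behind each() can be sketched in base R. The helper each_base below is hypothetical (it is not part of plyr or base R) and assumes every supplied function returns a single number.

```r
# Hypothetical helper: combine named functions into one function that
# returns a named numeric vector with one entry per function.
each_base <- function(...) {
    funs <- list(...)
    function(x) vapply(funs, function(f) f(x), numeric(1))
}

df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

both <- each_base(MFE = function(x) max(x, 0),
                  MAE = function(x) min(x, 0))

both(c(-3, -2))   # named vector: MFE = 0, MAE = -3

# apply() returns one column per input row, hence the transpose.
res <- data.frame(df1, t(apply(df1, 1, both)))
res
```

The names given to each_base() become the column names of the result, which is the same convenience each() offers.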

Solution 3: using vectorised functions

Using vectorised functions pmin and pmax, you can use this one-liner:

transform(df1, MFE=pmax(0, Bar1, Bar2), MAE=pmin(0, Bar1, Bar2))

  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1
瞄了个咪的 2024-12-08 04:13:24

There are lots of good answers here. I started this while Gavin Simpson was editing, so we cover some similar ground. What the parallel min and max (pmin and pmax) do is pretty much exactly what you're writing your functions for. It may be a little opaque what the 0 does in pmax(0, Bar1, Bar2), but essentially the 0 gets recycled, so it's like doing

pmax(c(0,0,0,0,0,0), Bar1, Bar2)

That takes the three vectors element by element and returns the largest value at each position. A row whose maximum is negative therefore comes out as 0, which covers much of what your ifelse statement did. You could rewrite your functions to work on whole vectors, in a form similar to what you already have, which might make this a bit more transparent. In this case we just pass the data frame to a new, fast, parallel findMFE function that works with any numeric data frame and returns a vector.

findMFE <- function(dataf){
    # return the result visibly instead of ending on an assignment
    do.call( pmax, c(dataf, 0, na.rm = TRUE))
}

MFE <- findMFE(df1)

What this function does is append a 0 (which is recycled into a column of 0s) to the columns of the passed data frame and then call pmax with each column as a separate argument (data frames are lists, so this is easy).
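
A small sketch may make the two ideas above concrete: how c() turns the data frame into an argument list, and how the single 0 is recycled. The variable names args, recycled, spelled and via_call are only for illustration.

```r
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

# c() on a data frame yields a plain list: the columns, then the extras.
args <- c(df1, 0, na.rm = TRUE)
str(args)
names(args)   # "Bar1" "Bar2" "" "na.rm"

# The scalar 0 is recycled to the length of the columns, so these agree.
recycled <- with(df1, pmax(0, Bar1, Bar2))
spelled  <- with(df1, pmax(rep(0, nrow(df1)), Bar1, Bar2))

# do.call() passes each list element as one argument to pmax().
via_call <- do.call(pmax, args)
via_call      # 3 2 3 0 0 0
```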

Now, I note that you actually want to correct for Inf values in your data that aren't in your example... we could add an extra line to your function...

findMFE <- function(dataf){
    MFE <- do.call( pmax, c(dataf, 0, na.rm = TRUE))
    ifelse(is.infinite(MFE), 0, MFE)
}

Now, that's proper use of the ifelse() function on a vector. I did it that way as an example for you but Gavin Simpson's use of MFE[is.infinite(MFE)] <- 0 is more efficient. Note that this findMFE function isn't used in a loop, it's just passed the whole data frame.

The comparable findMAE is...

findMAE <- function(dataf){
    MAE <- do.call( pmin, c(dataf, 0, na.rm = TRUE))
    ifelse(is.infinite(MAE), 0, MAE)
}

and the combined function is simply...

findMFEandMAE <- function(dataf){
    MFE <- findMFE(dataf)
    MAE <- findMAE(dataf)
    return(data.frame(MFE, MAE))
}

MFEandMAE <- findMFEandMAE(df1)
df1 <- cbind(df1, MFEandMAE)

Some tips

If you've got a scalar if statement, don't use ifelse(); use if () else. It's much faster in scalar situations. Your functions work on scalars, and you're trying to vectorize them. ifelse() is already vectorized and runs very fast when used that way, but it is much slower than if () else when used on a scalar.
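
As a sketch of the first tip, the question's FindMFE could be written with a scalar if () else. The name FindMFE_scalar is hypothetical, chosen here only to keep it apart from the original function.

```r
# Scalar rewrite of the question's FindMFE: max() returns one number,
# so a plain if () else is enough and ifelse() is not needed.
FindMFE_scalar <- function(x) {
    MFE <- max(x, na.rm = TRUE)   # -Inf (with a warning) if every value is NA
    if (is.infinite(MFE) || MFE < 0) 0 else MFE
}

FindMFE_scalar(c(1, 3))     # 3
FindMFE_scalar(c(-3, -2))   # 0
```

This version is still meant to be called once per row, for example from apply(df1, 1, FindMFE_scalar).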

Also, if you're going to be putting stuff in a loop or apply statement put as little in there as possible. For example, in your case the ifelse() really needed to be taken out of the loop and applied to the whole MFE result afterwards.
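
The second tip can be sketched like this, assuming the example data from the question: only the row maximum stays inside apply(), and the clean-up is done once on the whole result.

```r
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

# Only the row maximum is computed inside apply() ...
MFE <- unname(apply(df1, 1, max, na.rm = TRUE))

# ... and the clean-up runs once, vectorised over all rows.
MFE[is.infinite(MFE) | MFE < 0] <- 0
MFE   # 3 2 3 0 0 0
```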

只为一人 2024-12-08 04:13:24

If you really, really want it, you can:

FindMAEandMFE <- function(x){
    t(apply(x, 1, function(currow){c(MAE=FindMAE(currow), MFE=FindMFE(currow))}))
}

(not tested - it should return an array with two (named, I think) columns and as many rows as the data.frame had). Now you can do:

df1<-cbind(df1, FindMAEandMFE(df1))

Very icky. Please heed Gavin's advice.
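
Since the snippet above is marked as untested, here is a short check of its shape and column names. FindMFE and FindMAE are repeated from the question so the block runs on its own; res is just an illustrative variable name.

```r
FindMFE <- function(x) {
    MFE <- max(x, na.rm = TRUE)
    ifelse(is.infinite(MFE) | (MFE < 0), 0, MFE)
}
FindMAE <- function(x) {
    MAE <- min(x, na.rm = TRUE)
    ifelse(is.infinite(MAE) | (MAE > 0), 0, MAE)
}
FindMAEandMFE <- function(x) {
    t(apply(x, 1, function(currow) c(MAE = FindMAE(currow), MFE = FindMFE(currow))))
}

df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

res <- FindMAEandMFE(df1)
dim(res)        # 6 2
colnames(res)   # "MAE" "MFE"

df1 <- cbind(df1, res)
df1
```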
