Looping to create new variables in ddply

Posted on 2024-11-04 17:22:07

I am using ddply to aggregate and summarize data frame variables, and I am interested in looping over the columns of my data frame to create the new variables.

new.data <- ddply(old.data, 
                  c("factor", "factor2"),
                  function(df)
                    c(a11_a10 = CustomFunction(df$a11_a10),
                      a12_a11 = CustomFunction(df$a12_a11),
                      a13_a12 = CustomFunction(df$a13_a12),
                      ...
                      ...
                      ...))

Is there a way for me to insert a loop in ddply so that I can avoid writing each new summary variable out, e.g.

for (i in 11:n) {
  paste("a", i, "_a", i - 1) = CustomFunction(..... )
}

I know that this is not how it would actually be done, but I just wanted to show how I'd conceptualize it. Is there a way to do this in the function I call in ddply, or via a list?
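One way to make the loop idea concrete is the following minimal sketch. It assumes the columns follow the a<i>_a<i-1> pattern and uses mean as a hypothetical stand-in for CustomFunction. The loop runs inside the function that would be handed to ddply and fills a named vector, with paste0 (equivalent to paste(..., sep = "")) building each name. summarise_group and one_group are made-up names for illustration.

```r
# Hypothetical stand-in for the asker's summary function
CustomFunction <- mean

# Summarise one group's data frame; the loop replaces the hand-written
# a11_a10 = ..., a12_a11 = ... entries. n is the last index (assumed known).
summarise_group <- function(df, n) {
  out <- numeric(0)
  for (i in 11:n) {
    nm <- paste0("a", i, "_a", i - 1)  # e.g. "a11_a10"
    out[nm] <- CustomFunction(df[[nm]])
  }
  out
}

# Made-up data standing in for a single group
one_group <- data.frame(a11_a10 = c(1, 3), a12_a11 = c(2, 4), a13_a12 = c(5, 7))
summarise_group(one_group, n = 13)
```

With plyr loaded, the same function could be passed as ddply(old.data, c("factor", "factor2"), summarise_group, n = 13), because ddply forwards extra arguments to the function it calls.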

UPDATE: Because I'm a new user, I can't post an answer to my own question:

My answer involves ideas from Nick's answer and Ista's comment:

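library(plyr)

# Apply CustomFunction to the columns a<i>_a<i-gap>, for i in min:max,
# within each factor/factor2 group
func <- function(old.data, min, max, gap) {
  varrange <- min:max
  usenames <- paste("a", varrange, "_a", varrange - gap, sep = "")
  # colwise() accepts a character vector of column names as .cols
  ddply(old.data, .(factor, factor2), colwise(CustomFunction, usenames))
}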
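The following toy run of this colwise approach repeats the definition so that it runs on its own. It assumes CustomFunction is mean and uses a made-up old.data with two groups. The extra column named other shows that only the generated names are summarised.

```r
library(plyr)

CustomFunction <- mean  # hypothetical stand-in for the real summary function

func <- function(old.data, min, max, gap) {
  varrange <- min:max
  usenames <- paste0("a", varrange, "_a", varrange - gap)
  ddply(old.data, .(factor, factor2), colwise(CustomFunction, usenames))
}

# Made-up data: two groups, two summary columns, one column that is ignored
old.data <- data.frame(
  factor  = c("x", "x", "y", "y"),
  factor2 = c("p", "p", "p", "p"),
  a11_a10 = c(1, 3, 5, 7),
  a12_a11 = c(2, 4, 6, 8),
  other   = c(0, 0, 0, 0)
)

# One row per factor/factor2 group; only a11_a10 and a12_a11 are summarised
func(old.data, min = 11, max = 12, gap = 1)
```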

Comments (3)

蓝海似她心 2024-11-11 17:22:07

Building on the excellent answer by @Nick, here is one approach to the problem

library(plyr)
# n (index of the last a<n>_a<n-1> column) must already be defined
foo <- function(df) {
  cols <- paste("a", 11:n, "_a", 10:(n - 1), sep = "")
  sapply(df[, cols, drop = FALSE], CustomFunction)
}

new.data = ldply(dlply(old.data, c("factor", "factor2")), foo)
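The foo above relies on a global n. The variant sketched here passes n through ldply's ... argument instead, and uses drop = FALSE so that a single selected column still reaches sapply as a data frame. It uses made-up data, and mean stands in for CustomFunction as an assumption.

```r
library(plyr)

CustomFunction <- mean  # hypothetical stand-in

foo <- function(df, n) {
  cols <- paste0("a", 11:n, "_a", 10:(n - 1))
  # drop = FALSE keeps a one-column selection as a data frame
  sapply(df[, cols, drop = FALSE], CustomFunction)
}

# Made-up data with two groups
old.data <- data.frame(
  factor  = c("x", "x", "y", "y"),
  factor2 = c("p", "p", "p", "p"),
  a11_a10 = c(1, 3, 5, 7),
  a12_a11 = c(2, 4, 6, 8)
)

# ldply forwards n = 12 to foo for every group produced by dlply
new.data <- ldply(dlply(old.data, c("factor", "factor2")), foo, n = 12)
new.data
```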

Here is an example application using the tips dataset, which older versions of ggplot2 bundled and which is now available from the reshape2 package. Suppose we want to calculate the mean of tip and total_bill for each combination of sex and smoker; here is how the code would work:

library(plyr); library(reshape2)  # reshape2 provides the tips dataset
foo <- function(df) sapply(df[, c("tip", "total_bill")], mean)
new <- ldply(dlply(tips, c("sex", "smoker")), foo)

It produces the output shown below

         .id      tip total_bill
1  Female.No 2.773519   18.10519
2 Female.Yes 2.931515   17.97788
3    Male.No 3.113402   19.79124
4   Male.Yes 3.051167   22.28450

Is this what you were looking for?

睡美人的小仙女 2024-11-11 17:22:07

If I understand you correctly, you essentially want to apply a custom function to every column of the data.frame you pass to ddply.

The good news is that plyr has a function, colwise, that does exactly that. This means the solution to your problem boils down to a one-liner:

Building on the excellent example of @Ramnath:

library(plyr)
library(reshape2)  # provides the tips dataset
customfunction <- mean
ddply(tips, .(sex, smoker), numcolwise(customfunction))

     sex smoker total_bill      tip     size
1 Female     No   18.10519 2.773519 2.592593
2 Female    Yes   17.97788 2.931515 2.242424
3   Male     No   19.79124 3.113402 2.711340
4   Male    Yes   22.28450 3.051167 2.500000

The reason this works is that colwise turns a function that works on a vector into a function that works column by column on a data.frame. There are two variants of colwise: numcolwise works only on numeric columns, and catcolwise works only on categorical (discrete) columns. See ?colwise for more information.
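To see what colwise returns outside ddply, here is a small sketch on a made-up data frame d. Each call builds a new function, which is then applied to d and returns a one-row data frame.

```r
library(plyr)

# Made-up data: two numeric columns and one factor
d <- data.frame(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  g = factor(c("a", "b", "a"))
)

# colwise() returns a new function that takes a data frame
col_max <- colwise(max, c("x", "y"))  # only the named columns
col_max(d)                # one-row data frame: x = 3, y = 30

numcolwise(mean)(d)       # numeric columns only: x = 2, y = 20
catcolwise(nlevels)(d)    # categorical columns only: g = 2
```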

EDIT:

I appreciate that you may not wish to apply the function to all columns in your data.frame. Still, I find this syntax so easy that my general approach would be to modify the data.frame that I pass to ddply. For example, the following modified example subsets tips to drop its second column (tip). The solution is still a one-liner:

ddply(tips[, -2], .(sex, smoker), numcolwise(customfunction))

     sex smoker total_bill     size
1 Female     No   18.10519 2.592593
2 Female    Yes   17.97788 2.242424
3   Male     No   19.79124 2.711340
4   Male    Yes   22.28450 2.500000
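Dropping a column by position, as tips[, -2] does, depends on the column order. The sketch below drops columns by name instead. It uses a small made-up data frame d so that it runs without the tips data.

```r
library(plyr)

# Made-up data: a grouping column, a column to keep, a column to drop
d <- data.frame(
  g       = c("a", "a", "b"),
  keep    = c(1, 2, 3),
  drop_me = c(10, 20, 30)
)

# Remove the unwanted column by name, then summarise the numeric rest per group
res <- ddply(d[, setdiff(names(d), "drop_me")], .(g), numcolwise(mean))
res  # columns g and keep; keep is 1.5 for "a" and 3 for "b"
```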

葵雨 2024-11-11 17:22:07

In steps:

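varrange <- 11:n
usenames <- paste("a", varrange, "_a", varrange - 1, sep = "")
# Run inside the function given to ddply, where df is one group's data frame
results <- sapply(usenames, function(curname) CustomFunction(df[, curname]))
names(results) <- usenames  # sapply already names results this way; kept for clarity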

Is this what you want?
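The steps above can be wired into ddply as in the following sketch. It assumes mean as a hypothetical CustomFunction, n = 12, and a made-up old.data. Because sapply over a character vector names its result after that vector, each group returns a named vector, and ddply turns it into columns.

```r
library(plyr)

CustomFunction <- mean  # hypothetical stand-in
n <- 12                 # last index, assumed known

# Made-up data with two groups
old.data <- data.frame(
  factor  = c("x", "x", "y", "y"),
  factor2 = c("p", "p", "p", "p"),
  a11_a10 = c(1, 3, 5, 7),
  a12_a11 = c(2, 4, 6, 8)
)

varrange <- 11:n
usenames <- paste0("a", varrange, "_a", varrange - 1)

# The sapply step runs once per group; df is that group's data frame
new.data <- ddply(old.data, c("factor", "factor2"), function(df) {
  sapply(usenames, function(curname) CustomFunction(df[, curname]))
})
new.data
```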
