Loop in ddply to create new variables
I am using ddply to aggregate and summarize data frame variables, and I am interested in looping through my data frame's list to create the new variables.
    new.data <- ddply(old.data,
                      c("factor", "factor2"),
                      function(df)
                        c(a11_a10 = CustomFunction(df$a11_a10),
                          a12_a11 = CustomFunction(df$a12_a11),
                          a13_a12 = CustomFunction(df$a13_a12),
                          ...
                          ...
                          ...))
Is there a way for me to insert a loop in ddply so that I can avoid writing each new summary variable out, e.g.
    for (i in 11:n) {
      paste("a", i, "_a", i - 1) = CustomFunction(..... )
    }
I know that this is not how it would actually be done, but I just wanted to show how I'd conceptualize it. Is there a way to do this in the function I call in ddply, or via a list?
UPDATE: Because I'm a new user, I can't post an answer to my own question:
My answer involves ideas from Nick's answer and Ista's comment:
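    func <- function(old.data, min, max, gap) {
      # Build the column names to summarise, e.g. "a11_a10", "a12_a11", ...
      varrange <- min:max
      usenames <- paste("a", varrange, "_a", varrange - gap, sep = "")
      # Apply CustomFunction to each named column within every factor/factor2 group;
      # the ddply result is the function's return value
      ddply(old.data,
            .(factor, factor2),
            colwise(CustomFunction, usenames))
    }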
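A sketch of how the function above might be called. The toy data frame and the choice of mean as CustomFunction are assumptions made only for illustration; they are not from the question.

```r
library(plyr)

# Assumption: CustomFunction is any function that reduces a vector to one value
CustomFunction <- mean

# Same logic as the function in the update above
func <- function(old.data, min, max, gap) {
  varrange <- min:max
  usenames <- paste("a", varrange, "_a", varrange - gap, sep = "")
  ddply(old.data, .(factor, factor2), colwise(CustomFunction, usenames))
}

# Hypothetical toy data with two grouping columns and two value columns
old.data <- data.frame(
  factor  = c("x", "x", "y", "y"),
  factor2 = c("p", "p", "q", "q"),
  a11_a10 = c(1, 3, 5, 7),
  a12_a11 = c(2, 4, 6, 8)
)

# Summarise a11_a10 and a12_a11 (min = 11, max = 12, gap = 1)
new.data <- func(old.data, 11, 12, 1)
print(new.data)
```

Only the columns whose names are generated from min, max and gap are summarised, so widening the range adds variables without any extra code.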
Comments (3)
Building on the excellent answer by @Nick, here is one approach to the problem.
Here is an example application using the tips dataset in ggplot2. Suppose we want to calculate the average of tip and total_bill by combination of sex and smoker; here is how the code would work.
It produces the output shown below.
Is this what you were looking for?
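A minimal sketch of the kind of call described in this answer, not the answerer's original code. It assumes a small hypothetical data frame, tips_like, that only mimics the column names of tips, and it loops over a character vector of variable names inside the function passed to ddply.

```r
library(plyr)

# Hypothetical stand-in for the tips data (same column names, made-up values)
tips_like <- data.frame(
  sex        = c("Female", "Female", "Male", "Male", "Male", "Female"),
  smoker     = c("No", "Yes", "No", "No", "Yes", "No"),
  total_bill = c(10, 20, 30, 50, 40, 14),
  tip        = c(1, 2, 3, 5, 4, 3),
  size       = c(2, 2, 3, 4, 2, 2)
)

# Variables to summarise; this vector could equally be built with paste()
vars <- c("tip", "total_bill")

# For each sex/smoker group, sapply() loops over the chosen columns and
# returns a named vector, which ddply turns into one row of the result
avg <- ddply(tips_like, c("sex", "smoker"),
             function(df) sapply(df[vars], mean))
print(avg)
```

Because the summary columns come from vars, adding another variable only means adding its name to that vector.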
If I understand you correctly, you essentially want to apply a custom function to every column in the data.frame passed to ddply.
The good news is that there is a plyr function that does exactly that. This means the solution to your problem boils down to a one-liner.
Building on the excellent example of @Ramnath:
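A minimal sketch of such a one-liner, assuming numcolwise(mean) as the function and a small hypothetical data frame, tips_like, standing in for tips (the values are made up):

```r
library(plyr)

# Hypothetical stand-in for the tips data (same column names, made-up values)
tips_like <- data.frame(
  sex        = c("Female", "Male", "Male", "Female"),
  smoker     = c("No", "No", "No", "Yes"),
  total_bill = c(16, 10, 20, 24),
  tip        = c(1, 2, 4, 3),
  size       = c(2, 3, 3, 4)
)

# The one-liner: mean of every numeric column per sex/smoker group
avg <- ddply(tips_like, .(sex, smoker), numcolwise(mean))
print(avg)
```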
The reason this works is that colwise turns a function that works on a vector into a function that works on each column of a data.frame. There are two variants of colwise: numcolwise works only on numeric columns, and catcolwise works only on categorical columns. See ?colwise for more information.
EDIT:
I appreciate that you may not wish to apply the function to all columns in your data.frame. Still, I find this syntax so easy that my general approach would be to modify the data.frame that I pass to ddply. For example, the following modified example subsets tips to exclude some columns. The solution is still a one-liner:
In steps:
Is this what you want?