Is it possible to reuse a column generated within ddply?

Posted on 2024-09-12 06:41:30

I have a script where I'm using ddply, as in the following example:

ddply(df, .(col),
  function(x) data.frame(
    col1 = some_function(x$y),
    col2 = some_other_function(x$y)
  )
)

Within ddply, is it possible to reuse col1 without calling the entire function again?

For example:

ddply(df, .(col),
  function(x) data.frame(
    col1 = some_function(x$y),
    col2 = some_other_function(x$y),
    col3 = col1 * col2
  )
)

Comments (3)

宫墨修音 2024-09-19 06:41:30

You've got a whole function to play with! Doesn't have to be a one-liner! This should work:

ddply(df, .(col), function(x) {
  tmp <- some_other_function(x$y)
  data.frame(
    col1=some_function(x$y),
    col2=tmp,
    col3=tmp
  )
})
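
To get the col3 from the question, the same idea extends to two temporaries. Here is a minimal sketch; the toy data frame and the definitions of some_function and some_other_function are placeholders made up for illustration, not something from the original post:

```r
library(plyr)

# Hypothetical stand-ins for the question's functions
some_function <- function(v) mean(v)
some_other_function <- function(v) length(v)

# Toy data, assumed for illustration only
df <- data.frame(col = c("a", "a", "b"), y = c(1, 3, 10))

res <- ddply(df, .(col), function(x) {
  col1 <- some_function(x$y)        # computed once per group
  col2 <- some_other_function(x$y)  # computed once per group
  # Reuse the stored values instead of calling the functions again
  data.frame(col1 = col1, col2 = col2, col3 = col1 * col2)
})
```

Each function runs once per group, and col3 is built from the stored results.
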
你是我的挚爱i 2024-09-19 06:41:30

This appears to be a good candidate for data.table using the scoping rules of the j component. See FAQ 2.8 for details.

From the FAQ:

"No anonymous function is passed to the j. Instead, an anonymous body is passed to the j."

So, for your case

library(data.table)
DT <- as.data.table(df)
DT[, {
  col1 <- some_function(y)
  col2 <- some_other_function(y)
  col3 <- col1 * col2
  list(col1 = col1, col2 = col2, col3 = col3)
}, by = col]

Or, slightly more directly:

DT[, list(
  col1 = col1 <- some_function(y),
  col2 = col2 <- some_other_function(y),
  col3 = col1 * col2
), by = col]

This avoids one repetition each of col1 and col2 and two repetitions of col3; reducing repetition is something we strive for in data.table. The = followed by <- might look cumbersome at first, but it allows the following syntactic sugar:

DT[,list(
 "Projected return (%)"=      col1<-some_function(y),
 "Investment ($m)"=           col2<-some_other_function(y),
 "Return on Investment ($m)"= col1*col2
 ), by = col]  

The output can then be sent directly to LaTeX or HTML, for example.
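
For anyone who wants to try the braces form end to end, here is a small self-contained sketch. The toy table and the two function definitions are assumptions made up for illustration; swap in your own:

```r
library(data.table)

# Hypothetical stand-ins for the question's functions
some_function <- function(v) mean(v)
some_other_function <- function(v) length(v)

# Toy data, assumed for illustration only
DT <- data.table(col = c("a", "a", "b"), y = c(1, 3, 10))

out <- DT[, {
  col1 <- some_function(y)        # evaluated once per group
  col2 <- some_other_function(y)  # evaluated once per group
  # The last expression in the braces is what j returns for the group
  list(col1 = col1, col2 = col2, col3 = col1 * col2)
}, by = col]
```

Because j is a body rather than a function, col1 and col2 are ordinary local variables that later lines of the same body can see.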

与酒说心事 2024-09-19 06:41:30

I don't think that's possible, but it shouldn't matter too much, because at that point it's not an aggregation function anymore. For example:

# use summarize() in ddply()
data.means <- ddply(data, .(groups), summarize, mean = mean(x), sd = sd(x), n = length(x))
# derive the standard error and an approximate 95% interval afterwards
data.means$se <- data.means$sd / sqrt(data.means$n)
data.means$Upper <- data.means$mean + (data.means$se * 1.96)
data.means$Lower <- data.means$mean - (data.means$se * 1.96)

So I didn't calculate the SEs inside ddply(), but calculating them afterwards wasn't so bad. If you really wanted to, you could also do:

ddply(data, .(groups), summarize, se = sd(x) / sqrt(length(x)))

Or, to put it in terms of your example:

ddply(df, .(col), summarize,
      col1 = some_function(y),
      col2 = some_other_function(y),
      col3 = some_function(y) * some_other_function(y)
    )
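
One thing worth checking against your installed plyr: as far as I know, more recent versions of summarise() evaluate their arguments one at a time and make each result visible to the expressions that follow, so a later column may be able to refer to an earlier one directly. The sketch below assumes that behavior, and uses made-up toy data with mean() and length() standing in for the question's functions:

```r
library(plyr)

# Toy data, assumed for illustration only
df <- data.frame(col = c("a", "a", "b"), y = c(1, 3, 10))

# Assumes summarise() evaluates its arguments in order, so col3 can see
# the col1 and col2 values computed just before it
res2 <- ddply(df, .(col), summarise,
              col1 = mean(y),
              col2 = length(y),
              col3 = col1 * col2)
```

If your version complains that col1 is not found, compute the values in a full function body and return a data.frame instead.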