plyr summarise only calls global functions

Posted 2024-10-05 01:29:38

I'm trying to pass a function (`weight.func`) to a different function (`wrapper`) that calls `ddply`. I want `ddply` to use that function as part of its calculations. I get the output I want when `weight.func` is defined globally, but not when it is passed to the wrapper as an anonymous function.

Can I get ddply to do what I want? Here is a code example:

```
> print(sampleData)
   studentId   problem  part       workerId rating
1       8001 problem26 partA A127R5QI5OGBIK    0.0
2       8001 problem26 partA A1FCLYRBAB430F    0.0
3       8001 problem26 partA A25FZQY34C6RVO    0.0
4       8001 problem26 partA A3G0MO562MHMZ3    0.5
5       8001 problem26 partA A3RB9ZOIUC3NWG    2.0
6       8001 problem26 partB A1FCLYRBAB430F    0.5
7       8001 problem26 partB A1XRDZKSJBWY8Q    0.5
8       8001 problem26 partB A22CRWMZUX7FFR    0.5
9       8001 problem26 partB A25FZQY34C6RVO    1.0
10      8001 problem26 partB A3G0MO562MHMZ3    0.5
11      8001 problem27 partA A1ET309DW6M2XA    2.0
12      8001 problem27 partA A1FCLYRBAB430F    0.0
13      8001 problem27 partA A22CRWMZUX7FFR    0.0
14      8001 problem27 partA A25FZQY34C6RVO    0.0
15      8001 problem27 partA A3G0MO562MHMZ3    0.0
16      8001 problem27 partB A1FCLYRBAB430F    1.0
17      8001 problem27 partB A22CRWMZUX7FFR    0.0
18      8001 problem27 partB A25FZQY34C6RVO    0.0
19      8001 problem27 partB A2U9676210WST5    0.0
20      8001 problem27 partB A3G0MO562MHMZ3    0.0
21      8002 problem26 partA A127R5QI5OGBIK    0.0
22      8002 problem26 partA A1FCLYRBAB430F    0.5
23      8002 problem26 partA A22CRWMZUX7FFR    0.0
24      8002 problem26 partA A25FZQY34C6RVO    2.0
25      8002 problem26 partA A3G0MO562MHMZ3    0.5
26      8002 problem26 partB A17EHJZNJGNRAN    2.0
27      8002 problem26 partB A1FCLYRBAB430F    0.0
28      8002 problem26 partB A2IPRDTE6B4TAB    0.0
29      8002 problem26 partB A3G0MO562MHMZ3    0.0
30      8002 problem26 partB  A6SON3OS15XKA    0.0
31      8002 problem27 partA A1FCLYRBAB430F    0.0
32      8002 problem27 partA A25FZQY34C6RVO    0.0
33      8002 problem27 partA A2IPRDTE6B4TAB    0.0
34      8002 problem27 partA A2U9676210WST5    0.0
35      8002 problem27 partA A3G0MO562MHMZ3    0.0
36      8002 problem27 partB A1FCLYRBAB430F    0.0
37      8002 problem27 partB A1V52SSKROBV8E    2.0
38      8002 problem27 partB A25FZQY34C6RVO    2.0
39      8002 problem27 partB A2IPRDTE6B4TAB    0.0
40      8002 problem27 partB A3G0MO562MHMZ3    0.0
> 
> #Make a wrapper
> wrapper <- function ( ratingData, weight.func ) {
+   print(weight.func) #prove that the function is being passed
+   ddply(ratingData, c('studentId','problem','part'), summarize, 
+           sum.weights = sum ( weight.func(rating)  ))
+ }
> wrapper( sampleData, weight.func=function(x) (x+.001)^-1  )
function(x) (x+.001)^-1
Error in data.frame(sum.weights = sum(weight.func(rating))) : 
  could not find function "weight.func"
> 
> #'globally' declare weight.func
> weight.func <- function(x) (x+.001)^-1
> wrapper( sampleData, weight.func=NULL  )
NULL
  studentId   problem  part sum.weights
1      8001 problem26 partA 3002.495758
2      8001 problem26 partB    8.983033
3      8001 problem27 partA 4000.499750
4      8001 problem27 partB 4000.999001
5      8002 problem26 partA 2004.491766
6      8002 problem26 partB 4000.499750
7      8002 problem27 partA 5000.000000
8      8002 problem27 partB 3000.999500
```

The second output is the goal. Any help appreciated! (Including a non-plyr-based way to accomplish the same task.)

The example above is a toy example. It's the simplest case I could get to reproduce the behavior.
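The base-R sketch below imitates the failure so you can see why the anonymous function is not found. It rests on one assumption about plyr's implementation: `summarise` evaluates each column expression with the data as the first scope and its own calling frame (`parent.frame()`) as the enclosure. `ddply` calls `summarise` from inside plyr, so that calling frame leads to plyr's code and then to the global environment. It never leads to `wrapper`'s local `weight.func`. `toy_summarise`, `toy_ddply`, `toy_wrapper` and `ratings` are hypothetical stand-ins, not plyr code.

```r
# Hypothetical toy stand-ins for plyr's summarise() and ddply(); not plyr code.
# Assumption: like summarise(), toy_summarise() evaluates each expression with
# the data first and its *calling frame* as the enclosing scope.
toy_summarise <- function(.data, ...) {
  caller <- parent.frame()                     # frame that called toy_summarise
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated column expressions
  out <- list()
  for (nm in names(exprs)) {
    out[[nm]] <- eval(exprs[[nm]], envir = .data, enclos = caller)
  }
  out
}

# Calls .fun itself, so .fun's calling frame is inside toy_ddply, whose scope
# chain leads to the global environment rather than to the user's function.
toy_ddply <- function(.data, .by, .fun, ...) {
  pieces <- split(.data, .data[[.by]])
  lapply(pieces, function(piece) .fun(piece, ...))
}

toy_wrapper <- function(ratingData, weight.func) {
  toy_ddply(ratingData, "part", toy_summarise,
            sum.weights = sum(weight.func(rating)))
}

ratings <- data.frame(part = c("A", "A", "B"), rating = c(0, 1, 1))

# Fails: weight.func lives in toy_wrapper's frame, which the lookup never visits.
# res holds the message: could not find function "weight.func"
res <- tryCatch(toy_wrapper(ratings, function(x) x + 1),
                error = function(e) conditionMessage(e))

# Works once a global weight.func exists, even though NULL is passed.
weight.func <- function(x) x + 1
ok <- toy_wrapper(ratings, NULL)
```

The same lookup explains the question's second run. The `NULL` argument is never consulted, and the global `weight.func` is found instead.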

撧情箌佬 2024-10-12 01:29:38

You can use `aggregate`:

```r
# FUN is an ordinary closure created inside w2, so it finds f lexically
w2 <- function(d, f) {
  aggregate(rating ~ studentId + problem + part, data = d,
            FUN = function(x) sum(f(x)))
}

w2(sampleData, function(x) (x + .001)^-1)
```

Note that the name of the aggregated column is determined automatically (here it is `rating`), so if you want a different name you have to set it yourself.
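The question's target output calls the column `sum.weights`. One way to get that name is to rename the column after `aggregate` returns. This is only a sketch: `w2_named`, its `name` argument and the `toy` data are illustrative assumptions.

```r
# Hypothetical variant of w2 that renames the aggregated column, which
# aggregate() names after the formula's left-hand side ("rating").
w2_named <- function(d, f, name = "sum.weights") {
  res <- aggregate(rating ~ studentId + problem + part, data = d,
                   FUN = function(x) sum(f(x)))
  names(res)[names(res) == "rating"] <- name
  res
}

# Small illustrative data set (not the question's sampleData)
toy <- data.frame(studentId = c(8001, 8001, 8002),
                  problem = "problem26", part = "partA",
                  rating = c(0, 1, 1))
out <- w2_named(toy, function(x) x + 1)
```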

You can also do the same thing with `ddply` without `summarize`:

```r
library(plyr)

wrapper <- function(ratingData, weight.func) {
  # The anonymous function is created inside wrapper, so it sees weight.func
  ddply(ratingData, c('studentId', 'problem', 'part'),
        function(x) c(sum.weights = sum(weight.func(x$rating))))
}

wrapper(sampleData, weight.func = function(x) (x + .001)^-1)
```

In this case you can specify the column name inside the function.
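The `ddply` version works because the anonymous function is an ordinary closure created inside `wrapper`, so normal lexical scoping finds `weight.func`. The same idea works without plyr: split the rows by the grouping columns and apply a closure to each piece. In the sketch below, `wrapper_base` and `toy` are hypothetical names. `split()` orders the groups with the first grouping column varying fastest, so the row order can differ from `ddply`'s.

```r
# Base-R sketch (hypothetical wrapper_base): split rows by the grouping columns
# and apply an ordinary closure, which finds weight.func lexically.
wrapper_base <- function(ratingData, weight.func) {
  keys <- c("studentId", "problem", "part")
  groups <- split(ratingData, ratingData[keys], drop = TRUE)  # skip empty combos
  rows <- lapply(groups, function(g) {
    data.frame(g[1, keys], sum.weights = sum(weight.func(g$rating)))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Small illustrative data set (not the question's sampleData)
toy <- data.frame(studentId = 8001, problem = "problem26",
                  part = c("partA", "partA", "partB"), rating = c(0, 1, 1))
out <- wrapper_base(toy, function(x) (x + .001)^-1)
```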

一曲琵琶半遮面シ 2024-10-12 01:29:38

I'm not exactly sure which change fixed it (removing the spaces after `sum`, changing the `NULL` to a real function, or something else), but this now works:

```r
library(plyr)

wrapper <- function(ratingData, weight.func = weight.func) {
  ddply(ratingData, .variables = c('studentId', 'problem', 'part'),
        .fun = summarise, sum.weights = sum(weight.func(rating)))
}

wrapper(sampleData, weight.func = weight.func)
```

```
  studentId   problem  part sum.weights
1      8001 problem26 partA 3002.495758
2      8001 problem26 partB    8.983033
3      8001 problem27 partA 4000.499750
4      8001 problem27 partB 4000.999001
5      8002 problem26 partA 2004.491766
6      8002 problem26 partB 4000.499750
7      8002 problem27 partA 5000.000000
8      8002 problem27 partB 3000.999500
```
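A likely explanation, which is our reading and not the answerer's, is that this call still succeeds because the question's session already defined a global `weight.func`. The default `weight.func = weight.func` in the signature does not help by itself. If the argument is omitted, R evaluates the default in the function's own frame, and there the name refers back to the argument that is currently being evaluated. The minimal base-R illustration below shows this. `g` and `use_default` are hypothetical names.

```r
# A global function and a function whose default argument refers to itself,
# mirroring `weight.func = weight.func` (names are hypothetical).
g <- function(x) x + 1
use_default <- function(g = g) g(1)

explicit <- use_default(g = g)  # works: the caller's g is passed explicitly

# Fails: the default `g` is looked up in use_default's own frame, where it is
# the promise currently being forced ("promise already under evaluation").
implicit <- tryCatch(use_default(), error = function(e) conditionMessage(e))
```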
萌酱 2024-10-12 01:29:38

An update on this issue in plyr (https://github.com/hadley/plyr/issues/3):

Use plyr's `here()` function: just replace `summarize` with `here(summarize)`, so that `summarize` can access the environment `ddply` was called from.

```r
library(plyr)

wrapper <- function(ratingData, weight.func) {
  ddply(ratingData, c('studentId', 'problem', 'part'),
        here(summarize),  # here() lets summarize see wrapper's weight.func
        sum.weights = sum(weight.func(rating)))
}
```
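`here()` works by wrapping `summarize` in a new function whose enclosing environment is the frame where `here()` was called. As a result, `summarize`'s `parent.frame()` now leads back to `wrapper`. The sketch below re-creates that idea in base R with hypothetical `toy_*` helpers. It follows our reading of plyr's approach and is not plyr's source. It assumes `summarise` evaluates expressions with its calling frame as the enclosure.

```r
# Hypothetical toy stand-ins, not plyr code.
toy_summarise <- function(.data, ...) {
  caller <- parent.frame()                     # frame that called toy_summarise
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated column expressions
  out <- list()
  for (nm in names(exprs)) {
    out[[nm]] <- eval(exprs[[nm]], envir = .data, enclos = caller)
  }
  out
}

toy_ddply <- function(.data, .by, .fun, ...) {
  pieces <- split(.data, .data[[.by]])
  lapply(pieces, function(piece) .fun(piece, ...))
}

# Idea behind here(): build function(...) f(...) inside the caller's frame, so
# f's calling frame has the caller's locals on its scope chain.
toy_here <- function(f) {
  caller <- parent.frame()
  eval(substitute(function(...) (f)(...), list(f = f)), caller)
}

toy_wrapper_here <- function(ratingData, weight.func) {
  toy_ddply(ratingData, "part", toy_here(toy_summarise),
            sum.weights = sum(weight.func(rating)))
}

ratings <- data.frame(part = c("A", "A", "B"), rating = c(0, 1, 1))
res <- toy_wrapper_here(ratings, function(x) x + 1)  # no global weight.func needed
```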