plyr summarise only calls global functions
I'm trying to pass a function (weight.func) to a different function (wrapper) that calls ddply. I want ddply to use that function (weight.func) as part of its calculations. I get the output I want when weight.func is set 'globally', but not when it is passed as an anonymous function to the wrapper.
Can I get ddply to do what I want? Here is a code example:
> print(sampleData)
studentId problem part workerId rating
1 8001 problem26 partA A127R5QI5OGBIK 0.0
2 8001 problem26 partA A1FCLYRBAB430F 0.0
3 8001 problem26 partA A25FZQY34C6RVO 0.0
4 8001 problem26 partA A3G0MO562MHMZ3 0.5
5 8001 problem26 partA A3RB9ZOIUC3NWG 2.0
6 8001 problem26 partB A1FCLYRBAB430F 0.5
7 8001 problem26 partB A1XRDZKSJBWY8Q 0.5
8 8001 problem26 partB A22CRWMZUX7FFR 0.5
9 8001 problem26 partB A25FZQY34C6RVO 1.0
10 8001 problem26 partB A3G0MO562MHMZ3 0.5
11 8001 problem27 partA A1ET309DW6M2XA 2.0
12 8001 problem27 partA A1FCLYRBAB430F 0.0
13 8001 problem27 partA A22CRWMZUX7FFR 0.0
14 8001 problem27 partA A25FZQY34C6RVO 0.0
15 8001 problem27 partA A3G0MO562MHMZ3 0.0
16 8001 problem27 partB A1FCLYRBAB430F 1.0
17 8001 problem27 partB A22CRWMZUX7FFR 0.0
18 8001 problem27 partB A25FZQY34C6RVO 0.0
19 8001 problem27 partB A2U9676210WST5 0.0
20 8001 problem27 partB A3G0MO562MHMZ3 0.0
21 8002 problem26 partA A127R5QI5OGBIK 0.0
22 8002 problem26 partA A1FCLYRBAB430F 0.5
23 8002 problem26 partA A22CRWMZUX7FFR 0.0
24 8002 problem26 partA A25FZQY34C6RVO 2.0
25 8002 problem26 partA A3G0MO562MHMZ3 0.5
26 8002 problem26 partB A17EHJZNJGNRAN 2.0
27 8002 problem26 partB A1FCLYRBAB430F 0.0
28 8002 problem26 partB A2IPRDTE6B4TAB 0.0
29 8002 problem26 partB A3G0MO562MHMZ3 0.0
30 8002 problem26 partB A6SON3OS15XKA 0.0
31 8002 problem27 partA A1FCLYRBAB430F 0.0
32 8002 problem27 partA A25FZQY34C6RVO 0.0
33 8002 problem27 partA A2IPRDTE6B4TAB 0.0
34 8002 problem27 partA A2U9676210WST5 0.0
35 8002 problem27 partA A3G0MO562MHMZ3 0.0
36 8002 problem27 partB A1FCLYRBAB430F 0.0
37 8002 problem27 partB A1V52SSKROBV8E 2.0
38 8002 problem27 partB A25FZQY34C6RVO 2.0
39 8002 problem27 partB A2IPRDTE6B4TAB 0.0
40 8002 problem27 partB A3G0MO562MHMZ3 0.0
>
> #Make a wrapper
> wrapper <- function ( ratingData, weight.func ) {
+ print(weight.func) #prove that the function is being passed
+ ddply(ratingData, c('studentId','problem','part'), summarize,
+ sum.weights = sum ( weight.func(rating) ))
+ }
> wrapper( sampleData, weight.func=function(x) (x+.001)^-1 )
function(x) (x+.001)^-1
Error in data.frame(sum.weights = sum(weight.func(rating))) :
could not find function "weight.func"
>
> #'globally' declare weight.func
> weight.func <- function(x) (x+.001)^-1
> wrapper( sampleData, weight.func=NULL )
NULL
studentId problem part sum.weights
1 8001 problem26 partA 3002.495758
2 8001 problem26 partB 8.983033
3 8001 problem27 partA 4000.499750
4 8001 problem27 partB 4000.999001
5 8002 problem26 partA 2004.491766
6 8002 problem26 partB 4000.499750
7 8002 problem27 partA 5000.000000
8 8002 problem27 partB 3000.999500
The second output is the goal. Any help is appreciated! (Including a non-plyr-based way to accomplish the same task.)
The example above is a toy example. It's the simplest case I could get to reproduce the behavior.
Comments (4)
You can use aggregate.
Note that the name of the aggregated column is determined automatically, so if you want a specific name you need to set it yourself.
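The code for this answer did not survive on this page, so here is a minimal sketch of what an aggregate-based wrapper might look like. The name wrapper_aggregate and the small ratings data frame are made up for illustration (a cut-down stand-in for sampleData); only aggregate itself comes from the answer.

```r
# Hypothetical miniature of sampleData; the values are illustrative only.
ratings <- data.frame(
  studentId = c(8001, 8001, 8001, 8002, 8002),
  problem   = c("problem26", "problem26", "problem27", "problem26", "problem26"),
  part      = c("partA", "partA", "partA", "partB", "partB"),
  rating    = c(0, 0.5, 2, 1, 0)
)

# Hypothetical wrapper name. The anonymous FUN is created inside the wrapper,
# so it finds weight.func through ordinary lexical scoping.
wrapper_aggregate <- function(ratingData, weight.func) {
  aggregate(rating ~ studentId + problem + part, data = ratingData,
            FUN = function(x) sum(weight.func(x)))
}

res <- wrapper_aggregate(ratings, weight.func = function(x) (x + 0.001)^-1)

# aggregate keeps the input column name ("rating"), so rename it by hand.
names(res)[names(res) == "rating"] <- "sum.weights"
print(res)
```

Because the summing function is an ordinary closure rather than an expression evaluated by summarize, no global weight.func is needed.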
You can also do the same thing with ddply without summarize;
in this case you can specify the column name inside the function.
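Again the original code is missing here, so the following is only a sketch of the ddply-without-summarize idea, assuming plyr is installed. wrapper_ddply and the ratings data frame are hypothetical names used for illustration.

```r
library(plyr)

# Hypothetical miniature of sampleData; the values are illustrative only.
ratings <- data.frame(
  studentId = c(8001, 8001, 8001, 8002, 8002),
  problem   = c("problem26", "problem26", "problem27", "problem26", "problem26"),
  part      = c("partA", "partA", "partA", "partB", "partB"),
  rating    = c(0, 0.5, 2, 1, 0)
)

# Hypothetical wrapper name. ddply hands each group to the anonymous function
# as a data frame; the output column is named directly in the returned frame.
wrapper_ddply <- function(ratingData, weight.func) {
  ddply(ratingData, c("studentId", "problem", "part"),
        function(df) data.frame(sum.weights = sum(weight.func(df$rating))))
}

res_ddply <- wrapper_ddply(ratings, weight.func = function(x) (x + 0.001)^-1)
print(res_ddply)
```

The per-group function is defined inside the wrapper, so weight.func is resolved from the wrapper's own environment and the "could not find function" error should not occur.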
This is a known bug in plyr: https://github.com/hadley/plyr/issues#issue/3
I'm not exactly sure which change I made (taking out the spaces after "sum" or changing the NULL to a real function or << something >> ), but this now works:
An update on this issue in plyr (https://github.com/hadley/plyr/issues/3):
Use the 'here' function in plyr: just replace 'summarize' with 'here(summarize)' so that the summary expression can access the environment where ddply was called from.
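A minimal sketch of that replacement applied to the wrapper from the question, assuming a plyr version that provides here(). The ratings data frame and the name wrapper_here are made up for illustration.

```r
library(plyr)

# Hypothetical miniature of sampleData; the values are illustrative only.
ratings <- data.frame(
  studentId = c(8001, 8001, 8001, 8002, 8002),
  problem   = c("problem26", "problem26", "problem27", "problem26", "problem26"),
  part      = c("partA", "partA", "partA", "partB", "partB"),
  rating    = c(0, 0.5, 2, 1, 0)
)

# Same shape as the question's wrapper, with summarize wrapped in here() so
# that weight.func is looked up in the wrapper's environment.
wrapper_here <- function(ratingData, weight.func) {
  ddply(ratingData, c("studentId", "problem", "part"), here(summarize),
        sum.weights = sum(weight.func(rating)))
}

res_here <- wrapper_here(ratings, weight.func = function(x) (x + 0.001)^-1)
print(res_here)
```

This keeps the summarize syntax from the question and changes only the function passed to ddply.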