How can I rewrite this code so that it uses plyr/ddply as intended?
Background
I have a dataframe of probability distributions that I would like to calculate statistical summaries for:
priors <- structure(list(name = c("theta1", "theta2", "theta3", "theta4",
"theta5"), distn = c("gamma", "beta", "lnorm", "weibull", "gamma"),
parama = c(2.68, 4, 1.35, 1.7, 2.3), paramb = c(0.084, 7.2, 0.69, 0.66, 3.9),
another_col = structure(c(3L, 4L, 5L, 1L, 2L
), .Label = c("1", "2", "a", "b", "c"), class = "factor")),
.Names = c("name", "distn", "parama", "paramb", "another_col"), row.names = c("1",
"2", "3", "4", "5"), class = "data.frame")
Approach
Step 1: I wrote a function that calculates the summaries and returns them in the form mean (lcl, ucl):
summary.stats <- function(distn, A, B) {
  # Each branch builds c(mean, 5% quantile, 95% quantile)
  # qgamma's third positional argument is the rate, so the matching mean is A/B
  if (distn == 'gamma'  ) ans <- c(A/B,                qgamma(c(0.05, 0.95), A, B))
  if (distn == 'lnorm'  ) ans <- c(exp(A + 1/2 * B^2), qlnorm(c(0.05, 0.95), A, B))
  if (distn == 'beta'   ) ans <- c(A/(A+B),            qbeta(c(0.05, 0.95), A, B))
  # Weibull mean is estimated by simulation here
  if (distn == 'weibull') ans <- c(mean(rweibull(10000, A, B)), qweibull(c(0.05, 0.95), A, B))
  if (distn == 'norm'   ) ans <- c(A,                  qnorm(c(0.05, 0.95), A, B))
  ans <- signif(ans, 2)
  return(paste(ans[1], ' (', ans[2], ', ', ans[3], ')', sep = ''))
}
Step 2: I would like to add a new column called stats to my dataframe:
priors$stats <- ddply(priors,
.(name, distn, parama, paramb),
function(x) summary.stats(x$distn, x$parama, x$paramb))$V1
Question 1:
What is the proper way to do this? I get an error when I try:
ddply(priors,
.(name, distn, parama, paramb),
transform,
stats = function(x) summary.stats(x$distn, x$parama, x$paramb))
Question 2 (extra credit):
Is there a more efficient way to code the summary.stats function, i.e., with fewer ifs?
Update
Thanks to Shane and Joshua for clearing this up for me.
I also found a question that should be helpful for others trying to do a plyr operation on every row of a dataframe.
Comments (2)
Here's a cleaned-up version of your summary.stats that uses switch instead. I also added the name "stats" to the output, since that seems to be the thing tripping you up.
I'm not sure how to do this with plyr, but you can do it with boring ol' sapply like this:
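The answer's code did not survive in this copy of the page. The following is only a minimal sketch of what a switch-based summary.stats applied row by row with sapply might look like; it is not the answerer's original code. It assumes that paramb is the rate for the gamma distribution (so the mean is A/B), and it swaps the simulated Weibull mean for the closed form B * gamma(1 + 1/A). The stop() fallback for unknown distributions is also an addition.

```r
# Example data from the question (another_col omitted for brevity)
priors <- data.frame(
  name   = c("theta1", "theta2", "theta3", "theta4", "theta5"),
  distn  = c("gamma", "beta", "lnorm", "weibull", "gamma"),
  parama = c(2.68, 4, 1.35, 1.7, 2.3),
  paramb = c(0.084, 7.2, 0.69, 0.66, 3.9),
  stringsAsFactors = FALSE
)

# Hypothetical switch-based rewrite: one branch per distribution,
# each returning c(mean, 5% quantile, 95% quantile)
summary.stats <- function(distn, A, B) {
  p <- c(0.05, 0.95)
  ans <- switch(distn,
    gamma   = c(A / B,                qgamma(p, A, B)),   # assumption: B is the rate
    lnorm   = c(exp(A + B^2 / 2),     qlnorm(p, A, B)),
    beta    = c(A / (A + B),          qbeta(p, A, B)),
    weibull = c(B * gamma(1 + 1 / A), qweibull(p, A, B)), # closed-form mean, A = shape, B = scale
    norm    = c(A,                    qnorm(p, A, B)),
    stop("unsupported distribution: ", distn)
  )
  ans <- signif(ans, 2)
  # Name the result so it shows up as "stats" rather than "V1"
  c(stats = paste0(ans[1], " (", ans[2], ", ", ans[3], ")"))
}

# Apply the function to each row and store the result as a new column
priors$stats <- unname(sapply(seq_len(nrow(priors)), function(i) {
  summary.stats(priors$distn[i], priors$parama[i], priors$paramb[i])
}))
```

Using switch keeps one expression per distribution and makes an unrecognised distn fail loudly instead of leaving ans undefined.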
I could be missing something, but using Josh's function and your data, this works fine.
What do you want your output to look like?
Edit
Sorry, didn't read your whole comment. Then this should work (in my example here, I leave out one column):
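The code for this answer is also missing from this copy of the page. As a hedged sketch only, one way to get the stats column with plyr is to call ddply with a function that returns each one-row piece with the extra column attached. The helper below is a compact, hypothetical stand-in for the switch-based summary.stats (gamma's paramb is assumed to be a rate), and grouping by name alone is an assumption that relies on name being unique per row.

```r
library(plyr)

# Example data from the question (another_col omitted for brevity)
priors <- data.frame(
  name   = c("theta1", "theta2", "theta3", "theta4", "theta5"),
  distn  = c("gamma", "beta", "lnorm", "weibull", "gamma"),
  parama = c(2.68, 4, 1.35, 1.7, 2.3),
  paramb = c(0.084, 7.2, 0.69, 0.66, 3.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns "mean (lcl, ucl)" as a single string
summary.stats <- function(distn, A, B) {
  p <- c(0.05, 0.95)
  ans <- switch(distn,
    gamma   = c(A / B,                qgamma(p, A, B)),   # assumption: B is the rate
    lnorm   = c(exp(A + B^2 / 2),     qlnorm(p, A, B)),
    beta    = c(A / (A + B),          qbeta(p, A, B)),
    weibull = c(B * gamma(1 + 1 / A), qweibull(p, A, B)),
    norm    = c(A,                    qnorm(p, A, B)),
    stop("unsupported distribution: ", distn)
  )
  ans <- signif(ans, 2)
  paste0(ans[1], " (", ans[2], ", ", ans[3], ")")
}

# One group per row (name is unique); each piece comes back with a stats column
priors_with_stats <- ddply(priors, .(name), function(x) {
  data.frame(x,
             stats = summary.stats(x$distn, x$parama, x$paramb),
             stringsAsFactors = FALSE)
})
```

The attempt in the question fails because transform expects an expression evaluated on the columns, not a function object, so stats = function(x) ... cannot be turned into a column.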