How can I rewrite this code so that it uses plyr/ddply as intended?

Posted 2024-10-06 13:59:30

Background

I have a dataframe of probability distributions that I would like to calculate statistical summaries for:

priors <- structure(list(name = c("theta1", "theta2", "theta3", "theta4", 
  "theta5"), distn = c("gamma", "beta", "lnorm", "weibull", "gamma"), 
   parama = c(2.68, 4, 1.35, 1.7, 2.3), paramb = c(0.084, 7.2, 0.69, 0.66, 3.9),
   another_col = structure(c(3L, 4L, 5L, 1L, 2L
   ), .Label = c("1", "2", "a", "b", "c"), class = "factor")), 
   .Names = c("name", "distn", "parama", "paramb", "another_col"), row.names = c("1",
   "2", "3", "4", "5"), class = "data.frame")

Approach

Step 1: I wrote a function that calculates the summaries and returns them as a string of the form mean (lcl, ucl)

summary.stats <- function(distn, A, B) {
  if (distn == 'gamma'  ) ans <- c(A*B,                       qgamma(c(0.05, 0.95), A, B))
  if (distn == 'lnorm'  ) ans <- c(exp(A + 1/2 * B^2),        qlnorm(c(0.05, 0.95), A, B))
  if (distn == 'beta'   ) ans <- c(A/(A+B),                   qbeta( c(0.05, 0.95), A, B))
  if (distn == 'weibull') ans <- c(mean(rweibull(10000,A,B)), qweibull(c(0.05, 0.95), A, B))
  if (distn == 'norm'   ) ans <- c(A,                         qnorm( c(0.05, 0.95), A, B))
  ans <- signif(ans, 2)
  return(paste(ans[1], ' (', ans[2], ', ', ans[3], ')', sep = ''))
}

Step 2: I would like to add a new column to my dataframe called stats

priors$stats <- ddply(priors, 
                     .(name, distn, parama, paramb), 
                     function(x)  summary.stats(x$distn, x$parama, x$paramb))$V1

Question 1:

What is the proper way to do this? I get an error when I try:

ddply(priors, 
      .(name, distn, parama, paramb),
      transform, 
      stats = function(x)  summary.stats(x$distn, x$parama, x$paramb))

Question 2: (extra credit)

Is there a more efficient way to code the summary.stats function, i.e., with fewer 'if's?

Update

Thanks to Shane and Joshua for clearing this up for me.

I also found a question that should be helpful for others trying to do a plyr operation on every row of a data frame.


不忘初心 2024-10-13 13:59:30

Here's a cleaned-up version of your summary.stats that uses switch instead. I also added the name "stats" to the output, since that seems to be the thing tripping you up.

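summaryStats <- function(distn, A, B) {
  CI <- c(0.05, 0.95)                      # lower and upper quantile levels
  FUN <- get(paste("q", distn, sep = ""))  # quantile function, e.g. qgamma
  # mean of the chosen distribution
  ans <- switch(distn,
    gamma   = A*B,
    lnorm   = exp(A + 1/2 * B^2),
    beta    = A/(A+B),
    weibull = mean(rweibull(10000, A, B)),  # simulated, so it varies slightly between runs
    norm    = A)
  ans <- c(ans, FUN(CI, A, B))             # append the two quantiles
  ans <- signif(ans, 2)
  # name the result so that the new column is called "stats"
  out <- c(stats = paste(ans[1], ' (', ans[2], ', ', ans[3], ')', sep = ''))
  return(out)
}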
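On the "fewer ifs" part of the question, the switch could also be swapped for a lookup table of mean functions. The following is only a sketch, not code from the answer: meanFuns and summaryStats2 are made-up names, and two behaviours are deliberately different and are my assumptions. First, qgamma(p, A, B) treats B as the rate, so the gamma mean is written as A / B rather than A * B. Second, the Weibull mean uses the closed form scale * gamma(1 + 1/shape) instead of simulation, which makes the result reproducible.

```r
# Hypothetical lookup table of closed-form means, keyed by distribution name.
# A and B are passed positionally to the q* functions, so for the gamma
# B is the rate and the mean is A / B (an assumption about the intended
# parameterisation; the code above uses A * B).
meanFuns <- list(
  gamma   = function(A, B) A / B,
  lnorm   = function(A, B) exp(A + B^2 / 2),
  beta    = function(A, B) A / (A + B),
  weibull = function(A, B) B * gamma(1 + 1 / A),  # shape A, scale B
  norm    = function(A, B) A
)

summaryStats2 <- function(distn, A, B, probs = c(0.05, 0.95)) {
  distn <- as.character(distn)            # guard against factor input
  if (!distn %in% names(meanFuns)) {
    stop("unsupported distribution: ", distn)
  }
  qfun <- match.fun(paste0("q", distn))   # e.g. qgamma, qbeta
  ans  <- signif(c(meanFuns[[distn]](A, B), qfun(probs, A, B)), 2)
  paste0(ans[1], " (", ans[2], ", ", ans[3], ")")
}

summaryStats2("lnorm", 1.35, 0.69)
summaryStats2("weibull", 1.7, 0.66)
```

Adding a distribution then means adding one entry to the list, and an unknown name fails with an explicit error instead of leaving ans undefined.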

I'm not sure how to do this with plyr, but you can do it with boring ol' sapply like this:

priors$stats <- sapply(seq_len(nrow(priors)),
  function(i) with(priors[i, ], summaryStats(distn, parama, paramb)))
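Another base R option, if indexing rows by hand feels clumsy, is mapply, which walks several vectors in parallel and calls the function once per position. This is a sketch under a few assumptions: the data frame is rebuilt here with only the columns the function needs, distn is kept as character (switch does not work as intended on a factor), and summaryStats is a compact copy of the function above so that the block runs on its own.

```r
# Data from the question, trimmed to the columns the function needs.
priors <- data.frame(
  name   = c("theta1", "theta2", "theta3", "theta4", "theta5"),
  distn  = c("gamma", "beta", "lnorm", "weibull", "gamma"),
  parama = c(2.68, 4, 1.35, 1.7, 2.3),
  paramb = c(0.084, 7.2, 0.69, 0.66, 3.9),
  stringsAsFactors = FALSE
)

# Compact copy of summaryStats from the answer above.
summaryStats <- function(distn, A, B) {
  FUN <- match.fun(paste0("q", distn))
  ans <- switch(distn,
    gamma   = A * B,
    lnorm   = exp(A + 1/2 * B^2),
    beta    = A / (A + B),
    weibull = mean(rweibull(10000, A, B)),
    norm    = A)
  ans <- signif(c(ans, FUN(c(0.05, 0.95), A, B)), 2)
  paste0(ans[1], " (", ans[2], ", ", ans[3], ")")
}

# One call per row: the i-th elements of the three columns go in together.
priors$stats <- unname(mapply(summaryStats,
                              priors$distn, priors$parama, priors$paramb))
priors
```

Because the columns are passed as separate arguments, no row subsetting is needed, and every other column of priors is left untouched.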


伴随着你 2024-10-13 13:59:30

I could be missing something, but using Josh's function and your data, this works fine.

priors <- ddply(priors, 
  .(name, distn, parama, paramb), 
  function(x)  summaryStats(x$distn, x$parama, x$paramb))
colnames(priors)[5] <- "stats"

What do you want your output to look like?

> priors
    name   distn parama paramb            stats
1 theta1   gamma   2.68  0.084   0.23 (7.8, 69)
2 theta2    beta   4.00  7.200 0.36 (0.15, 0.6)
3 theta3   lnorm   1.35  0.690    4.9 (1.2, 12)
4 theta4 weibull   1.70  0.660 0.59 (0.12, 1.3)
5 theta5   gamma   2.30  3.900    9 (0.12, 1.3)

Edit

Sorry, didn't read your whole comment. Then this should work (in my example here, I leave out one column):

ddply(priors, .(distn, parama, paramb), function(x) 
   data.frame(x, stats=summaryStats(x$distn, x$parama, x$paramb)))
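As for why the transform call in the question errors: as far as I can tell, transform expects each extra argument to be an expression that it evaluates inside the current piece of the data frame, and the question passed it a function object instead, which cannot be stored as a column. Below is a sketch of the expression form, assuming plyr is installed and summaryStats is defined at top level (a compact copy is included so the block runs alone). Grouping by name alone is my choice here; it is enough because every name is unique, so each piece is a single row.

```r
library(plyr)

priors <- data.frame(
  name        = c("theta1", "theta2", "theta3", "theta4", "theta5"),
  distn       = c("gamma", "beta", "lnorm", "weibull", "gamma"),
  parama      = c(2.68, 4, 1.35, 1.7, 2.3),
  paramb      = c(0.084, 7.2, 0.69, 0.66, 3.9),
  another_col = c("a", "b", "c", "1", "2"),
  stringsAsFactors = FALSE
)

# Compact copy of summaryStats from the earlier answer.
summaryStats <- function(distn, A, B) {
  FUN <- match.fun(paste0("q", distn))
  ans <- switch(distn,
    gamma   = A * B,
    lnorm   = exp(A + 1/2 * B^2),
    beta    = A / (A + B),
    weibull = mean(rweibull(10000, A, B)),
    norm    = A)
  ans <- signif(c(ans, FUN(c(0.05, 0.95), A, B)), 2)
  paste0(ans[1], " (", ans[2], ", ", ans[3], ")")
}

# stats is given as an expression in the piece's own column names,
# not as a function of the piece.
res <- ddply(priors, .(name), transform,
             stats = summaryStats(distn, parama, paramb))
res
```

Unlike the version that returns only the summary string, this keeps every original column, including another_col.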
