How to use summarise with plyr without getting long output

Posted 2024-10-21 10:17:00

I love the ability of plyr to split a data frame into multiple data sets and then perform identical operations on each set. The best part is when it shows you the result as a neat, compact, well-labeled table. I love throwing a bunch of calculations into a single line using each(). However, I do not understand why using the summarise function in the ddply argument scuttles the output and makes it come out long and unlabeled. Have a look here to see what I mean. Can you tell me what I am doing wrong? I would prefer to use summarise.

Let us first set up an example data frame. Imagine that you had 60 participants in a study. 20 of them were funny, 20 were clever and 20 were nice. Then each subject received a score.

library(plyr)  # provides ddply(), summarise() and each()
type <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data <- data.frame(type, score)

Now I want a table showing the mean, median, minimum and maximum score for each of the 3 types of people:

ddply(data,.(type), summarise, each(mean,median,min,max)(score))

The line above should have given a nice table (3 rows, 1 for each type, and 4 columns of data). Alas, it gives a long table with only one column of numbers, none of which are labeled.

ddply(data,.(type), function(jjkk) each(mean,median,min,max)(jjkk$score))

The above line gives me what I want. Can you explain what I am not understanding about the syntax of ddply?

Comments (2)

故事与诗 2024-10-28 10:17:00

Spelling out the functions, as in:

ddply(data,"type", summarise, mean=mean(score),median=median(score),max=max(score),min=min(score))

produces output in the format you desired.

I think your problem is that each() returns a vector of four values, which summarise() doesn't handle the way you intend: it treats the whole vector as a single column, so each value ends up in its own row.
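To make that concrete, here is a minimal sketch, assuming plyr is installed. The set.seed() call and the names stats, one_group, long and wide are my own additions for illustration, not part of the question. It shows that each() returns one named vector per group, that summarise stacks that vector into rows, and that returning the vector from a plain function lets ddply spread it into columns.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed only so the example is reproducible
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data  <- data.frame(type, score)

# each() combines several functions into one that returns a named vector
stats <- each(mean, median, min, max)
one_group <- stats(data$score[data$type == "funny"])
names(one_group)   # "mean" "median" "min" "max"
length(one_group)  # 4

# Inside summarise, the length-4 vector becomes ONE column with 4 rows per group
long <- ddply(data, .(type), summarise, each(mean, median, min, max)(score))
dim(long)  # 12 rows, 2 columns

# Returned directly from a function, the named vector becomes 4 labeled columns
wide <- ddply(data, .(type), function(d) stats(d$score))
dim(wide)  # 3 rows, 5 columns
```

So if you want to keep each(), the anonymous-function form from the question seems to be the way to go; if you want to keep summarise, name each statistic explicitly as above.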

脱离于你 2024-10-28 10:17:00

Hmmm... I'm too tired to think about a one-liner, but reshape will do the trick:

library(reshape)
library(plyr)
mdtf <- melt(data)
cast(mdtf, type ~ ., each(min, max, mean, median))
    type      min      max      mean   median
1 clever 7.808648 12.08930 10.125563 10.27269
2  funny 8.302777 12.04066  9.941331 10.07333
3   nice 8.442508 11.80132 10.085667 10.07261
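
If you go this route, it may be safer to name the id and measure columns explicitly rather than letting melt() guess them from the column types. This is only a sketch, assuming both the reshape and plyr packages are installed; the set.seed() call and the name wide are illustrative additions, and your numbers will differ from the output above.

```r
library(reshape)
library(plyr)

set.seed(1)  # assumption: fixed seed only so the example is reproducible
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data  <- data.frame(type, score)

# State explicitly which column identifies the group and which one is measured
mdtf <- melt(data, id.vars = "type", measure.vars = "score")

# One row per type, one column per statistic supplied through each()
wide <- cast(mdtf, type ~ ., each(min, max, mean, median))
wide
```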