How to get a summary table instead of long output from plyr
I love the ability of plyr to split a data frame into multiple data sets and then perform identical operations on each set. The best part is when it shows you the result as a neat, compact, well-labeled table. I love throwing a bunch of calculations into a single line using each(). However, I do not understand why using the summarise function in the ddply call scuttles the output and makes it come out long and unlabeled. Have a look below to see what I mean. Can you tell me what I am doing wrong? I prefer to use summarise.
Let us first set up an example data frame. Imagine that you had 60 participants in a study. 20 of them were funny, 20 were clever and 20 were nice. Then each subject received a score.
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data  <- data.frame(type, score)
Now I want a table showing the mean score, median score, minimum score and maximum score for each of the 3 types of people:
ddply(data,.(type), summarise, each(mean,median,min,max)(score))
The line above should have given a nice table (3 rows, 1 for each type, and 4 columns of data). Alas, it gives one long table with only a single column of numbers, none of which are labeled.
ddply(data,.(type), function(jjkk) each(mean,median,min,max)(jjkk$score))
The above line gives me what I want. Can you explain what I am not understanding about the syntax of ddply?
Spelling out the functions, with one named argument to summarise per statistic, produces output in the format you desired.

I think your problem is that each() is returning a vector, which summarize() isn't really handling in the way you intend it to.
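As a rough illustration of the point above (a minimal sketch, not necessarily the exact code this answer had in mind): each() hands back a single named vector of length 4, and summarise stores that vector as one column with 4 rows per group. Giving summarise one named expression per statistic yields one column each. The set.seed() call and the variable names stats and res are assumptions added for the example.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data  <- data.frame(type, score)

# each() returns ONE named vector holding all four statistics.
stats <- each(mean, median, min, max)(c(1, 2, 3, 10))
names(stats)   # the labels live in the vector's names
length(stats)  # four values, which summarise would stack into four rows

# One named argument per statistic: summarise makes each a column,
# so every type ends up as a single labeled row.
res <- ddply(data, .(type), summarise,
             mean   = mean(score),
             median = median(score),
             min    = min(score),
             max    = max(score))
res
```

The anonymous-function version in the question works for a related reason: there ddply receives the named vector directly and can turn it into one row per group, whereas summarise treats it as a single column.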
Hmmm... I'm too tired to think about a one-liner, but reshape will do the trick.
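One way this could look, as a hedged sketch only: it assumes "reshape" means base R's reshape() function (the answer may equally have meant the reshape package), and the column names value and stat as well as the set.seed() call are made up for the example. The long output from summarise is labeled by hand and then spread into wide form.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
type  <- rep(c("funny", "clever", "nice"), 20)
score <- rnorm(60) + 10
data  <- data.frame(type, score)

# Long output: 4 rows per type, one unlabeled column of numbers.
# Naming the expression "value" (hypothetical name) gives a usable column name.
long <- ddply(data, .(type), summarise,
              value = each(mean, median, min, max)(score))

# Label the rows; within each type they follow the order passed to each().
long$stat <- rep(c("mean", "median", "min", "max"),
                 times = length(unique(long$type)))

# Base R reshape(): one row per type, one column per statistic.
wide <- reshape(long, idvar = "type", timevar = "stat", direction = "wide")
wide
```

The resulting columns are prefixed with the value column's name (value.mean, value.median, and so on), so a rename step may be wanted afterwards.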