R boxplot of summaries
From the (simplified) data below, which represents users choosing between three options, I want to create a set of boxplots of the percentage of times each user chose a value, grouped by value. So I want three boxplots: the percentages with which users chose 0, 1 and 2.
I'm sure I'm missing something obvious, as I often do with R. I can get the per-user percentages using by(dat, dat$user, function(user) {table(user$value)/length(user$value)*100}), but I don't know how to turn that into boxplots.
Hope that makes sense.
user|value
1|2
1|1
1|0
1|2
1|0
2|2
2|2
2|2
2|0
2|2
3|2
3|0
3|1
3|0
3|1
4|2
4|0
4|1
4|0
4|1
5|2
5|0
5|1
5|0
5|1
6|2
6|0
6|0
6|1
6|2
7|0
7|0
7|1
7|0
7|1
8|2
8|2
8|1
8|1
8|2
9|1
9|0
9|0
9|0
9|0
10|1
10|2
10|0
10|2
10|1
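For reference, here is a minimal sketch that rebuilds the sample data above as `dat` and runs the `by()` call from the question. The compact `rep()`/`c()` construction is only a shortcut for typing the table. Declaring `value` as a factor with levels 0, 1 and 2 is an added step. Without it, users who never picked some value (user 2 never picked 1, users 7 and 9 never picked 2, user 8 never picked 0) get shorter tables, and that is what makes the results awkward to stack. `pct_raw`, `pct_list` and `pct` are illustrative names.

```r
# Sample data from the question: 10 users, 5 choices each
dat <- data.frame(
  user  = rep(1:10, each = 5),
  value = c(2, 1, 0, 2, 0,  2, 2, 2, 0, 2,  2, 0, 1, 0, 1,
            2, 0, 1, 0, 1,  2, 0, 1, 0, 1,  2, 0, 0, 1, 2,
            0, 0, 1, 0, 1,  2, 2, 1, 1, 2,  1, 0, 0, 0, 0,
            1, 2, 0, 2, 1)
)

# The question's call with value still numeric: each table only
# contains the values that particular user actually chose
pct_raw <- by(dat, dat$user, function(user) {
  table(user$value) / length(user$value) * 100
})
sapply(pct_raw, length)  # users 2, 7, 8 and 9 have only two entries

# Declaring all three levels makes every per-user table the same length
dat$value <- factor(dat$value, levels = c(0, 1, 2))
pct_list <- by(dat, dat$user, function(user) {
  table(user$value) / length(user$value) * 100
})

# Stack the per-user tables into a users x values matrix;
# boxplot(pct) would then draw one box per column (0, 1, 2)
pct <- do.call(rbind, pct_list)
pct
```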
I would approach creating the summary using the plyr package. First, you should convert value to a factor, so that when some user never picked some value, that value will have 0%.
Now, you write your summary function that takes a data frame (technically this step can be smushed into the next step, but this way it's more legible).
Then, apply this function to every subset of dat defined by user.
A base graphics boxplot of this data would be done like this.
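Here is a minimal base-R sketch of those three steps. It swaps plyr's `ddply()` for `split()` plus `lapply()` so nothing extra needs installing. `summarize_user` and `summ` are hypothetical names. The summary is kept in long format (one row per user and value) so that `boxplot()`'s formula interface can draw one box per value.

```r
# Sample data from the question: 10 users, 5 choices each
dat <- data.frame(
  user  = rep(1:10, each = 5),
  value = c(2, 1, 0, 2, 0,  2, 2, 2, 0, 2,  2, 0, 1, 0, 1,
            2, 0, 1, 0, 1,  2, 0, 1, 0, 1,  2, 0, 0, 1, 2,
            0, 0, 1, 0, 1,  2, 2, 1, 1, 2,  1, 0, 0, 0, 0,
            1, 2, 0, 2, 1)
)

# Step 1: make value a factor so values a user never chose still show up as 0%
dat$value <- factor(dat$value, levels = c(0, 1, 2))

# Step 2 (hypothetical helper): percentages for one user's rows
summarize_user <- function(df) {
  pct <- prop.table(table(df$value)) * 100  # counts -> proportions -> percent
  data.frame(user = df$user[1],
             value = factor(names(pct), levels = levels(df$value)),
             pct = as.vector(pct))
}

# Step 3: apply it to every per-user subset and stack the results
summ <- do.call(rbind, lapply(split(dat, dat$user), summarize_user))

# Base graphics boxplot: one box per value (0, 1, 2)
boxplot(pct ~ value, data = summ,
        xlab = "Value chosen", ylab = "Percentage of a user's choices")
```

Keeping the helper separate mirrors the answer's point about legibility. The same body could be written inline as an anonymous function inside `lapply()`.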
If you don't mind my two cents, I don't know that boxplots are the right way to go with this kind of data. This isn't very dense data (if your sample is realistic), and boxplots don't capture the dependency between decisions. That is, if some user chose 1 super frequently, then they must have chosen the others much less frequently.
You could try a filled bar chart for each user, and it wouldn't require any pre-summarization if you use ggplot2. The code would look like this.
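The following is a rough base-graphics analogue of that filled bar chart. It assumes the goal is one bar per user, filled to 100% and split by the value chosen. ggplot2 is swapped out here so the sketch runs without extra packages. Unlike the ggplot2 route, base `barplot()` does need the `table()` summary step first. `props` is an illustrative name.

```r
# Sample data from the question: 10 users, 5 choices each
dat <- data.frame(
  user  = rep(1:10, each = 5),
  value = c(2, 1, 0, 2, 0,  2, 2, 2, 0, 2,  2, 0, 1, 0, 1,
            2, 0, 1, 0, 1,  2, 0, 1, 0, 1,  2, 0, 0, 1, 2,
            0, 0, 1, 0, 1,  2, 2, 1, 1, 2,  1, 0, 0, 0, 0,
            1, 2, 0, 2, 1)
)
dat$value <- factor(dat$value, levels = c(0, 1, 2))

# Counts per user (rows) and value (columns), then row-wise proportions
props <- prop.table(table(dat$user, dat$value), margin = 1)

# barplot() stacks the rows of a matrix within each column, so transpose
# to get one bar per user whose segments are the shares of 0, 1 and 2
barplot(t(props), legend.text = TRUE,
        ylim = c(0, 1.35),  # headroom so the legend does not cover the bars
        xlab = "User", ylab = "Share of choices",
        args.legend = list(title = "Value"))
```

Because every bar sums to 1, this view makes the trade-off visible: a large segment for one value forces the other two segments to be small.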
Is something like this what you're looking for?