R boxplot summary

Posted 2024-09-18 03:49:01

From the (simplified) data below, which represents users choosing between three options, I want to create a set of boxplots of the percentage of times each user chose a value, grouped by value. So I want three boxplots: one each for the percentage of times users chose 0, 1 and 2.

I'm sure I'm missing something obvious, as I often do with R. I can get the percentages using by(dat, dat$user, function(user) {table(user$value)/length(user$value)*100}), but don't know how to turn that into boxplots.

Hope that makes sense.

user|value
1|2
1|1
1|0
1|2
1|0
2|2
2|2
2|2
2|0
2|2
3|2
3|0
3|1
3|0
3|1
4|2
4|0
4|1
4|0
4|1
5|2
5|0
5|1
5|0
5|1
6|2
6|0
6|0
6|1
6|2
7|0
7|0
7|1
7|0
7|1
8|2
8|2
8|1
8|1
8|2
9|1
9|0
9|0
9|0
9|0
10|1
10|2
10|0
10|2
10|1
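
For anyone who wants to reproduce this, the listing above can be typed in as a data frame roughly as follows. This is only a sketch: the name dat is taken from the by() call in the question, the values are copied from the listing, and pct is a name introduced here for illustration.

```r
# The 50 rows listed above: 10 users, 5 choices each
dat <- data.frame(
  user  = rep(1:10, each = 5),
  value = c(2, 1, 0, 2, 0,   # user 1
            2, 2, 2, 0, 2,   # user 2
            2, 0, 1, 0, 1,   # user 3
            2, 0, 1, 0, 1,   # user 4
            2, 0, 1, 0, 1,   # user 5
            2, 0, 0, 1, 2,   # user 6
            0, 0, 1, 0, 1,   # user 7
            2, 2, 1, 1, 2,   # user 8
            1, 0, 0, 0, 0,   # user 9
            1, 2, 0, 2, 1)   # user 10
)

# The per-user percentages from the question
pct <- by(dat, dat$user, function(user) {
  table(user$value) / length(user$value) * 100
})

# Second element = user 2, who never chose 1, so only 0 and 2 appear
pct[[2]]
```

Note that the result for a user contains only the values that user actually chose, which is part of what makes it awkward to feed straight into boxplot().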

掩于岁月 2024-09-25 03:49:01

I would approach creating the summary using the plyr package. First, you should convert value to a factor, so that when some user never picked some value, that value will have 0%.

dat$value <- factor(dat$value)
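
To see why the conversion matters, here is a small illustration using user 2's five choices from the question, in which 1 never occurs. The names choices and as_factor are made up for this sketch, and the levels are spelled out explicitly here, whereas factor(dat$value) above derives them from the whole column.

```r
# User 2's choices from the question: 1 is never picked
choices <- c(2, 2, 2, 0, 2)

# Without declared levels, table() only reports the values that occur
table(choices)

# With a factor that declares all three options, the unchosen one is kept as 0
as_factor <- factor(choices, levels = 0:2)
p <- prop.table(table(as_factor))
p
```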

Now, you write your summary function that takes a data frame (technically this step can be smushed into the next step, but this way it's more legible).

p.by.user <- function(df){
  data.frame(prop.table(table(df$value)))
}

Then, apply this function to every subset of dat defined by user.

library(plyr)
dat.summary <- ddply(dat, .(user), p.by.user)

A base graphics boxplot of this data would be done like this.

with(dat.summary, boxplot(Freq ~ Var1, ylim = c(0,1)))
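
If plyr is not available, much the same summary can probably be had from base R alone. The following is a sketch under that assumption, not part of the approach above: it rebuilds the question's data, and props is a name introduced here.

```r
# The question's data: 10 users, 5 choices each
dat <- data.frame(
  user  = rep(1:10, each = 5),
  value = c(2,1,0,2,0, 2,2,2,0,2, 2,0,1,0,1, 2,0,1,0,1, 2,0,1,0,1,
            2,0,0,1,2, 0,0,1,0,1, 2,2,1,1,2, 1,0,0,0,0, 1,2,0,2,1)
)

# Rows are users, columns are the options; margin = 1 makes each row sum to 1.
# A two-way table keeps a 0 cell for options a user never chose.
props <- prop.table(table(dat$user, dat$value), margin = 1)

# One box per column (option 0, 1, 2)
boxplot(as.data.frame.matrix(props), ylim = c(0, 1),
        xlab = "value", ylab = "proportion of a user's choices")
```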

If you don't mind my two cents, I don't know that boxplots are the right way to go with this kind of data. This isn't very dense data (if your sample is realistic), and boxplots don't capture the dependency between decisions. That is, if some user chose 1 very frequently, then they must have chosen the other options much less frequently.
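
The dependency can be checked roughly on the question's sample. This is a sketch rather than a formal test; props is a name introduced here, and the correlation only describes these 10 users.

```r
# The question's data: 10 users, 5 choices each
dat <- data.frame(
  user  = rep(1:10, each = 5),
  value = c(2,1,0,2,0, 2,2,2,0,2, 2,0,1,0,1, 2,0,1,0,1, 2,0,1,0,1,
            2,0,0,1,2, 0,0,1,0,1, 2,2,1,1,2, 1,0,0,0,0, 1,2,0,2,1)
)

# Per-user proportions: one row per user, one column per option
props <- prop.table(table(dat$user, dat$value), margin = 1)

# Every user's three proportions add up to 1, so they cannot vary independently
rowSums(props)

# In this sample, users who chose 0 more often chose 2 less often
cor(props[, "0"], props[, "2"])
```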

You could try a filled bar chart for each user, and it wouldn't require any pre-summarization if you use ggplot2. The code would look like this

library(ggplot2)
ggplot(dat, aes(factor(user), fill = value)) + geom_bar()
# or, to force the range to be between 0 and 1:
# ggplot(dat, aes(factor(user), fill = value)) + geom_bar(position = "fill")
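
For readers without ggplot2, a similar per-user stacked bar chart can be sketched with base graphics. Unlike the ggplot2 version, this one does need the proportions computed first; props and mids are names introduced here.

```r
# The question's data: 10 users, 5 choices each
dat <- data.frame(
  user  = rep(1:10, each = 5),
  value = c(2,1,0,2,0, 2,2,2,0,2, 2,0,1,0,1, 2,0,1,0,1, 2,0,1,0,1,
            2,0,0,1,2, 0,0,1,0,1, 2,2,1,1,2, 1,0,0,0,0, 1,2,0,2,1)
)

# Per-user proportions: one row per user, one column per option
props <- prop.table(table(dat$user, dat$value), margin = 1)

# barplot() stacks the rows of its matrix within each bar, so transpose
# to get one bar per user with the three options stacked inside it
mids <- barplot(t(props), legend.text = TRUE,
                xlab = "user", ylab = "proportion of choices")
```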

半透明的墙 2024-09-25 03:49:01

Is something like this what you're looking for?

user <- rep(1:10,each=5)
value <- sample(0:2,50,replace=TRUE)
dat <- data.frame(user,value)

# percentages per user; fixing the levels keeps a value a user never chose as 0%
percent <- by(dat, dat$user,
    function(user) {
        table(factor(user$value, levels = 0:2))/length(user$value)*100
    }
)

# make a vector with all percentages
percent <- unlist(percent)
# extract the necessary info from the names (they look like "user.value")
value <- gsub("\\d+\\.(\\d)","\\1",names(percent))

boxplot(percent~value)
