How to draw grouped, stacked bar plots in R to show whether data proportions are homogeneous
Suppose you have 3 sets, numbered from 1 to 3.
Each set contains unique IDs associated with categorical active/inactive labels for variables A, B, C, D.
You want to make plots that show, for each variable, the proportions of active/inactive labels side by side in the 3 sets, to see whether they are homogeneous or not.
The only way I could come up with to do this was the following:
# Simulate data: 3 different sets, each with 4 different variables, each with different proportions of labels
sets = c("1", "2", "3")
variables = c("A", "B", "C", "D")
labs = c("active", "inactive")
N = 10000
set.seed(1325)
d = data.frame("set" = sample(sets, N, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
               "variable" = sample(variables, N, replace = TRUE, prob = c(0.15, 0.25, 0.2, 0.4)))
d["label"] = "x"
for (v in variables) {
  vw = which(d[["variable"]] == v)
  vp = runif(1, 0.1, 0.6)
  d[vw, "label"] = sample(labs, length(vw), replace = TRUE, prob = c(vp, 1 - vp))
}
d["ID"] <- 1:N
s = aggregate(ID ~ set + variable + label, d, length)
s.l = aggregate(ID ~ set + variable, d, length)
colnames(s.l)[3] <- "ID.l"
s = merge(s, s.l)
s["frac"] = with(s, ID / ID.l)
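# par() returns the previous values of the parameters it sets, so saving them here
# and restoring them below avoids the read-only-parameter warnings of op = par(); par(op)
op = par(mfrow = c(2, 2))
for (v in variables) {
  barplot(frac ~ label + set, s, subset = variable == v, col = c("blue", "orange"), main = v)
}
par(op)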
Given how the labels are assigned in the code, their proportions are different for the different variables, but homogeneous across the sets.
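A quick numeric cross-check of that claim can be sketched as follows. This is only an illustration on a separate toy data set built the same way: the names `toy`, `p_active` and `frac_active`, and the fixed probabilities, are made up here and are not part of the code above. If the labels are homogeneous across the sets, the values in each row of the resulting table should be close to one another.

```r
set.seed(1)
n <- 20000
toy <- data.frame(
  set      = sample(c("1", "2", "3"), n, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
  variable = sample(c("A", "B", "C", "D"), n, replace = TRUE)
)

# Assumed probabilities: P(active) depends only on the variable, not on the set
p_active <- c(A = 0.2, B = 0.3, C = 0.4, D = 0.5)
toy$label <- ifelse(runif(n) < p_active[toy$variable], "active", "inactive")

# Fraction of active labels for every variable (rows) / set (columns) combination
frac_active <- with(toy, tapply(label == "active", list(variable, set), mean))
round(frac_active, 2)
```

Each row corresponds to one variable and each column to one set, so a row with one clearly deviating entry would point at a non-homogeneous set.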
To show what happens when the proportions are not homogeneous:
# change the proportion of labels for one set
sw = which(d[["set"]] == 1)
d.u = d
d.u[sw, "label"] = sample(labs, length(sw), replace = TRUE, prob = c(0.05, 1 - 0.05))
s.u = aggregate(ID ~ set + variable + label, d.u, length)
s.u.l = aggregate(ID ~ set + variable, d.u, length)
colnames(s.u.l)[3] <- "ID.l"
s.u = merge(s.u, s.u.l)
s.u["frac"] = with(s.u, ID / ID.l)
# as before: save the previous mfrow setting and restore it after plotting
op = par(mfrow = c(2, 2))
for (v in variables) {
  barplot(frac ~ label + set, s.u, subset = variable == v, col = c("blue", "orange"), main = v)
}
par(op)
Question: do you think this can be done or represented better / more efficiently?
In particular, I would have thought that the aggregate and division part might be already built-in for some type of plot.
And I am wondering if using mfrow and plotting an array of separate plots is any good, or if there is some way to make a more cohesive lattice or grid of plots by using variable as one of the parameters.
Any ideas?
Comments (1)
Instead of the double aggregate() calls, consider by() to split the data frame by variable and run xtabs() + proportions() on each subset:
boxplot formula style
barplot matrix style
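As a rough sketch of what the matrix-style variant could look like (this is not the answerer's original code; the toy data frame and the names `toy` and `props` are made up for illustration, and `proportions()` assumes R 4.0.0 or later, where it is an alias of `prop.table()`):

```r
set.seed(1325)
n <- 2000
toy <- data.frame(
  set      = sample(c("1", "2", "3"), n, replace = TRUE),
  variable = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  label    = sample(c("active", "inactive"), n, replace = TRUE, prob = c(0.3, 0.7))
)

# One label-by-set table per variable; margin = 2 rescales every column (set)
# so that its label fractions sum to 1
props <- by(toy, toy$variable, function(sub) {
  proportions(xtabs(~ label + set, data = sub), margin = 2)
})

# Matrix-style barplot: each column becomes one stacked bar, each row one segment
op <- par(mfrow = c(2, 2))
for (v in names(props)) {
  barplot(props[[v]], col = c("blue", "orange"), main = v,
          xlab = "set", ylab = "frac")
}
par(op)
```

Here `by()` replaces both `aggregate()` calls and the `merge()`, since `xtabs()` does the counting and `proportions()` does the division in one step per variable.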