How to plot grouped, stacked bar charts in R to show whether data proportions are homogeneous

Posted on 2025-01-28 14:51:58

Suppose you have 3 sets, numbered from 1 to 3.
Each set contains unique IDs associated with categorical active/inactive labels for variables A, B, C, D.
You want to make plots that show, for each variable, the proportions of active/inactive labels side by side in the 3 sets, to show whether they are homogeneous or not.

The only way I could come up with to do this was the following:

# Simulate data: 3 different sets, each with 4 different variables, each with different proportions of labels
sets = c("1", "2", "3")
variables = c("A", "B", "C", "D")
labs = c("active", "inactive")
N = 10000
set.seed(1325)
d = data.frame("set" = sample(sets, N, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
               "variable" = sample(variables, N, replace = TRUE, prob = c(0.15, 0.25, 0.2, 0.4)))
d["label"] = "x"
for (v in variables) {
  vw = which(d[["variable"]] == v)
  vp = runif(1, 0.1, 0.6)
  d[vw, "label"] = sample(labs, length(vw), replace = TRUE, prob = c(vp, 1 - vp))
}
d["ID"] <- 1:N

s = aggregate(ID ~ set + variable + label, d, length)
s.l = aggregate(ID ~ set + variable, d, length)
colnames(s.l)[3] <- "ID.l"
s = merge(s, s.l)
s["frac"] = with(s, ID / ID.l)

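# par() returns the old values of the settings it changes; restoring the full par() list warns about read-only parameters
op = par(mfrow = c(2,2))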
for (v in variables) {
  barplot(frac ~ label + set, s, subset = variable == v, col = c("blue", "orange"), main = v)
}
par(op)

Given how the labels are assigned in the code, their proportions are different for the different variables, but homogeneous across the sets.
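
One way to check this claim numerically, and to see how the aggregate / merge / divide steps could be collapsed, is a three-way xtabs table passed to proportions(). This is only a sketch and not part of the original post: it uses a simplified, vectorized stand-in for the simulation above (the name p_active is made up here), and it assumes R >= 4.0.1 for proportions() (prop.table() is the older equivalent).

```r
# Simplified stand-in for the question's simulation (assumption: one
# "active" probability per variable, shared by all three sets)
set.seed(1325)
N <- 10000
variables <- c("A", "B", "C", "D")
d <- data.frame(
  set      = sample(c("1", "2", "3"), N, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
  variable = sample(variables, N, replace = TRUE, prob = c(0.15, 0.25, 0.2, 0.4))
)
p_active <- setNames(runif(length(variables), 0.1, 0.6), variables)
d$label <- ifelse(runif(N) < p_active[d$variable], "active", "inactive")

# Count every label x set x variable combination in one call
tbl3 <- xtabs(~ label + set + variable, d)

# Divide by the totals of each (set, variable) cell, i.e. margins 2 and 3
frac3 <- proportions(tbl3, c(2, 3))

# Rows = sets, columns = variables: fraction of "active" labels
active_frac <- frac3["active", , ]
print(round(active_frac, 2))
```

If the proportions are homogeneous across the sets, the values within each column of the printed matrix should be close to each other, while different columns may differ.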

To show what happens when the proportions are not homogeneous:

# change the proportion of labels for one set
sw = which(d[["set"]] == 1)
d.u = d
d.u[sw, "label"] = sample(labs, length(sw), replace = TRUE, prob = c(0.05, 1 - 0.05))

s.u = aggregate(ID ~ set + variable + label, d.u, length)
s.u.l = aggregate(ID ~ set + variable, d.u, length)
colnames(s.u.l)[3] <- "ID.l"
s.u = merge(s.u, s.u.l)
s.u["frac"] = with(s.u, ID / ID.l)

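# par() returns the old values of the settings it changes; restoring the full par() list warns about read-only parameters
op = par(mfrow = c(2,2))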
for (v in variables) {
  barplot(frac ~ label + set, s.u, subset = variable == v, col = c("blue", "orange"), main = v)
}
par(op)

Question: do you think this can be done or represented better / more efficiently?

In particular, I would have thought that the aggregate and division part might already be built into some type of plot.
I am also wondering whether using mfrow and plotting an array of separate plots is any good, or whether there is some way to make a more cohesive lattice or grid of plots by using variable as one of the parameters.

Any ideas?

梦回梦里 2025-02-04 14:51:58

Instead of the double aggregate calls, consider using by() to split the data frame by variable, then running xtabs() and proportions() on each subset:

barplot formula style

op <- par(mfrow = c(2,2))
tbls <- by(d, d$variable, FUN=function(sub) {
  tbl <- xtabs(~ label + set, sub)
  props <- data.frame(proportions(tbl, 2))
  barplot(Freq ~ label + set, props, 
          col = c("blue", "orange"), 
          main = sub$variable[1])
})
par(op)

barplot matrix style

op <- par(mfrow = c(2,2))
tbls <- by(d, d$variable, FUN=function(sub) {
  tbl <- xtabs(~ label + set, sub)
  props <- proportions(tbl, 2)
  barplot(props, xlab = "set", ylab = "frac",
          col = c("blue", "orange"), 
          main = sub$variable[1])
})
par(op)
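
For the part of the question about a more cohesive grid with variable as one of the parameters, one possible approach (not from the original answer) is to build a single three-way table and hand it to a conditioning plot. The sketch below assumes the lattice package, which normally ships with R as a recommended package, and guards the plotting call in case it is missing. The data simulation mirrors the question's code; the names props and p are made up here.

```r
# Re-create data in the same way as the question's simulation
set.seed(1325)
N <- 10000
variables <- c("A", "B", "C", "D")
labs <- c("active", "inactive")
d <- data.frame(
  set      = sample(c("1", "2", "3"), N, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
  variable = sample(variables, N, replace = TRUE, prob = c(0.15, 0.25, 0.2, 0.4))
)
d$label <- NA_character_
for (v in variables) {
  vw <- which(d$variable == v)
  vp <- runif(1, 0.1, 0.6)
  d$label[vw] <- sample(labs, length(vw), replace = TRUE, prob = c(vp, 1 - vp))
}

# One table for everything; fractions are taken within each (set, variable) cell
props <- as.data.frame(proportions(xtabs(~ label + set + variable, d), c(2, 3)))

# Stacked bars per set, one panel per variable (requires lattice)
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::barchart(
    Freq ~ set | variable, data = props, groups = label,
    stack = TRUE, xlab = "set", ylab = "frac",
    par.settings = list(superpose.polygon = list(col = c("blue", "orange"))),
    auto.key = list(columns = 2)
  )
  print(p)  # lattice objects must be printed to be drawn inside scripts
}
```

Compared with par(mfrow = ...), the panels here share one axis scale and one legend, which may make differences between sets easier to compare by eye.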
