How to plot grouped, stacked bar charts in R to show whether data proportions are homogeneous

Posted on 2025-01-28 14:51:58

Suppose you have 3 sets, numbered from 1 to 3.
Each set contains unique IDs associated with categorical active/inactive labels for variables A, B, C, D.
You want to make plots that show, for each variable, the proportions of active/inactive labels side by side in the 3 sets, to show whether they are homogeneous or not.

The only way I could come up with to do this was the following:

# Simulate data: 3 different sets, each with 4 different variables, each with different proportions of labels
sets = c("1", "2", "3")
variables = c("A", "B", "C", "D")
labs = c("active", "inactive")
N = 10000
set.seed(1325)
d = data.frame("set" = sample(sets, N, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
               "variable" = sample(variables, N, replace = TRUE, prob = c(0.15, 0.25, 0.2, 0.4)))
d["label"] = "x"
for (v in variables) {
  vw = which(d[["variable"]] == v)
  vp = runif(1, 0.1, 0.6)
  d[vw, "label"] = sample(labs, length(vw), replace = TRUE, prob = c(vp, 1 - vp))
}
d["ID"] <- 1:N

s = aggregate(ID ~ set + variable + label, d, length)
s.l = aggregate(ID ~ set + variable, d, length)
colnames(s.l)[3] <- "ID.l"
s = merge(s, s.l)
s["frac"] = with(s, ID / ID.l)

op = par(mfrow = c(2,2))  # set a 2x2 layout; op keeps only the previous mfrow, so par(op) restores without warnings
for (v in variables) {
  barplot(frac ~ label + set, s, subset = variable == v, col = c("blue", "orange"), main = v)
}
par(op)

Given how the labels are assigned in the code, their proportions are different for the different variables, but homogeneous across the sets.
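
As a numeric cross-check of what the plots show, the same fractions can be read off a three-way contingency table. This is only a minimal sketch: it builds a small stand-in data frame with the same column names as d (the label probabilities here are an assumption, not the ones simulated above) and assumes R >= 4.0.0 for proportions(); prop.table() is the older equivalent.

```r
# Stand-in for the data frame d from the post (same columns; probabilities are illustrative)
set.seed(1325)
N <- 10000
d <- data.frame(
  set      = sample(c("1", "2", "3"), N, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
  variable = sample(c("A", "B", "C", "D"), N, replace = TRUE, prob = c(0.15, 0.25, 0.2, 0.4)),
  label    = sample(c("active", "inactive"), N, replace = TRUE, prob = c(0.3, 0.7))
)

# Count label x set x variable, then normalise within each (set, variable) cell
tbl   <- xtabs(~ label + set + variable, data = d)
props <- proportions(tbl, margin = c(2, 3))

# Flatten for reading: one row per variable/label pair, one column per set
ftable(round(props, 3), row.vars = c("variable", "label"))
```

When the proportions are homogeneous, the three set columns in each row should be close to one another.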

To show what happens when the proportions are not homogeneous:

# change the proportion of labels for one set
sw = which(d[["set"]] == 1)
d.u = d
d.u[sw, "label"] = sample(labs, length(sw), replace = TRUE, prob = c(0.05, 1 - 0.05))

s.u = aggregate(ID ~ set + variable + label, d.u, length)
s.u.l = aggregate(ID ~ set + variable, d.u, length)
colnames(s.u.l)[3] <- "ID.l"
s.u = merge(s.u, s.u.l)
s.u["frac"] = with(s.u, ID / ID.l)

op = par(mfrow = c(2,2))  # set a 2x2 layout; op keeps only the previous mfrow, so par(op) restores without warnings
for (v in variables) {
  barplot(frac ~ label + set, s.u, subset = variable == v, col = c("blue", "orange"), main = v)
}
par(op)

Question: do you think this can be done or represented better / more efficiently?

In particular, I would have thought that the aggregate and division part might be already built-in for some type of plot.
And I am wondering if using mfrow and plotting an array of separate plots is any good, or if there is some way to make a more cohesive lattice or grid of plots by using variable as one of the parameters.

Any ideas?

Comments (1)

梦回梦里 2025-02-04 14:51:58

Instead of the double aggregate calls, consider using by to split the data frame by variable, then running xtabs + proportions on each subset:

barplot formula style

op <- par(mfrow = c(2,2))
tbls <- by(d, d$variable, FUN=function(sub) {
  tbl <- xtabs(~ label + set, sub)
  props <- data.frame(proportions(tbl, 2))
  barplot(Freq ~ label + set, props, 
          col = c("blue", "orange"), 
          main = sub$variable[1])
})
par(op)

barplot matrix style

op <- par(mfrow = c(2,2))
tbls <- by(d, d$variable, FUN=function(sub) {
  tbl <- xtabs(~ label + set, sub)
  props <- proportions(tbl, 2)
  barplot(props, xlab = "set", ylab = "frac",
          col = c("blue", "orange"), 
          main = sub$variable[1])
})
par(op)
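
On the question of a more cohesive grid driven by variable: one possible alternative, which is not part of the answer above, is ggplot2, where geom_bar(position = "fill") does the counting and division and facet_wrap() lays out one panel per variable. This is a hedged sketch that assumes ggplot2 is installed and uses a stand-in data frame with the same column names as d (the label probabilities are illustrative).

```r
library(ggplot2)  # assumption: ggplot2 is available; the original post uses base graphics only

# Stand-in for the data frame d from the question (same columns; probabilities are illustrative)
set.seed(1325)
N <- 10000
d <- data.frame(
  set      = sample(c("1", "2", "3"), N, replace = TRUE, prob = c(0.1, 0.2, 0.7)),
  variable = sample(c("A", "B", "C", "D"), N, replace = TRUE, prob = c(0.15, 0.25, 0.2, 0.4)),
  label    = sample(c("active", "inactive"), N, replace = TRUE, prob = c(0.3, 0.7))
)

p <- ggplot(d, aes(x = set, fill = label)) +
  geom_bar(position = "fill") +          # counts rows, stacks them, rescales each bar to height 1
  facet_wrap(~ variable, nrow = 2) +     # one panel per variable in a 2x2 grid
  scale_fill_manual(values = c(active = "blue", inactive = "orange")) +
  labs(y = "frac")

print(p)
```

No pre-aggregation is needed here because the raw rows are passed directly; the panels also share one legend and one y axis, which makes the sets easier to compare across variables.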