Overlaying histograms with ggplot2 in R

Posted on 2024-11-27 21:21:41

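I am new to R and am trying to plot 3 histograms on the same graph. Everything works, but my problem is that you can't see where two histograms overlap; they look rather cut off.

When I make density plots, it looks perfect: each curve is surrounded by a black outline, and the colours look different where the curves overlap.

Can someone tell me whether something similar can be achieved with the histograms in the first picture? This is the code I'm using: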

lowf0 <-read.csv (....)
mediumf0 <-read.csv (....)
highf0 <-read.csv(....)
lowf0$utt<-'low f0'
mediumf0$utt<-'medium f0'
highf0$utt<-'high f0'
histogram<-rbind(lowf0,mediumf0,highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)



厌倦 2024-12-04 21:21:41

Using @joran's sample data (the data frame dat with columns xx and yy):

ggplot(dat, aes(x=xx, fill=yy)) + 
  geom_histogram(alpha=0.2, position="identity")

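Note that geom_histogram() defaults to position="stack", so the groups are piled on top of each other rather than overlaid.

See "position adjustment" in the geom_histogram documentation.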
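
To see the difference between the two position adjustments side by side, here is a minimal sketch. It assumes simulated stand-in data shaped like the sample data (numeric xx, grouping column yy); the seed, binwidth = 5 and the black bar outline are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Simulated stand-in data: numeric column xx, three-level group column yy.
# The seed is an arbitrary choice so the example is repeatable.
set.seed(1)
dat <- data.frame(
  xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
  yy = rep(letters[1:3], each = 100)
)

# Default position = "stack": bars of different groups in the same bin
# are placed on top of each other.
p_stack <- ggplot(dat, aes(x = xx, fill = yy)) +
  geom_histogram(alpha = 0.2, binwidth = 5, colour = "black")

# position = "identity": every group is drawn from the baseline, so the
# translucent bars overlap and the overlap shows as a blended colour.
p_identity <- ggplot(dat, aes(x = xx, fill = yy)) +
  geom_histogram(alpha = 0.2, binwidth = 5, colour = "black",
                 position = "identity")

print(p_stack)
print(p_identity)
```

The colour = "black" argument outlines each bar, which is one way to get the framed look the question describes for density plots.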


为你鎻心 2024-12-04 21:21:41

Your current code:

ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)

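is telling ggplot to construct one histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.

What you want instead is to create three separate histograms, with alpha blending so that they are visible through each other. So you probably want three separate calls to geom_histogram, where each one gets its own data frame and fill: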

ggplot(histogram, aes(f0)) + 
    geom_histogram(data = lowf0, fill = "red", alpha = 0.2) + 
    geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
    geom_histogram(data = highf0, fill = "green", alpha = 0.2)

Here's a concrete example with some output:

library(ggplot2)

dat <- data.frame(xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
                  yy = rep(letters[1:3], each = 100))

ggplot(dat,aes(x=xx)) + 
    geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
    geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
    geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)

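which produces something like this:

[Image: the resulting plot of three overlapping histograms]

Edited to fix typos; you wanted fill, not colour.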
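
Because fill is set as a constant in each layer rather than mapped inside aes(), the three-layer plot has no legend. If a legend is wanted, one possible alternative is a single layer with fill mapped to the grouping column and the same colours chosen through scale_fill_manual(). This is a sketch rather than the answer's own method; the seed and bins = 30 (ggplot2's default bin count, written out explicitly) are assumptions.

```r
library(ggplot2)

# Same shape as the example data above; the seed is an arbitrary choice.
set.seed(42)
dat <- data.frame(
  xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
  yy = rep(letters[1:3], each = 100)
)

# One layer: fill is mapped to yy, position = "identity" overlays the
# groups, and scale_fill_manual() assigns red/blue/green to a/b/c.
p <- ggplot(dat, aes(x = xx, fill = yy)) +
  geom_histogram(alpha = 0.2, position = "identity", bins = 30) +
  scale_fill_manual(values = c(a = "red", b = "blue", c = "green"))

print(p)
```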


燕归巢 2024-12-04 21:21:41

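While only a few lines are required to plot multiple, overlapping histograms in ggplot2, the results aren't always satisfactory. Borders and colours need to be used carefully so that the eye can tell the histograms apart.

The following functions balance border colours, opacities, and superimposed density plots so that the viewer can distinguish the distributions.

Single histogram: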

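library(ggplot2)

plot_histogram <- function(df, feature) {
    # .data[[feature]] looks up the column named by the string `feature`
    plt <- ggplot(df, aes(x = .data[[feature]])) +
        # after_stat(density) replaces the deprecated ..density.. notation
        geom_histogram(aes(y = after_stat(density)), alpha = 0.7, fill = "#33AADE", color = "black") +
        geom_density(alpha = 0.3, fill = "red") +
        # dashed line at the column mean; linewidth needs ggplot2 >= 3.4.0 (use size in older versions)
        geom_vline(aes(xintercept = mean(.data[[feature]])), color = "black", linetype = "dashed", linewidth = 1) +
        labs(x = feature, y = "Density")
    print(plt)
}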

Multiple histograms:

plot_multi_histogram <- function(df, feature, label_column) {
    plt <- ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
        # position = "identity" overlays the groups instead of stacking them
        geom_histogram(aes(y = after_stat(density)), alpha = 0.7, position = "identity", color = "black") +
        geom_density(alpha = 0.7) +
        # a single dashed line at the mean of the whole column
        geom_vline(aes(xintercept = mean(.data[[feature]])), color = "black", linetype = "dashed", linewidth = 1) +
        labs(x = feature, y = "Density")
    plt + guides(fill = guide_legend(title = label_column))
}

Usage:

Simply pass your data frame into the above functions along with the desired arguments:

plot_histogram(iris, 'Sepal.Width')

plot_multi_histogram(iris, 'Sepal.Width', 'Species')

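The extra parameter in plot_multi_histogram is the name of the column containing the category labels.

We can see this more dramatically by creating a data frame with many different distribution means: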

a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))

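Passing the data frame in as before (and widening the chart with options; the repr.plot.* options take effect in Jupyter notebooks running the R kernel):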

options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')

To add a separate vertical line for each distribution:

plot_multi_histogram <- function(df, feature, label_column, means) {
    plt <- ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
        geom_histogram(aes(y = after_stat(density)), alpha = 0.7, position = "identity", color = "black") +
        geom_density(alpha = 0.7) +
        # one dashed line per value in `means`; the trailing + is needed so labs() is applied
        geom_vline(xintercept = means, color = "black", linetype = "dashed", linewidth = 1) +
        labs(x = feature, y = "Density")
    plt + guides(fill = guide_legend(title = label_column))
}

The only changes from the previous plot_multi_histogram function are the addition of means to the parameters and the geom_vline line, which now accepts multiple values.

Usage:

options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))

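Result: a dashed vertical line at each of the six means.

Since I set the means explicitly when building many_distros, I can simply pass them in. Alternatively, you can calculate them inside the function and use those.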
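
A minimal sketch of that alternative, computing one mean per category inside the function. The name plot_multi_histogram_auto is hypothetical, and the use of base R tapply() is just one way to do it; like the code above, it relies on .data, after_stat() and linewidth, so it assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical variant of plot_multi_histogram: the per-category means
# are computed from the data instead of being passed in.
plot_multi_histogram_auto <- function(df, feature, label_column) {
  # One mean of `feature` per level of `label_column`.
  means <- tapply(df[[feature]], df[[label_column]], mean)

  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7,
                   position = "identity", color = "black") +
    geom_density(alpha = 0.7) +
    # One dashed vertical line per category mean.
    geom_vline(xintercept = as.numeric(means), color = "black",
               linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density", fill = label_column)
}

p <- plot_multi_histogram_auto(iris, "Sepal.Width", "Species")
print(p)
```

Calling it on the simulated data would look like plot_multi_histogram_auto(many_distros, "n", "category"); the lines then sit at the sample means, which are close to, but not exactly, 1 through 6.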

