How to plot two histograms together in R?
I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms (carrot lengths and cucumber lengths) on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies rather than absolute counts, since the number of instances in each group is different.
Something like this would be nice, but how can I create it from my two tables?
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
The key is that the colours are semi-transparent.
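A minimal sketch of this base-graphics approach, assuming simulated data in place of the carrot and cucumber lengths (the variable names and distribution parameters are illustrative and not taken from the post). It uses `freq = FALSE` so the two groups are comparable despite their different sizes:

```r
# Simulated stand-ins for the two length columns (illustrative values only)
set.seed(42)
carrot_len   <- rnorm(1000, mean = 4)
cucumber_len <- rnorm(500,  mean = 6)

# Compute the histograms without drawing them
h_carrot   <- hist(carrot_len,   plot = FALSE)
h_cucumber <- hist(cucumber_len, plot = FALSE)

# Shared axis limits so neither histogram is clipped
xlim <- range(h_carrot$breaks, h_cucumber$breaks)
ylim <- c(0, max(h_carrot$density, h_cucumber$density))

# freq = FALSE plots densities rather than counts;
# the fourth argument of rgb() is the alpha (opacity) value
plot(h_carrot, freq = FALSE, col = rgb(0, 0, 1, 0.25),
     xlim = xlim, ylim = ylim,
     main = "Carrot vs. cucumber length", xlab = "Length")
plot(h_cucumber, freq = FALSE, col = rgb(1, 0, 0, 0.25), add = TRUE)
```

Where the two semi-transparent fills overlap, the colours blend, which is what makes both shapes readable.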
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
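A sketch of those two steps, assuming simulated lengths and a column named `veg` to label the groups (both are illustrative choices, and the sample sizes are reduced to keep it quick):

```r
library(ggplot2)

# Two separate data frames, each with a single numeric column
set.seed(42)
carrots <- data.frame(length = rnorm(1000, mean = 6, sd = 2))
cukes   <- data.frame(length = rnorm(500,  mean = 7, sd = 2.5))

# Label each data frame, then stack them into one long data frame
carrots$veg <- "carrot"
cukes$veg   <- "cuke"
veg_lengths <- rbind(carrots, cukes)

# One line for the plot: overlapping, semi-transparent density curves
p <- ggplot(veg_lengths, aes(x = length, fill = veg)) + geom_density(alpha = 0.2)
print(p)
```

Mapping `fill` to the grouping column is what lets a single call draw both groups.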
Now, if you really did want histograms, the following will work. Note that you must change `position` from its default value of "stack". You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the `y = ..density..` to get it back to counts.

One additional thing: I commented on Dirk's question that all of the arguments could simply be in the `hist` command. I was asked how that could be done. What follows produces exactly Dirk's figure.
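A sketch of both variants, assuming simulated data (names and parameters are illustrative). It uses `after_stat(density)`, which is how ggplot2 3.3.0 and later spell `..density..`. The base R part passes every argument directly to `hist()` and uses `add = TRUE` for the second call:

```r
library(ggplot2)

set.seed(42)
veg_lengths <- data.frame(
  length = c(rnorm(1000, mean = 6, sd = 2), rnorm(500, mean = 7, sd = 2.5)),
  veg    = rep(c("carrot", "cuke"), times = c(1000, 500))
)

# position = "identity" overlays the bars instead of stacking them
p_hist <- ggplot(veg_lengths, aes(x = length, fill = veg)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.5,
                 position = "identity", bins = 30)
print(p_hist)

# Base R: all arguments go straight into hist(); the second call adds to the first
a <- rnorm(500, mean = 4)
b <- rnorm(500, mean = 6)
hist(a, col = rgb(0, 0, 1, 0.25), xlim = c(0, 10),
     main = "Overlapping histograms", xlab = "Value")
hist(b, col = rgb(1, 0, 0, 0.25), xlim = c(0, 10), add = TRUE)
```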
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
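A sketch of the pseudo-transparency idea, not the function from this answer. `plot_overlapping_hist()` and its arguments are hypothetical names chosen for illustration. Instead of alpha-blending, it draws the overlap explicitly in a third opaque colour, so it also works on devices without transparency support:

```r
# Hypothetical helper: two histograms plus their overlap in a third, opaque colour
plot_overlapping_hist <- function(a, b, n_bins = 30,
                                  col_a = "red", col_b = "blue",
                                  col_overlap = "purple", ...) {
  # Common breaks so bars from both samples line up
  breaks <- seq(min(a, b), max(a, b), length.out = n_bins + 1)
  h_a <- hist(a, breaks = breaks, plot = FALSE)
  h_b <- hist(b, breaks = breaks, plot = FALSE)

  # The overlap in each bin is the smaller of the two densities
  h_ov <- h_a
  h_ov$density <- pmin(h_a$density, h_b$density)

  ylim <- c(0, max(h_a$density, h_b$density))
  plot(h_a, freq = FALSE, col = col_a, ylim = ylim, ...)
  plot(h_b, freq = FALSE, col = col_b, add = TRUE)
  plot(h_ov, freq = FALSE, col = col_overlap, add = TRUE)
  invisible(list(a = h_a, b = h_b, overlap = h_ov))
}

set.seed(42)
res <- plot_overlapping_hist(rnorm(1000, mean = 4), rnorm(500, mean = 6),
                             main = "Pseudo-transparency", xlab = "Length")
```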
Here's another way to do it using R's support for transparent colors
The results end up looking something like this:
There are already beautiful answers here, but I thought I'd add this one. It looks good to me.
(Copied random numbers from @Dirk).
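A sketch of what this could look like, assuming normally distributed random numbers and illustrative colours. `alpha()` comes from the scales package and adds transparency to a named colour:

```r
library(scales)  # provides alpha() for making a named colour semi-transparent

set.seed(42)
# First histogram: opaque fill, no bar borders
hist(rnorm(500, mean = 4), xlim = c(0, 10), col = "skyblue", border = NA,
     main = "Overlapping histograms", xlab = "Value")
# Second histogram: drawn on top with a 50% transparent fill
hist(rnorm(500, mean = 6), add = TRUE, col = alpha("red", 0.5), border = NA)
```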
`library(scales)` is needed.

The result is...

Update: This overlapping function may also be useful to some.

I feel the result from `hist0` is prettier to look at than that from `hist`.
The result of
is
Here's a version like the ggplot2 one I gave, but using only base R. I copied some of it from @nullglob.

Generate the data:
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
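A sketch of a base R equivalent of overlapping density curves, assuming simulated data and illustrative colours and labels. The data stays in plain vectors, and every detail of the plot (limits, fills, legend) is written out by hand:

```r
set.seed(42)
carrots <- rnorm(1000, mean = 6, sd = 2)
cukes   <- rnorm(500,  mean = 7, sd = 2.5)

# Kernel density estimates, computed separately for each group
d_carrots <- density(carrots)
d_cukes   <- density(cukes)

# Empty plot sized to hold both curves
plot(d_carrots$x, d_carrots$y, type = "n",
     xlim = range(d_carrots$x, d_cukes$x),
     ylim = c(0, max(d_carrots$y, d_cukes$y)),
     xlab = "Length", ylab = "Density", main = "Vegetable lengths")

# Filled, semi-transparent polygons for each density
polygon(d_carrots, col = rgb(1, 0, 0, 0.3), border = "red")
polygon(d_cukes,   col = rgb(0, 0, 1, 0.3), border = "blue")
legend("topright", legend = c("carrot", "cuke"),
       fill = c(rgb(1, 0, 0, 0.3), rgb(0, 0, 1, 0.3)))
```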
Here is an example of how you can do it in "classic" R graphics:

The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to `hist`).
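One way to align the breaks by hand, sketched with simulated data (names and the number of bins are illustrative), is to compute a single vector of break points spanning both samples and pass it to both `hist` calls:

```r
set.seed(42)
a <- rnorm(1000, mean = 4)
b <- rnorm(500,  mean = 6)

# One set of break points covering both samples, so the bars line up
common_breaks <- pretty(range(c(a, b)), n = 20)

h_a <- hist(a, breaks = common_breaks, plot = FALSE)
h_b <- hist(b, breaks = common_breaks, plot = FALSE)

# Plot densities so the differently sized samples are comparable
ylim <- c(0, max(h_a$density, h_b$density))
plot(h_a, freq = FALSE, col = rgb(0, 0, 1, 0.25), ylim = ylim,
     main = "Aligned breaks", xlab = "Value")
plot(h_b, freq = FALSE, col = rgb(1, 0, 0, 0.25), add = TRUE)
```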
@Dirk Eddelbuettel: The basic idea is excellent, but the code as shown can be improved. [It takes long to explain, hence a separate answer and not a comment.]

The `hist()` function draws plots by default, so you need to add the `plot=FALSE` option. Moreover, it is clearer to establish the plot area with a `plot(0,0,type="n",...)` call, in which you can add the axis labels, plot title, etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:

And here is the result (a bit too wide because of RStudio :-) ):
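A sketch of the three points made in this answer (`plot=FALSE`, an empty plot area set up first, and shading lines instead of solid fills), assuming simulated data and illustrative labels:

```r
set.seed(42)
a <- rnorm(500, mean = 4)
b <- rnorm(500, mean = 6)

# plot = FALSE: compute the histograms without drawing anything yet
h_a <- hist(a, plot = FALSE)
h_b <- hist(b, plot = FALSE)

# Establish the plot area first, with labels and title in one place
plot(0, 0, type = "n",
     xlim = range(h_a$breaks, h_b$breaks),
     ylim = c(0, max(h_a$counts, h_b$counts)),
     xlab = "x", ylab = "Frequency", main = "Two histograms")

# density/angle draw shading lines instead of a solid fill
plot(h_a, add = TRUE, col = "blue", density = 10, angle = 45)
plot(h_b, add = TRUE, col = "red",  density = 10, angle = 135)
```

Opposite shading angles keep the overlap readable without relying on transparency.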
Plotly's R API might be useful for you. The graph below is here.
Full disclosure: I'm on the team.
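A sketch using the current plotly R package, assuming simulated data and illustrative trace names. This may differ from the API available when this answer was written:

```r
library(plotly)

set.seed(42)
carrots <- rnorm(1000, mean = 6, sd = 2)
cukes   <- rnorm(500,  mean = 7, sd = 2.5)

# histnorm = "probability" gives relative frequencies;
# barmode = "overlay" draws the two traces on top of each other
p <- plot_ly(alpha = 0.6) %>%
  add_histogram(x = carrots, name = "carrots", histnorm = "probability") %>%
  add_histogram(x = cukes,   name = "cukes",   histnorm = "probability") %>%
  layout(barmode = "overlay", xaxis = list(title = "Length"))
p
```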
So many great answers, but since I've just written a function (`plotMultipleHistograms()` in the 'basicPlotteR' package) to do this, I thought I would add another answer.

The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.

Here's how to use it:

The `plotMultipleHistograms()` function can take any number of distributions, and all the general plotting parameters should work with it (for example: `las`, `main`, etc.).
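The idea behind such a helper (common bins and shared axis limits for any number of distributions) can be sketched in base R. The function below, `overlay_histograms()`, and all of its arguments are hypothetical and written only for illustration. It is not the `basicPlotteR` implementation and does not reproduce that package's interface:

```r
# Hypothetical helper for illustration; not the basicPlotteR implementation
overlay_histograms <- function(distributions, n_bins = 20,
                               colours = c("red", "blue", "green"),
                               alpha = 0.4, ...) {
  # Common bins spanning every distribution
  all_values <- unlist(distributions)
  breaks <- seq(min(all_values), max(all_values), length.out = n_bins + 1)
  hists <- lapply(distributions, hist, breaks = breaks, plot = FALSE)

  # Y limit large enough for the tallest bar in any distribution
  ylim <- c(0, max(sapply(hists, function(h) max(h$density))))
  fills <- adjustcolor(rep_len(colours, length(hists)), alpha.f = alpha)

  for (i in seq_along(hists)) {
    if (i == 1) {
      # The first histogram sets up the axes; extra arguments go here
      plot(hists[[i]], freq = FALSE, col = fills[i], ylim = ylim, ...)
    } else {
      plot(hists[[i]], freq = FALSE, col = fills[i], add = TRUE)
    }
  }
  invisible(hists)
}

set.seed(42)
hists <- overlay_histograms(
  list(rnorm(500, mean = 5), rnorm(500, mean = 8), rnorm(500, mean = 12)),
  main = "Three samples", xlab = "Value"
)
```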