How can I create a histogram from aggregated data in R?
I have a data frame that has a format like the following:
Month Frequency
2007-08 2
2010-11 5
2011-01 43
2011-02 52
2011-03 31
2011-04 64
2011-05 73
I would like to create a histogram from this data, using X bins (X will probably be around 15, but the actual data has over 200 months), and using the data from the frequency column as the frequency for each bin of the histogram. How can I accomplish this?
I've tried two approaches so far, with the hist() and barplot() commands. The problem with hist() is that it does not seem to give me any way to specify that I want to use the Frequency column as the counts for the histogram. The problem with barplot() is that I don't have any flexibility in choosing X bins, and if any months are omitted, the resulting graph is not a true histogram because the x-axis isn't continuous.
The only idea I have right now is to go with the barplot() approach, fill in the missing months with a value of 0 for Frequency, and use space=0 to remove the spacing between the bars. The problem with that is that it's not particularly easy to choose an arbitrary number of bins.
Comments (4)
To get this kind of flexibility, you may have to replicate your data. One way of doing it is with rep(), repeating each month as many times as its frequency. Once the data is replicated in a data.frame (expdat here), you can call hist() with different numbers of bins.
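The code from this answer did not survive on this page, so here is a minimal sketch of the rep() idea rather than the answerer's exact code. It assumes each month can be placed on its first day (the Date column, n_bins, x and brks are names made up for illustration) and builds 15 equal-width bins over the observed date range.

```r
# Example data copied from the question
df <- data.frame(
  Month = c("2007-08", "2010-11", "2011-01", "2011-02",
            "2011-03", "2011-04", "2011-05"),
  Frequency = c(2, 5, 43, 52, 31, 64, 73),
  stringsAsFactors = FALSE
)

# Assumption: put each month on its first day to get a continuous time axis
df$Date <- as.Date(paste0(df$Month, "-01"))

# Repeat every row as many times as its Frequency
expdat <- df[rep(seq_len(nrow(df)), times = df$Frequency), c("Month", "Date")]

# Build n_bins equal-width bins over the observed date range
n_bins <- 15
x <- as.numeric(expdat$Date)
brks <- seq(min(x), max(x), length.out = n_bins + 1)

# Compute the histogram, then draw it with date labels on the x-axis
h <- hist(x, breaks = brks, plot = FALSE)
plot(h, axes = FALSE, main = "", xlab = "Month")
axis(1, at = brks,
     labels = format(as.Date(brks, origin = "1970-01-01"), "%Y-%m"), las = 2)
axis(2)
```

Changing n_bins is all it takes to try a different number of bins; the cost is that expdat has one row per counted event.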
Take a look at ggplot2. It can plot the data straight from a data.frame (called df here), either as given or, if you want continuous time, with the months on a continuous axis.
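The original ggplot2 snippets are missing from this page, so the following is only a sketch of what the two variants could look like, not the answerer's code. It assumes ggplot2 is installed; the Date column and the choice of geom_col() and a weighted geom_histogram() are my own picks for illustration.

```r
library(ggplot2)

# Example data copied from the question
df <- data.frame(
  Month = c("2007-08", "2010-11", "2011-01", "2011-02",
            "2011-03", "2011-04", "2011-05"),
  Frequency = c(2, 5, 43, 52, 31, 64, 73),
  stringsAsFactors = FALSE
)

# Variant 1: one bar per listed month (discrete x-axis, gaps are not shown)
p1 <- ggplot(df, aes(x = Month, y = Frequency)) +
  geom_col()

# Variant 2: continuous time.
# Assumption: put each month on its first day, then let the weight
# aesthetic add up Frequency inside each of the 15 bins.
df$Date <- as.Date(paste0(df$Month, "-01"))
p2 <- ggplot(df, aes(x = Date, weight = Frequency)) +
  geom_histogram(bins = 15)

print(p1)
print(p2)
```

The weighted histogram never replicates rows, so it should stay cheap even when the frequencies are large.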
Yes, rep() solutions will waste too much memory in most interesting/large cases. The HistogramTools CRAN package includes an efficient PreBinnedHistogram function, which creates a base R histogram object directly from a list of bins and breaks, as the original question provided.
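To illustrate the "no replication" idea without relying on any package, here is a rough base R sketch. It is not the HistogramTools API: it just assigns each month to a bin with cut(), sums the frequencies per bin, and draws the result with barplot(space = 0). The Date column, n_bins, brks, bin and counts are names made up for this example.

```r
# Example data copied from the question
df <- data.frame(
  Month = c("2007-08", "2010-11", "2011-01", "2011-02",
            "2011-03", "2011-04", "2011-05"),
  Frequency = c(2, 5, 43, 52, 31, 64, 73),
  stringsAsFactors = FALSE
)

# Assumption: put each month on its first day to get a continuous time axis
df$Date <- as.Date(paste0(df$Month, "-01"))

# n_bins equal-width bins over the observed date range
n_bins <- 15
x <- as.numeric(df$Date)
brks <- seq(min(x), max(x), length.out = n_bins + 1)

# Assign each month to a bin; include.lowest keeps the earliest month in bin 1
bin <- cut(x, breaks = brks, include.lowest = TRUE)

# Sum the frequencies per bin; empty bins come back as NA, so set them to 0
counts <- as.numeric(tapply(df$Frequency, bin, sum))
counts[is.na(counts)] <- 0

# Bars with no gaps, labelled by the start of each bin
bin_start <- format(as.Date(head(brks, -1), origin = "1970-01-01"), "%Y-%m")
barplot(counts, space = 0, names.arg = bin_start, las = 2,
        ylab = "Frequency")
```

The same breaks and counts vectors are the kind of input a pre-binned histogram function would need, so this should be easy to adapt if you do use the package.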
Another possibility is to scale down your frequency variable by some large factor so that rep doesn't have as much work to do. Then adjust the vertical axis scale of the histogram by that same factor.
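A small sketch of that scaling trick, with a made-up scale_factor of 10 (for real data you would pick something much larger). Note the assumption baked in here: rounding after the division loses small counts, so the two early months with frequencies 2 and 5 disappear from this toy example.

```r
# Example data copied from the question
df <- data.frame(
  Month = c("2007-08", "2010-11", "2011-01", "2011-02",
            "2011-03", "2011-04", "2011-05"),
  Frequency = c(2, 5, 43, 52, 31, 64, 73),
  stringsAsFactors = FALSE
)

# Assumption: put each month on its first day to get a continuous time axis
df$Date <- as.Date(paste0(df$Month, "-01"))

# Hypothetical scale factor; rounding makes the result approximate
scale_factor <- 10
scaled <- round(df$Frequency / scale_factor)

# Replicate using the scaled-down counts, so rep() has far less work to do
x <- rep(as.numeric(df$Date), times = scaled)

# Keep the bins on the full date range, even if early months rounded to zero
n_bins <- 15
rng <- range(as.numeric(df$Date))
brks <- seq(rng[1], rng[2], length.out = n_bins + 1)
h <- hist(x, breaks = brks, plot = FALSE)

# Draw without axes, then relabel the y-axis in the original units
plot(h, axes = FALSE, main = "", xlab = "Month", ylab = "Frequency (approx.)")
axis(1, at = brks,
     labels = format(as.Date(brks, origin = "1970-01-01"), "%Y-%m"), las = 2)
ticks <- pretty(c(0, max(h$counts)))
axis(2, at = ticks, labels = ticks * scale_factor)
```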