ggplot2 - printing plot balloons memory
Is it expected that printing a large-ish ggplot to PDF will cause the R session's memory to balloon? I have a ggplot2 object that is around 72 MB. My R session grows to over 2 GB when printing to PDF. Is this expected? Are there ways to optimize performance? I find that the resulting PDFs are huge (~25 MB) and I have to use an external program to shrink them down (to 50 KB with no visual loss!). Is there a way to print to PDF with lower quality graphics? Or perhaps some parameter to print or ggplot that I haven't considered?
Comments (3)
For large data sets, I find it helpful to pre-process the data before putting together the ggplot (even if ggplot offers the same calculations). ggplot has to be very general: it cannot predict what stat or geom you want to add later on, so it is very difficult to optimize things there (the split-apply-combine strategy can lead to exploding intermediate memory requirements). OTOH, you know what you want and can pre-calculate accordingly.
The large PDF indicates that you either have a lot of overplotting or you produce objects that are too small to be seen. In both cases, you could gain a lot by applying appropriate summary statistics (e.g. hexbin or boxplot instead of scatterplot).
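As a rough illustration of the pre-aggregation idea above, the sketch below counts points on a grid with base R before handing anything to ggplot. The data, the `bin_width` value, and all variable names are made up for the example; they are not taken from the question.

```r
library(ggplot2)

set.seed(42)
n_points <- 100000
# Made-up data standing in for a large, heavily overplotted scatterplot
raw <- data.frame(x = rnorm(n_points), y = rnorm(n_points))

# Snap each point to a grid cell and count points per cell *before* plotting
bin_width <- 0.1  # assumed cell size; tune it to the data
cells <- data.frame(
  x = round(raw$x / bin_width) * bin_width,
  y = round(raw$y / bin_width) * bin_width,
  n = 1L
)
binned <- aggregate(n ~ x + y, data = cells, FUN = sum)

# The plot object now carries one row per occupied cell instead of one per point
p <- ggplot(binned, aes(x = x, y = y, fill = n)) +
  geom_tile(width = bin_width, height = bin_width)

nrow(raw)     # number of raw points
nrow(binned)  # number of tiles actually drawn
```

ggplot2's own `geom_bin2d()` does a similar binning internally, but then the plot object still holds the full raw data; pre-computing keeps both the object and the resulting PDF small.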
I think we cannot tell you more without details of what you are doing. So please create a minimal example and/or upload the compressed plot you are producing.
Addressing the second part of your question, R makes no attempt to optimize PDFs. If you are overplotting a lot of points, this results in some ridiculous behavior. You can use qpdf to post-process the PDF.
Addressing the first question anecdotally, it does seem that plots on medium-sized datasets take up a lot of memory, but that is merely my experience. Others may have more opinions as to why or more facts as to whether this is so.
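The qpdf post-processing step mentioned above can be driven from R itself. This is only a sketch: it assumes the `qpdf` command-line tool is installed and on the PATH, and the data and output path are invented for the example. `tools::compactPDF()` ships with base R and calls qpdf when it can find it.

```r
library(ggplot2)

set.seed(42)
raw <- data.frame(x = rnorm(50000), y = rnorm(50000))  # made-up data
p <- ggplot(raw, aes(x, y)) + geom_point(size = 0.3)

pdf_path <- file.path(tempdir(), "overplotted.pdf")  # hypothetical output path
pdf(pdf_path, width = 6, height = 4)
print(p)
dev.off()

size_before <- file.size(pdf_path)
# compactPDF() looks for qpdf on the PATH (or via R_QPDF); it only replaces
# the file when the compacted version is meaningfully smaller
tools::compactPDF(pdf_path)
size_after <- file.size(pdf_path)

c(before = size_before, after = size_after)
```

How much this saves depends on the plot and on how the PDF was written. `compactPDF()` also has a `gs_quality` argument for a Ghostscript-based pass, which may shrink files further if Ghostscript is available.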
Saving in a bitmap format like PNG can reduce the file size considerably. Note that this is only appropriate for certain uses of the final image; in particular, it can't be zoomed in as far as a PDF can. But if the final image size is known, it can be a useful method.
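A minimal sketch of the bitmap route, assuming a known final size of 6 x 4 inches at 150 dpi (both values, like the data and the file name, are invented for the example):

```r
library(ggplot2)

set.seed(42)
raw <- data.frame(x = rnorm(50000), y = rnorm(50000))  # made-up data
p <- ggplot(raw, aes(x, y)) + geom_point(size = 0.3)

png_path <- file.path(tempdir(), "overplotted.png")  # hypothetical output path
# Width and height are in inches by default; dpi fixes the pixel dimensions
ggsave(png_path, plot = p, width = 6, height = 4, dpi = 150)

file.size(png_path)  # size in bytes; independent of how many points overlap
```

With a bitmap, the file size is driven by the pixel dimensions rather than by the number of plotted points, which is why it helps most with heavy overplotting.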