Inline images vs. temporary files (Java XHTML->PDF generation)
I have a project where I need to generate a PDF file. Within this PDF I have to insert a body of text as well as four or five large images (roughly 800px*1000px). In order to make this flexible I have opted to use FreeMarker in conjunction with XHTMLRenderer (flying-saucer).
I am now faced with a couple of options:
- Create the images and save them as temporary files to disk. Then process an .xhtml template with FreeMarker (saving it to disk) and pass the processed .xhtml file URL to XHTMLRenderer to generate the PDF. All these created files (bar the PDF) would be created with File.createTempFile. This would allow FreeMarker to pick the images up off the disk (as if they were images linked in the XHTML).
- Process the .xhtml template and keep it in memory. Pass the images to the template as Base64-encoded data URLs. This would remove the need for saving any temporary files, as the output from FreeMarker could be passed directly to XHTMLRenderer.
Base64 Encoded Image Url example (a small folder icon):
<img src="data:image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExK
cppV0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7" />
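A data URL like the one above can be assembled in a few lines. The following is only a minimal sketch, assuming Java 8 or later for java.util.Base64; the class name DataUriSketch and the method toDataUri are hypothetical names chosen for illustration, and the "hello" bytes merely stand in for real image data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriSketch {

    // Builds "data:<mimeType>;base64,<encoded bytes>" from raw image bytes.
    static String toDataUri(String mimeType, byte[] imageBytes) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + encoded;
    }

    public static void main(String[] args) {
        // Placeholder bytes, not a real image; a real caller would pass the
        // encoded PNG/GIF/JPEG bytes of the generated image.
        byte[] placeholder = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri("image/png", placeholder));
        // prints: data:image/png;base64,aGVsbG8=
    }
}
```

The resulting string could then be placed in the FreeMarker data model and referenced from the src attribute of an img element in the template.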
My main question is which would be a better technique? Is creating lots of temporary files bad (does it carry lots of overhead)? Could I potentially run out of memory creating such large base64 encoded strings?
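For the memory part of the question, a rough estimate helps: padded Base64 turns every 3 input bytes into 4 output characters, so the encoded text is about a third larger than the image file. The sketch below works that out under an assumed file size of 500,000 bytes per image; that figure is hypothetical, since the question gives pixel dimensions only and not file sizes. The class and method names are also made up for illustration.

```java
public class Base64SizeEstimate {

    // Length in characters of padded Base64 output for rawBytes input bytes:
    // every group of up to 3 bytes becomes 4 characters.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        long assumedFileSize = 500_000; // hypothetical size of one image file
        int imageCount = 5;             // "four or five" images in the question

        long perImage = encodedLength(assumedFileSize);
        System.out.println(perImage + " characters per image");
        System.out.println(perImage * imageCount + " characters for all images");
    }
}
```

Under these assumptions the encoded images come to a few million characters in total. Actual heap use will be higher than that figure, because the template output and the parsed document each hold their own copy of the text, and a Java String may use two bytes per character depending on the JVM version.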
Comments (2)
I found myself asking the same question recently. After some benchmarking, it turns out the data URI approach was the best bet.
Storing a bunch of Base64-encoded images can be expensive. But the overhead of creating temp files, streaming image data in, then waiting for XHTMLRenderer to hit that temp file 4 times before cleaning it up is also taxing.
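For comparison, the temp-file lifecycle described here (create, write, reference by URL, clean up) might look roughly like the sketch below. It uses only the standard library; the class name TempImageSketch, the method writeTempImage, and the "pdf-img-" prefix are hypothetical, and the rendering step is left as a comment because it depends on how the renderer is wired into the application.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class TempImageSketch {

    // Writes the image bytes to a new temp file and returns it.
    // The caller is responsible for deleting the file afterwards.
    static File writeTempImage(byte[] imageBytes, String suffix) {
        try {
            File tmp = File.createTempFile("pdf-img-", suffix);
            Files.write(tmp.toPath(), imageBytes);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] placeholder = {1, 2, 3}; // stand-in for real image bytes
        File tmp = writeTempImage(placeholder, ".png");
        try {
            // This file: URL is what the template would put in <img src="...">.
            String url = tmp.toURI().toString();
            System.out.println("Template would reference: " + url);
            // ... process the template and render the PDF here ...
        } finally {
            // Delete only after rendering has finished reading the file.
            if (!tmp.delete()) {
                tmp.deleteOnExit(); // fallback if immediate deletion fails
            }
        }
    }
}
```

The try/finally keeps cleanup tied to the render call, so a failed render does not leave image files behind in the temp directory.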
In my experiments, the Base64 images proved to be a better approach. That being said, I'm not sure to what extent it will remain true for larger images. In my case, I was testing with 32x32 icons, 80x80 logos, 400x240 bar graphs and one 600x400 graphic. The difference in overhead was significant with everything except the 600x400 graphic, where it got really negligible.
(A side note for Joop Eggen: in my case, PDF generation is time critical. The user clicks a button for the PDF and expects the download to begin immediately.)
PDF generation is not time critical - one might even consider throttling the communication. Embedding images in Base64 costs a bit more CPU and memory in an already costly templating transformation: the bulk Base64 data is dragged through the templating pipeline, then probably decoded from Base64 back to binary to be compressed. I was not even aware that embedded images were possible. So the overhead of temp files is the safer solution, certainly to start with. Of course, one can benchmark both cases.