Reading an OLE2 file in Java without buffering it into memory?
I'm using Apache POI to read an OLE2 file (might be Word, might be Excel). Using POIFSFileSystem, I'm able to open the file, and read the contents. That bit's all fine.
However, it does seem to be using quite a bit of memory. Looking at a few bits of POIFS, it seems that various bits of the file get buffered into memory, sometimes more than once.
Is it possible to just read bits in from the File, without loading it all in at once? I notice that with the new file formats (ooxml), you have a choice between a File and an InputStream, and the docs list the File constructor as lower memory. Is there something similar for the older OLE2 POIFS?
I'm using POI 3.7 Final in case that matters!
Comments (1)
You're in luck: it can be done, but alas you'll need to upgrade to a beta release, because the code went in after 3.7 Final. You should be OK with 3.8 beta 2, but you might want to wait for 3.8 beta 3 if you can, as the code's still being worked on.
What you'll need to do is switch from using a POIFSFileSystem to an NPOIFSFileSystem. The N prefix is for the new NIO-based OLE2 code, which is more memory efficient when reading from a stream, and much more memory efficient when reading from a File. See the NPOIFSFileSystem docs for more details.
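To illustrate why File access can be so much cheaper than a stream, here is a minimal sketch using only the JDK (Java 7 or later). It is not POI code and does not show how NPOIFSFileSystem is implemented; it only demonstrates the general NIO idea of reading just the bytes you need at a known offset. The class and method names (`Ole2HeaderPeek`, `readHeader`, `looksLikeOle2`, `sectorSize`) are made up for this example. It relies on two facts about the OLE2 format: the 8-byte signature at the start of the file, and the little-endian sector shift stored at offset 30 of the 512-byte header.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Illustrative only: hypothetical helper, not part of Apache POI.
final class Ole2HeaderPeek {
    // OLE2 compound document signature (first 8 bytes of the file).
    private static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final int HEADER_SIZE = 512;

    // Reads at most the first 512 bytes; the rest of the file is never loaded.
    static ByteBuffer readHeader(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
            long pos = 0;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos); // positional read, no seeking through a stream
                if (n < 0) {
                    break; // file is shorter than a full header
                }
                pos += n;
            }
            buf.flip();
            return buf;
        }
    }

    // True if the file starts with the OLE2 signature.
    static boolean looksLikeOle2(Path file) throws IOException {
        ByteBuffer h = readHeader(file);
        if (h.remaining() < SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (h.get(i) != SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    // Sector size is 2^shift, where shift is the 16-bit value at offset 30.
    static int sectorSize(Path file) throws IOException {
        ByteBuffer h = readHeader(file);
        if (h.remaining() < 32) {
            throw new IOException("File too short to contain an OLE2 header");
        }
        return 1 << h.getShort(30);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println("OLE2: " + looksLikeOle2(file));
    }
}
```

With an InputStream, the only way to reach a sector near the end of the file is to read (and usually buffer) everything before it. With a File, a positional read like the one above can fetch any sector on demand, which is the kind of access pattern the File-based route makes possible.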
Your code will want to be something like:
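The snippet from the original answer is missing here, so what follows is a reconstruction rather than the author's exact code. It is a minimal sketch that assumes Apache POI 3.8 (or a later 3.x release) on the classpath, a hypothetical Excel file named `test.xls`, and that `NPOIFSFileSystem(File)`, `getRoot()`, `close()` and the `HSSFWorkbook(DirectoryNode, boolean)` constructor are available in the release you use. Please check each of these against the Javadoc for your version, since the API was still changing during the 3.8 betas.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;

public class ReadWithNpoifs {
    public static void main(String[] args) throws IOException {
        // Open from a File rather than an InputStream to get the lower-memory path.
        // "test.xls" is a placeholder file name.
        NPOIFSFileSystem fs = new NPOIFSFileSystem(new File("test.xls"));
        try {
            // Hand the root directory of the OLE2 file to the document class.
            // The second argument asks POI to preserve the other nodes in the file.
            HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
            System.out.println("Sheets: " + wb.getNumberOfSheets());
        } finally {
            // Release the underlying file handle.
            fs.close();
        }
    }
}
```

For a Word document the same pattern should apply, passing `fs.getRoot()` to the corresponding document class instead of HSSFWorkbook.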
In 3.8 beta 2, most of the POIDocument classes (HSSFWorkbook etc.) will accept a DirectoryEntry in their constructor, so you can read them from an NPOIFSFileSystem. However, write support isn't quite finished, so if you need to write the file back out you'll have to stick with POIFSFileSystem and accept its higher memory footprint.