How to cache an InputStream for multiple uses
I have an InputStream of a file and I use Apache POI components to read from it like this:
POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);
The problem is that I need to use the same stream multiple times, and the POIFSFileSystem closes the stream after use.
What is the best way to cache the data from the input stream and then serve more input streams to different POIFSFileSystem instances?
EDIT 1:
By cache I meant store for later use, not as a way to speed up the application. Also, is it better to just read the input stream into an array or string and then create input streams for each use?
EDIT 2:
Sorry to reopen the question, but the conditions are somewhat different when working inside a desktop application and a web application.
First of all, the InputStream I get from the org.apache.commons.fileupload.FileItem in my Tomcat web app doesn't support marking and thus cannot be reset.
Second, I'd like to be able to keep the file in memory for faster access and fewer I/O problems when dealing with files.
Comments (10)
This works correctly:
where getBytes is like this:
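A minimal sketch of this approach, assuming the stream is small enough to hold in memory. Only the name getBytes comes from the answer above; the StreamCache class and the newStream helper are hypothetical names used for illustration, and the POI calls appear only in comments because POI is not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; only getBytes is named in the answer above.
class StreamCache {

    // Drains the stream into memory. The caller still owns (and should close) the source.
    static byte[] getBytes(InputStream is) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() may return fewer bytes than requested, so loop until end of stream (-1).
        while ((n = is.read(buffer, 0, buffer.length)) != -1) {
            baos.write(buffer, 0, n);
        }
        return baos.toByteArray();
    }

    // Each call yields an independent stream over the same cached bytes.
    static InputStream newStream(byte[] cached) {
        return new ByteArrayInputStream(cached);
    }
}

// Intended use (not compiled here, requires Apache POI on the classpath):
//   byte[] bytes = StreamCache.getBytes(inputStream);
//   POIFSFileSystem first  = new POIFSFileSystem(StreamCache.newStream(bytes));
//   POIFSFileSystem second = new POIFSFileSystem(StreamCache.newStream(bytes));
```

Because a ByteArrayInputStream holds no system resources, it does not matter that POIFSFileSystem closes each one after use, and the approach also works for sources that do not support mark and reset.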
You can decorate the InputStream being passed to POIFSFileSystem with a version that responds to close() by calling reset():
testcase
EDIT 2
You can read the entire file into a byte[] (slurp mode) and then pass it to a ByteArrayInputStream.
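The decorator idea could be sketched roughly as follows. The class name ResetOnCloseInputStream, the readLimit parameter, and the reallyClose method are assumptions made for illustration. The sketch only works when the wrapped stream supports mark and reset, which the question's EDIT 2 says is not the case for the FileItem stream, so there the byte[] route is the safer one.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical name; a sketch of the "close() means reset()" decorator.
class ResetOnCloseInputStream extends FilterInputStream {

    ResetOnCloseInputStream(InputStream in, int readLimit) {
        super(in);
        if (!in.markSupported()) {
            throw new IllegalArgumentException("marking not supported");
        }
        // Remember the current position; readLimit must cover everything read before close().
        in.mark(readLimit);
    }

    @Override
    public void close() throws IOException {
        // Rewind to the mark instead of closing, so the stream can be consumed again.
        in.reset();
    }

    // Call this once the stream is really no longer needed.
    void reallyClose() throws IOException {
        in.close();
    }
}
```

The read limit is the fragile part: if a consumer reads more bytes than the limit before calling close(), the wrapped stream is allowed to discard the mark and reset() may fail.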
Try BufferedInputStream, which adds mark and reset functionality to another input stream, and just override its close method:
So:
and use bis wherever inputStream was used before.
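One way the BufferedInputStream suggestion might be written is sketched below. The class name UnclosableBufferedInputStream and the reallyClose method are hypothetical; the variable name bis follows the answer. It assumes the whole file fits in memory, because marking with Integer.MAX_VALUE makes the internal buffer keep everything that has been read.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class name; sketch of a BufferedInputStream whose close() rewinds instead.
class UnclosableBufferedInputStream extends BufferedInputStream {

    UnclosableBufferedInputStream(InputStream in) {
        super(in);
        // Mark the start. With Integer.MAX_VALUE as the read limit, the internal buffer
        // grows to hold everything read, so the whole file ends up in memory.
        super.mark(Integer.MAX_VALUE);
    }

    @Override
    public void close() throws IOException {
        // Rewind to the start instead of closing the underlying stream.
        super.reset();
    }

    // Releases the underlying stream when no further reuse is needed.
    void reallyClose() throws IOException {
        super.close();
    }
}

// Intended use (POI call shown as a comment only):
//   UnclosableBufferedInputStream bis = new UnclosableBufferedInputStream(inputStream);
//   POIFSFileSystem fileSystem = new POIFSFileSystem(bis);
```

Since BufferedInputStream does its own buffering, this works even when the wrapped stream does not support mark and reset itself.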
What exactly do you mean by "cache"? Do you want the different POIFSFileSystem instances to start at the beginning of the stream? If so, there's absolutely no point in caching anything in your Java code; it will be done by the OS, so just open a new stream.
Or do you want to continue reading at the point where the first POIFSFileSystem stopped? That's not caching, and it's very difficult to do. The only way I can think of, if you can't avoid the stream getting closed, would be to write a thin wrapper that counts how many bytes have been read, and then open a new stream and skip that many bytes. But that could fail when POIFSFileSystem internally uses something like a BufferedInputStream.
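The counting wrapper described here could look something like the sketch below. ByteCountingInputStream and skipFully are hypothetical names, and the sketch assumes the source can be reopened from the start. As noted above, the count reflects bytes pulled from the stream, which overstates the logical position if the consumer buffers ahead.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical names; sketch of "count the bytes read, then reopen and skip that many".
class ByteCountingInputStream extends FilterInputStream {
    private long count;

    ByteCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) count += skipped;
        return skipped;
    }

    // Number of bytes consumed through this wrapper so far.
    long getCount() {
        return count;
    }

    // skip() may skip fewer bytes than asked, so loop; fall back to read() to detect end of stream.
    static void skipFully(InputStream in, long toSkip) throws IOException {
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped > 0) {
                toSkip -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException("stream ended before skip completed");
            } else {
                toSkip--;
            }
        }
    }
}
```

A second consumer would then open a fresh stream on the same source and call ByteCountingInputStream.skipFully(freshStream, first.getCount()) before reading.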
Use the implementation below for more custom use:
If the file is not that big, read it into a byte[] array and give POI a ByteArrayInputStream created from that array.
If the file is big, then you shouldn't care, since the OS will do the caching for you as best it can.
[EDIT] Use Apache commons-io to read the file into a byte array in an efficient way. Do not use int read(), since it reads the file byte by byte, which is very slow!
If you want to do it yourself, use a File object to get the length, create the array, and then write a loop which reads bytes from the file. You must loop, since read(byte[], int offset, int len) can read fewer than len bytes (and usually does).
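The do-it-yourself loop described above can be sketched as follows. FileSlurper and readFully are hypothetical names. On Java 7 and later, java.nio.file.Files.readAllBytes does the same job, and commons-io offers IOUtils.toByteArray for arbitrary streams.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; reads a whole file into a byte[] with bulk reads.
class FileSlurper {

    static byte[] readFully(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("file too large for a byte[]: " + length);
        }
        byte[] data = new byte[(int) length];
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            // A single read() call may deliver fewer bytes than requested, so keep going
            // until the array is full.
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n == -1) {
                    throw new EOFException("file shrank while reading");
                }
                offset += n;
            }
        }
        return data;
    }
}

// Intended use (POI call shown as a comment only):
//   byte[] bytes = FileSlurper.readFully(new File("workbook.xls"));
//   POIFSFileSystem fs = new POIFSFileSystem(new java.io.ByteArrayInputStream(bytes));
```

The file name in the usage comment is a placeholder.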
This is how I would implement it, so that it can safely be used with any InputStream:
This answer iterates on the previous ones based on BufferedInputStream. The main change is that it allows infinite reuse, and it takes care of closing the original source input stream to free up system resources. Your OS defines a limit on those, and you don't want the program to run out of file handles (that's also why you should always 'consume' responses, e.g. with Apache's EntityUtils.consumeQuietly()).
EDIT: Updated the code to handle greedy consumers that use read(buffer, offset, length); in that case BufferedInputStream may try hard to look at the source, and this code protects against that use.
To reuse it, just close it first if it wasn't closed already.
One limitation, though, is that if the stream is closed before the whole content of the original stream has been read, then this decorator will have incomplete data, so make sure the whole stream is read before closing.
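To make the behaviour and the limitation concrete, here is a simplified sketch of a reusable stream. It swaps in a different technique from the one the answer describes: rather than extending BufferedInputStream, it records bytes as they are read and replays them after close(). The class name ReplayableInputStream is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical name; records what is read from the source and replays it after close().
class ReplayableInputStream extends InputStream {
    private InputStream source;  // null once the real source has been closed
    private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();
    private ByteArrayInputStream replay;  // non-null after the first close()

    ReplayableInputStream(InputStream source) {
        this.source = source;
    }

    @Override
    public int read() throws IOException {
        if (replay != null) return replay.read();
        int b = source.read();
        if (b != -1) recorded.write(b);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (replay != null) return replay.read(buf, off, len);
        int n = source.read(buf, off, len);
        if (n > 0) recorded.write(buf, off, n);
        return n;
    }

    @Override
    public void close() throws IOException {
        // The first close releases the real source; every close rewinds the replay.
        if (source != null) {
            source.close();
            source = null;
        }
        replay = new ByteArrayInputStream(recorded.toByteArray());
    }
}
```

If a consumer closes the stream after reading only part of the source, later passes see only that part, which is the limitation mentioned above.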
This works. IOUtils is part of commons IO.
I'm just adding my solution here, as it works for me. It is basically a combination of the top two answers :)