What's wrong with FileInputStream.read(byte[])?
In response to my answer to a file-reading question, a commenter stated that FileInputStream.read(byte[]) is "not guaranteed to fill the buffer."
File file = /* ... */
long len = file.length();
byte[] buffer = new byte[(int)len];
FileInputStream in = new FileInputStream(file);
in.read(buffer);
(The code assumes that the file length does not exceed 2GB.)
Apart from an IOException, what could cause the read method to not retrieve the entire file contents?
EDIT:
The idea of the code (and the goal of the OP of the question I answered) is to read the entire file into one chunk of memory in a single call, which is why the buffer size equals the file size.
Comments (6)
People have talked about a read on a FileInputStream as hypothetically not filling the buffer. In fact, it is a reality in some circumstances:
If you open a FileInputStream on "/dev/tty" or a named pipe, then a read will only return the data that is currently available. Other device files may behave the same way. (These files will probably return 0L as the file size, though.)
A FUSE file system can be implemented to not completely fill the read buffer if the file system has been mounted with the direct_io option, or if a file is opened with the corresponding flag.
The above applies to Linux, but there could well be similar cases for other operating systems and/or Java implementations. The bottom line is that the javadocs allow this behavior, and you can get into trouble if your application assumes that it won't occur.
There are third-party libraries that implement "read fully" behavior; e.g. Apache Commons provides FileUtils.readFileToByteArray, IOUtils.toByteArray, and similar methods. If you want or need that behavior, you should use one of those libraries or implement it yourself.
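For readers who would rather implement the "read fully" behavior than pull in a library, here is a minimal sketch. The class and method names (ReadFully, readFully) are made up for illustration and do not come from the thread; the only API used is InputStream.read(byte[], int, int).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ReadFully {
    /**
     * Reads exactly buffer.length bytes from the stream, looping over short reads.
     * Throws EOFException if the stream ends before the buffer is full.
     */
    static void readFully(InputStream in, byte[] buffer) throws IOException {
        int offset = 0;
        while (offset < buffer.length) {
            // read may return fewer bytes than requested, so ask only for the remainder
            int n = in.read(buffer, offset, buffer.length - offset);
            if (n < 0) {
                throw new EOFException(
                        "stream ended after " + offset + " of " + buffer.length + " bytes");
            }
            offset += n;
        }
    }
}
```

The standard library also covers this case: java.nio.file.Files.readAllBytes(Path) (Java 7+), InputStream.readAllBytes() (Java 9+), and DataInputStream.readFully(byte[]) all keep reading until the data is complete.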
It's not guaranteed to fill the buffer.
The file size may be smaller than the buffer, or the remainder of the file may be smaller than the buffer.
Your question is self-contradictory. There is no guarantee that it will read the whole buffer, even if there are no imaginable circumstances in which it won't. There is no guarantee so you can't assume it.
In my own API implementation, and on my home-rolled file system, I simply choose to fill half the buffer... just kidding.
My point is that even if I weren't kidding, technically speaking it wouldn't be a bug. It is a matter of the method's contract. The contract (documentation) in this case is:
"Reads up to b.length bytes of data from this input stream into an array of bytes."
i.e., it gives no guarantee of filling the buffer.
Depending on the API implementation, and perhaps on the file system, the read method may choose not to fill the buffer. It's basically a question of what the contract of the method says.
Bottom line: It probably works, but it is not guaranteed to work.
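To make the contract argument concrete, here is a hypothetical stream that really does hand back at most half of what was requested. HalfFillInputStream is an invented name used only for illustration; it wraps any InputStream through the standard FilterInputStream class, and it still satisfies the InputStream contract because it returns at least one byte (when one is requested and available) or -1 at end of stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream that never returns more than half of the requested bytes. */
final class HalfFillInputStream extends FilterInputStream {
    HalfFillInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Cap the request at half its size, but never below one byte,
        // so a non-empty request still makes progress.
        int capped = (len <= 1) ? len : len / 2;
        return super.read(b, off, capped);
    }
}
```

Wrapping a 10-byte ByteArrayInputStream in this class and calling read on a 10-byte buffer returns 5, so code that ignores the return value would silently treat a half-filled buffer as complete.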
If, for example, the file is fragmented on the filesystem and the low-level implementation knows that it will have to wait for the HD to seek to the next fragment (which takes a LOT of time relative to CPU operations), it would make sense for the read() call to return with part of the buffer unfilled, giving the application the chance to start working on the data it has already received.
Now I don't know whether any implementation actually works like that, but the point is that you must not rely on the buffer being filled, because it's not guaranteed by the API contract.
Well, first off, you've set up a false dichotomy. One perfectly normal circumstance is that the buffer won't be filled because there aren't that many bytes left in the file. That is not an IOException, but it doesn't mean the whole file's contents have not been read.
The spec says the method will either return -1, indicating end-of-stream, or block until at least one byte is read. Implementers of InputStream can optimize as they see fit (e.g. a TCP stream might return data as soon as the packet comes in, regardless of the caller's choice of buffer size). A FileInputStream might fill the buffer with one block's worth of data. As the caller, you have no way of knowing; all you know is that until the method returns -1, you need to keep on reading.
Edit
In practice, with your example, the only circumstance I can see where the buffer wouldn't be filled (with a standard implementation) is if the file changed size after you allocated the buffer but before you started reading it. Since you haven't locked the file, this is a possibility.
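The "keep reading until -1" rule can be sketched as follows. The names ReadWholeFile and readAll, and the 8192-byte chunk size, are arbitrary choices for this illustration. Because the loop never relies on file.length(), it also copes with a file whose size changes between the length check and the read.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadWholeFile {
    /** Reads the file to the end of the stream, however many bytes each read returns. */
    static byte[] readAll(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // -1 is the only signal that the stream is exhausted
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n); // copy only the bytes actually read
            }
            return out.toByteArray();
        }
    }
}
```

The cost compared with the single-buffer approach in the question is one extra copy of the data when toByteArray() is called.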