What is wrong with FileInputStream.read(byte[])?

Posted on 2024-11-09 22:40:10

In response to my answer to a file-reading question, a commenter stated that FileInputStream.read(byte[]) is "not guaranteed to fill the buffer."

File file = /* ... */  
long len = file.length();
byte[] buffer = new byte[(int)len];
FileInputStream in = new FileInputStream(file);
in.read(buffer);

(The code assumes that the file length does not exceed 2GB)

Apart from an IOException, what could cause the read method to not retrieve the entire file contents?

EDIT:

The idea of the code (and the goal of the OP of the question I answered) is to read the entire file into one chunk of memory in a single call, which is why buffer_size = file_size.

Comments (6)

万人眼中万个我 2024-11-16 22:40:11

People have talked about read on a FileInputStream as hypothetically not filling the buffer. In fact it is a reality in some circumstances:

  • If you open a FileInputStream on a "/dev/tty" or a named pipe, then a read will only return you the data that is currently available. Other device files may behave the same way. (These files will probably return 0L as the file size though.)

  • A FUSE file system can be implemented to not completely fill the read buffer if the file system has been mounted with the direct_io option, or a file is opened with the corresponding flag.

The above apply to Linux, but there could well be similar cases for other operating systems and/or Java implementations. The bottom line is that the javadocs allow this behavior and you can get into trouble if your application assumes that it won't occur.

There are third-party libraries that implement "read fully" behavior; e.g. Apache Commons IO provides FileUtils.readFileToByteArray, IOUtils.toByteArray, and similar methods. If you want or need that behavior, you should use one of those libraries or implement it yourself.
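For the "implement it yourself" route, a minimal sketch might look like the following. The class and method names (ReadFullySketch, readFully) are made up for illustration, and throwing EOFException on a short stream is one possible design choice, not something the thread prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative only.
final class ReadFullySketch {

    /**
     * Fills the whole buffer, looping because a single read() call
     * is allowed to return fewer bytes than requested.
     */
    static void readFully(InputStream in, byte[] buffer) throws IOException {
        int offset = 0;
        while (offset < buffer.length) {
            // Ask only for the bytes that are still missing.
            int n = in.read(buffer, offset, buffer.length - offset);
            if (n < 0) {
                // End of stream before the buffer was full.
                throw new EOFException(
                        "stream ended after " + offset + " of " + buffer.length + " bytes");
            }
            offset += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        byte[] buffer = new byte[data.length];
        readFully(new ByteArrayInputStream(data), buffer);
        System.out.println(Arrays.toString(buffer));
    }
}
```

The JDK also covers this case without third-party code: java.nio.file.Files.readAllBytes (Java 7 and later) reads a whole file, and InputStream.readAllBytes and InputStream.readNBytes (Java 9 and later) loop internally.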

说好的呢 2024-11-16 22:40:11

It's not guaranteed to fill the buffer.

The file size may be smaller than the buffer, or the remainder of the file may be smaller than the buffer.
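The "remainder smaller than the buffer" case is easy to observe with an in-memory stream. This is only an illustrative sketch: ShortLastReadDemo and readSizes are hypothetical names, a ByteArrayInputStream stands in for a file, and bufferSize is assumed to be greater than zero.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of any library.
final class ShortLastReadDemo {

    /** Records the return value of every read(byte[]) call, including the final -1. */
    static List<Integer> readSizes(InputStream in, int bufferSize) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        byte[] buffer = new byte[bufferSize]; // assumed bufferSize > 0
        int n;
        do {
            n = in.read(buffer);
            sizes.add(n);
        } while (n != -1);
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // 10 bytes of data read through a 4-byte buffer: the last chunk is short.
        System.out.println(readSizes(new ByteArrayInputStream(new byte[10]), 4)); // prints [4, 4, 2, -1]
    }
}
```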

又怨 2024-11-16 22:40:11

Your question is self-contradictory. There is no guarantee that it will fill the whole buffer, even if there are no imaginable circumstances in which it won't. There is no guarantee, so you can't assume it.

夜还是长夜 2024-11-16 22:40:10

Apart from an IOException, what could cause the read method to not retrieve the entire file contents?

In my own API implementation, and on my home-rolled file system, I simply choose to fill half the buffer... just kidding.

My point is that even if I wasn't kidding, technically speaking it wouldn't be a bug. It is a matter of the method's contract. The contract (documentation) in this case is:

Reads up to b.length bytes of data from this input stream into an array of bytes.

i.e., it gives no guarantees for filling the buffer.

Depending on the API implementation, and perhaps on the file-system the read method may choose not to fill the buffer. It's basically a question of what the contract of the method says.


Bottom line: It probably works, but is not guaranteed to work.
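To make the contract argument concrete, here is a hedged sketch of a stream that deliberately returns at most half of what was asked for and still honors the "reads up to b.length bytes" wording. HalfFillInputStream is a made-up class for illustration, not a real JDK or file-system implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo; it only illustrates what the read() contract permits.
final class HalfFillDemo {

    /** Wraps another stream and never hands back more than half of the requested bytes. */
    static final class HalfFillInputStream extends FilterInputStream {
        HalfFillInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Keep at least one byte per call so the stream still makes progress.
            int capped = (len <= 1) ? len : len / 2;
            return super.read(b, off, capped);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];
        try (InputStream in = new HalfFillInputStream(new ByteArrayInputStream(new byte[8]))) {
            // FilterInputStream.read(byte[]) delegates to read(b, 0, b.length).
            int n = in.read(buffer);
            System.out.println("read " + n + " of " + buffer.length + " bytes"); // prints "read 4 of 8 bytes"
        }
    }
}
```

Code that calls read(buffer) once and assumes a full buffer silently loses the second half here, while code that loops on the return value works unchanged.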

不顾 2024-11-16 22:40:10

what could cause the read method to not retrieve the entire file contents?

If, for example, the file is fragmented on the filesystem and the low-level implementation knows that it will have to wait for the HD to seek to the next fragment (which takes a LOT of time relative to CPU operations), it would make sense for the read() call to return with part of the buffer unfilled, giving the application the chance to start working on the data it has already received.

Now I don't know whether any implementation actually works like that, but the point is that you must not rely on the buffer being filled, because it's not guaranteed by the API contract.

夜未央樱花落 2024-11-16 22:40:10

Well, first off, you've set up a false dichotomy. One perfectly normal circumstance is that the buffer won't be filled because there aren't that many bytes left in the file. That is not an IOException, and it doesn't mean the file's contents have not been read in full.

The spec says the method will either return -1, indicating end-of-stream, or block until at least one byte is read. Implementers of InputStream can optimize as they see fit (e.g. a TCP stream might return data as soon as a packet comes in, regardless of the caller's choice of buffer size). A FileInputStream might fill the buffer with one block's worth of data. As the caller, all you know is that you need to keep reading until the method returns -1.
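The "keep reading until -1" rule is usually written as a small loop. The sketch below is one common shape for it; ReadUntilEofSketch, readAll, and the 8192-byte chunk size are illustrative choices rather than anything the answer specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; the names are illustrative only.
final class ReadUntilEofSketch {

    /** Drains the stream, trusting only the return value of each read() call. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192]; // arbitrary chunk size
        int n;
        while ((n = in.read(chunk)) != -1) {
            // Copy only the n bytes that were actually read this time.
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] result = readAll(new ByteArrayInputStream(new byte[20000]));
        System.out.println(result.length + " bytes read");
    }
}
```

This version does not need to know the file size in advance, so it also works for pipes, sockets, and device files.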

Edit

In practice, with your example, the only circumstance I would see where the buffer wouldn't be filled (with a standard implementation) is if the file changed size after you allocated the buffer but before you started reading it. Since you haven't locked the file down this is a possibility.
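One hedged way to keep the "size the buffer from file.length()" approach while at least detecting a file that shrank in between is DataInputStream.readFully, which throws EOFException if the stream ends early. SizeSnapshotSketch and readWholeFile are hypothetical names, and the sketch does not notice a file that grew after the length was taken.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper class; the names are illustrative only.
final class SizeSnapshotSketch {

    /** Reads a file into an array sized from a length snapshot taken before opening it. */
    static byte[] readWholeFile(File file) throws IOException {
        long len = file.length();
        if (len > Integer.MAX_VALUE) {
            throw new IOException("file too large for a single array: " + len);
        }
        byte[] buffer = new byte[(int) len];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // Loops internally; throws EOFException if the file shrank after length().
            // Bytes appended after the snapshot are simply not read.
            in.readFully(buffer);
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("snapshot", ".bin");
        try {
            Files.write(tmp.toPath(), new byte[] {1, 2, 3, 4, 5});
            System.out.println(readWholeFile(tmp).length + " bytes read");
        } finally {
            tmp.delete();
        }
    }
}
```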
