Why is java.io.Reader#skip implemented the way it is?

Posted 2024-11-13 05:40:38

I'm still learning object-oriented programming in Java. I was looking at the Java implementation of java.io.Reader.skip and I'm wondering why exactly it's implemented the way that it is. In particular I have questions about these things that I have noticed:

  1. The buffer used for skip(long) is a field of the Reader object, rather than a local variable in the method.
  2. The maximum buffer length is much less than Integer.MAX_VALUE (2,147,483,647). In particular, Java's implementation uses 8192.
  3. java.io.InputStream implements skip in exactly the same way.

My own guess is that the buffer is a field so that it does not have to be reallocated, and later garbage collected, on every call. This might make skipping faster.

I think the smaller buffer length has to do with making the Reader block for shorter periods, but since the Reader is synchronized, would that really make a difference?

Byte streams implementing it the same way might be for consistency. Are my assumptions correct on these three things?

To summarise, my questions are: Roughly how much of a difference in speed does it make, on average, to use a field rather than a local variable for the character array? Wouldn't it be just the same to use Integer.MAX_VALUE as the maximum buffer length? And isn't it better and easier to use the no-parameter read method in a for-loop for byte streams, since the other read methods just call the no-parameter read?

Sorry if my question's a strange question, but I think that I can learn a lot about object-oriented programming through this question.


Comments (4)

情域 2024-11-20 05:40:38

Reading one char at a time would be much less efficient: you'd have one method call per char skipped, which is usually bad for large skips (a lot of overhead).

The scratch buffer size is simple to answer: would you really want to allocate an Integer.MAX_VALUE chunk of RAM if you're going to skip 2G from a file?

As for the exact size, and whether or not to use an instance variable, that's an implementation-dependent compromise. You're reading an implementation that chose an 8192-element member field. Some implementations have smaller, local ones (512 can be seen here).

Nothing in the standard requires any of these implementation details, so don't rely on them at all.

If you're planning on doing something similar, benchmark the different approaches and pick the best compromise in your specific circumstances.
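
As a minimal sketch of the "read into a scratch buffer and discard" approach discussed above: the class name ScratchSkip and the constant MAX_SCRATCH are made up for illustration, and the cap of 8192 is simply the value quoted in the question. This is not the JDK's source, only one way such a skip could look when the buffer is a local variable.

```java
import java.io.IOException;
import java.io.Reader;

final class ScratchSkip {
    // Hypothetical cap on the scratch buffer; 8192 is the value quoted in the question.
    static final int MAX_SCRATCH = 8192;

    /** Skips up to n chars by reading them into a bounded local buffer and discarding them. */
    static long skip(Reader in, long n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("skip value is negative");
        }
        // The buffer is never larger than MAX_SCRATCH, no matter how large n is.
        char[] scratch = new char[(int) Math.min(n, MAX_SCRATCH)];
        long remaining = n;
        while (remaining > 0) {
            int read = in.read(scratch, 0, (int) Math.min(remaining, scratch.length));
            if (read == -1) {
                break; // end of stream reached before n chars were skipped
            }
            remaining -= read;
        }
        return n - remaining; // number of chars actually skipped
    }
}
```

Skipping 2 G chars this way costs many read calls but only ever 16 KB of heap (8192 two-byte chars), which is the trade-off described above.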

世态炎凉 2024-11-20 05:40:38

About how much of a difference in speed on average does it make to use a field rather than a variable for character arrays?

This would definitely vary from JVM to JVM, but repeatedly allocating an 8K array is probably not as cheap as keeping one around. Of course, the hidden lesson here is that one should not hold onto readers, even closed ones, because they carry an 8K penalty.
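
To make the field-versus-local trade-off concrete, here is a hedged sketch of a wrapper that keeps its scratch buffer in a lazily allocated field. The names ReusableSkipper, MAX_SKIP_BUFFER and the allocations counter are hypothetical and exist only for this illustration; the counter is there so the reuse can be observed, and real code would not need it.

```java
import java.io.IOException;
import java.io.Reader;

final class ReusableSkipper {
    private static final int MAX_SKIP_BUFFER = 8192; // value taken from the question
    private final Reader in;
    private char[] skipBuffer; // allocated on first use, then kept for later calls
    int allocations;           // demo-only counter of buffer allocations

    ReusableSkipper(Reader in) {
        this.in = in;
    }

    long skip(long n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("skip value is negative");
        }
        int size = (int) Math.min(n, MAX_SKIP_BUFFER);
        // Allocate only if there is no buffer yet or the existing one is too small.
        if (skipBuffer == null || skipBuffer.length < size) {
            skipBuffer = new char[size];
            allocations++;
        }
        long remaining = n;
        while (remaining > 0) {
            int read = in.read(skipBuffer, 0, (int) Math.min(remaining, size));
            if (read == -1) {
                break; // end of stream
            }
            remaining -= read;
        }
        return n - remaining;
    }
}
```

The flip side is the one mentioned above: once a large skip has happened, the object keeps up to 8192 chars alive for as long as it is reachable. How much time the reuse actually saves depends on the JVM and workload, so it is worth measuring rather than assuming.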

Wouldn't it be just the same to use Integer.MAX_VALUE as the maximum buffer length?

The buffer has to be pre-allocated, and allocating a 2 GB array seems like overkill. Remember, the reason for paging (reading in chunks) is to amortize the cost of the read call, which sometimes turns into a native operation.

Isn't it better and easier to use the no-parameter read method in a for-loop for byte streams since the other read methods just call the no-parameter read?

It is not guaranteed that the underlying stream is buffered, so this may incur heavy per-call overhead.
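
The per-call overhead can be illustrated by counting how many read calls reach the underlying stream. The sketch below uses a hypothetical CountingStream wrapper (not a JDK class) around a ByteArrayInputStream; with a real unbuffered source such as a file or socket, each of those calls may be far more expensive than it is here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CallCountDemo {
    /** Hypothetical wrapper that counts the read calls it forwards to the wrapped stream. */
    static final class CountingStream extends InputStream {
        private final InputStream in;
        int calls;

        CountingStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Discards total bytes with the no-argument read(); returns the number of calls made. */
    static int callsOneByOne(int total) throws IOException {
        CountingStream s = new CountingStream(new ByteArrayInputStream(new byte[total]));
        for (int i = 0; i < total; i++) {
            s.read();
        }
        return s.calls;
    }

    /** Discards total bytes in chunks of at most bufSize; returns the number of calls made. */
    static int callsBulk(int total, int bufSize) throws IOException {
        CountingStream s = new CountingStream(new ByteArrayInputStream(new byte[total]));
        byte[] buf = new byte[bufSize];
        int remaining = total;
        while (remaining > 0) {
            int n = s.read(buf, 0, Math.min(remaining, buf.length));
            if (n == -1) {
                break;
            }
            remaining -= n;
        }
        return s.calls;
    }
}
```

For 65536 bytes, the one-by-one loop makes 65536 calls, while the chunked loop with an 8192-byte buffer makes 8. The base InputStream's bulk read does fall back to the single-byte read, but subclasses are free to override it with something cheaper, which is why the chunked form is usually the safer default.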

Finally, keep in mind that the java.io classes have many, many deficiencies, so don't assume that everything in there is there for a good reason.

北方的巷 2024-11-20 05:40:38

For InputStream, you often have subclasses which allow much more efficient skipping, and these override the skip method appropriately. But for those subclasses which do not have an efficient way of skipping (like a compressing or decompressing input stream), the skip method is implemented in the base class on top of reading, so that not every subclass has to reimplement it.

There are several strategies on how to implement this in the java.io package:

Skipping the Base Stream:

  • FilterInputStream.skip() simply delegates to the source stream. I'm not so sure how useful this is, though.

  • DataInputStream does not override skip(), but has another method named skipBytes() which does the same thing (only for int arguments, though). It delegates to the underlying source stream.

  • BufferedInputStream.skip() overrides this, skipping first the existing contents in its own buffer, then calling skip() on the base stream (if there is no mark() set - if there is a mark, it has to read everything into the buffer to support reset()).

  • PushbackInputStream.skip() skips first over its pushback buffer, and then calls super.skip() (which is FilterInputStream.skip(), see above).

Resetting an Index:

  • ByteArrayInputStream can trivially support skipping, simply by setting the position to read from next.

  • StringBufferInputStream (which is a deprecated version of StringReader) supports skipping simply by resetting the index.
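
The "resetting an index" strategy can be sketched as follows. IndexSkipSource is a made-up class for illustration, not the source of ByteArrayInputStream; it only shows why skipping over in-memory data needs no reading at all.

```java
final class IndexSkipSource {
    private final byte[] data;
    private int pos; // index of the next byte to be returned by read()

    IndexSkipSource(byte[] data) {
        this.data = data;
    }

    /** Returns the next byte as a value in 0..255, or -1 at the end of the data. */
    int read() {
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    /** Skips by moving the index forward; no bytes are copied or inspected. */
    long skip(long n) {
        long available = data.length - pos;
        long k = Math.max(0, Math.min(n, available)); // clamp to the range [0, available]
        pos += (int) k;
        return k;
    }
}
```

The cost of skip here is constant regardless of n, which is why such classes override the default read-and-discard implementation.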

Native Magic:

  • FileInputStream has skip() as a native method. I think this would be the canonical example where it is most useful.

Read Everything and Throw it Away:

  • LineNumberInputStream.skip() has to read everything to count the lines. (I did not know that this class existed. Use LineNumberReader instead.)

  • ObjectInputStream does not override skip(), but has another method named skipBytes() which does the same thing (only for int arguments, though). It delegates to an inner class (BlockDataInputStream.skip()), which in turn reads from the underlying stream, respecting the Object stream protocol for block data.

Default implementation in InputStream:

  • This is also used by SequenceInputStream and PipedInputStream.

Let's have a look at the Reader classes. In principle, the same strategies apply:

Skip the Base Reader/Stream:

  • FilterReader.skip() does this.

  • PushbackReader first skips its own pushback buffer, then the base reader.

Reset Some Index:

  • StringReader (this one actually supports backwards skipping)

  • CharArrayReader

Read Everything and Throw it Away:

  • The default Reader.skip(), which is also used by PipedReader.

  • For InputStreamReader, the "simply skip the base stream" approach would only work for charsets with a fixed number of bytes per char (i.e. the ISO-8859 series, UTF-16 and some similar ones), not for UTF-8, UTF-32 or other charsets with a variable number of bytes per char, since we have to read all the bytes to know how many characters they represent. This also applies to its subclass FileReader.

  • BufferedReader (it does not call its own read(), but fills its internal buffer, which reads from the base stream).

  • LineNumberReader (it has to do this to keep track of the line numbers)
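
The InputStreamReader point can be checked with a small sketch that counts encoded bytes for single-char strings. The class BytesPerChar and its method names are hypothetical helpers; the calls to String.getBytes and StandardCharsets are standard library.

```java
import java.nio.charset.StandardCharsets;

final class BytesPerChar {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the string occupies when encoded as ISO-8859-1. */
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }
}
```

In UTF-8, "a" takes 1 byte, "\u00e9" (e with acute accent) takes 2 and "\u20ac" (the euro sign) takes 3, whereas in ISO-8859-1 both "a" and "\u00e9" take exactly 1. With a variable-width charset, skipping n chars therefore cannot be translated into skipping a known number of bytes without decoding them.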

山有枢 2024-11-20 05:40:38

You are forgetting that a buffer of 2^31 - 1 bytes is 2 GB of memory that has to be allocated and then cannot be used for anything else.

Allocating a contiguous 2 GB block is overkill for reading and discarding bytes, and it could cause out-of-memory situations.

A buffer capped at 8 kB is a much better trade-off, as it is allocated only once and reused on each skip operation.

By the way, in java.io.InputStream the skip buffer is static and only ever allocated once; since nothing is ever read back from it (it is used as write-only memory), there is no need to worry about races.
