What is the design rationale behind ByteArrayOutputStream's public interface?

Posted on 2024-11-15 08:55:35

Many Java standard and third-party libraries expose methods in their public API for writing to or reading from a stream.
One example is javax.imageio.ImageIO.write(), which takes an OutputStream and writes the content of a processed image to it.
Another example is the iText PDF processing library, which takes an OutputStream and writes the resulting PDF to it.
A third example is the AmazonS3 Java API, which takes an InputStream, reads it, and creates a file in their S3 storage.

The problem arises when you want to combine two of these. For example, I have an image as a BufferedImage, for which I have to use ImageIO.write to push the result into an OutputStream.
But there is no direct way to push it to Amazon S3, as S3 requires an InputStream.
There are a few ways to work this out, but the subject of this question is the usage of ByteArrayOutputStream.

The idea behind ByteArrayOutputStream is to use an intermediate byte array wrapped in input/output streams, so that whoever wants to write to an output stream writes to the array, and whoever wants to read reads from the array.

What I am wondering is why ByteArrayOutputStream does not allow any access to the byte array without copying it, for example by providing an InputStream that has direct access to it.
The only way to access it is to call toByteArray(), which makes a copy of the internal array (in the standard implementation). That means, in my image example, I will have three copies of the image in memory:

  • the first is the actual BufferedImage,
  • the second is the internal array of the OutputStream, and
  • the third is the copy produced by toByteArray(), so that I can create the
    InputStream.
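
For reference, the copying pattern described above might look like the following minimal sketch. The class name CopyingBridge and the helper toInputStream are made up for illustration, and the S3 upload itself is left out; any consumer of an InputStream could take its place.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper class, not part of any library.
class CopyingBridge {

    // Encodes the image into memory, then exposes the encoded bytes as an InputStream.
    static InputStream toInputStream(BufferedImage image) {
        // Second in-memory representation: the encoded bytes in the stream's internal array.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Third representation: toByteArray() allocates a new array and copies into it.
        byte[] encoded = out.toByteArray();
        // ByteArrayInputStream wraps the given array without copying it again.
        return new ByteArrayInputStream(encoded);
    }

    public static void main(String[] args) throws IOException {
        // First representation: the decoded pixels held by the BufferedImage.
        BufferedImage image = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        InputStream in = toInputStream(image);
        // 'in' could now be handed to an API that consumes an InputStream.
        System.out.println("encoded bytes available: " + in.available());
    }
}
```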

How is this design justified?

  • Hiding implementation? Just provide getInputStream(), and the implementation stays hidden.
  • Multi-threading? ByteArrayOutputStream is not suited for access by multiple threads anyway, so that cannot be the reason.

Moreover, there is a second flavor of ByteArrayOutputStream, provided by Apache's commons-io library (which has a different internal implementation).
But both have exactly the same public interface, which provides no way to access the byte array without copying it.

Comments (3)

千纸鹤带着心事 2024-11-22 08:55:35

My wondering is why ByteArrayOutputStream does not allow any access to the byte array without copying it, for example, to provide an InputStream that has direct access to it.

I can think of four reasons:

  • The current implementation uses a single byte array, but it could also be implemented as a linked list of byte arrays, deferring the creation of the final array until the application asks for it. If the application could see the actual byte buffer, it would have to be a single array.

  • Contrary to your understanding, ByteArrayOutputStream is thread-safe and is suitable for use in multi-threaded applications. But if direct access were provided to the byte array, it is difficult to see how that could be synchronized without creating other problems.

  • The API would need to be more complicated because the application also needs to know where the current buffer high water mark is, and whether the byte array is (still) the live byte array. (The ByteArrayOutputStream implementation occasionally needs to reallocate the byte array ... and that will leave the application holding a reference to an array that is no longer the array.)

  • When you expose the byte array, you allow an application to modify the contents of the array, which could be problematic.
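
The reallocation point can be illustrated with a small sketch. ExposedBuffer, rawBuffer() and validLength() are hypothetical names invented here; they rely only on the protected buf and count fields of java.io.ByteArrayOutputStream. The sketch suggests why a caller holding the raw array would also need the valid length, and why the reference can go stale.

```java
import java.io.ByteArrayOutputStream;

class StaleReferenceDemo {

    // Returns true if the array obtained before further writes is still the live one.
    static boolean stillLiveAfterGrowth() {
        ExposedBuffer out = new ExposedBuffer(4);
        out.write(1);
        byte[] before = out.rawBuffer();
        // Writing 64 more bytes exceeds the initial capacity of 4,
        // so the stream has to allocate a larger array and copy into it.
        out.write(new byte[64], 0, 64);
        return before == out.rawBuffer();
    }

    public static void main(String[] args) {
        System.out.println("still live after growth: " + stillLiveAfterGrowth());
    }
}

// Hypothetical subclass that leaks the internal state (illustration only).
class ExposedBuffer extends ByteArrayOutputStream {

    ExposedBuffer(int capacity) {
        super(capacity);
    }

    // Hands out the live internal array without copying it.
    synchronized byte[] rawBuffer() {
        return buf;
    }

    // Number of valid bytes in rawBuffer(); the rest is unused capacity.
    synchronized int validLength() {
        return count;
    }
}
```

Here the caller's `before` reference ends up pointing at an abandoned array once the stream grows, and nothing stops the caller from writing into whichever array it holds.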


How is this design justified?

The design is tailored for simpler use-cases than yours. The Java SE class libraries don't aim to support all possible use-cases. But they don't prevent you (or a 3rd party library) from providing other stream classes for other use-cases.


The bottom line is that the Sun designers decided NOT to expose the byte array for ByteArrayOutputStream, and (IMO) you are unlikely to change their minds.

(And if you want to try, this is not the right place to do it.

  • Try submitting an RFE via the Bugs database.
  • Or develop a patch that adds the functionality and submit it to the OpenJDK team via the relevant channels. You would increase your chances if you included comprehensive unit tests and documentation.)

You might have more success convincing the Apache Commons IO developers of the rightness of your arguments, provided that you can come up with an API design that isn't too dangerous.

Alternatively, there's nothing stopping you from just implementing your own special purpose version that exposes its internal data structures. The code is GPL'ed so you can copy it ... subject to the normal GPL rules about code distribution.

魔法少女 2024-11-22 08:55:35

Luckily, the internal array is protected, so you can subclass it, and wrap a ByteArrayInputStream around it, without any copying.
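
A minimal sketch of that subclassing approach, assuming the standard java.io.ByteArrayOutputStream with its protected buf and count fields. The names ZeroCopyByteArrayOutputStream and toInputStream() are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ZeroCopyDemo {

    // Hypothetical subclass: wraps the internal array instead of copying it.
    static class ZeroCopyByteArrayOutputStream extends ByteArrayOutputStream {

        // The returned stream shares 'buf' with this output stream.
        // Only the first 'count' bytes are valid, hence the offset/length form.
        synchronized ByteArrayInputStream toInputStream() {
            return new ByteArrayInputStream(buf, 0, count);
        }
    }

    // Writes text into the output stream and reads it back through the shared array.
    static String roundTrip(String text) {
        ZeroCopyByteArrayOutputStream out = new ZeroCopyByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);

        ByteArrayInputStream in = out.toInputStream();
        byte[] readBack = new byte[in.available()];
        in.read(readBack, 0, readBack.length);
        return new String(readBack, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("no extra copy of the buffer"));
    }
}
```

The input stream is a snapshot of the first count bytes at the time of the call. Bytes written afterwards are not visible through it, and calling reset() and writing again would overwrite data it may still be reading, so this seems safest when writing has finished before reading starts.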

×纯※雪 2024-11-22 08:55:35

I think that the behavior you are looking for is a Pipe. A ByteArrayOutputStream is just an OutputStream, not an input/output stream. It wasn't designed for what you have in mind.
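
As a rough sketch of the pipe idea using the JDK's PipedInputStream and PipedOutputStream: the class PipeDemo and the method transfer are invented for illustration, the producer thread stands in for something like ImageIO.write, and the read loop stands in for whatever consumes the InputStream. Piped streams need the writer and the reader on different threads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

class PipeDemo {

    // Pushes payload through a pipe and returns what the reading side received.
    static byte[] transfer(byte[] payload) {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);

            // Producer: writes on its own thread and closes the pipe when done,
            // which is what signals end-of-stream to the reader.
            Thread producer = new Thread(() -> {
                try (PipedOutputStream o = out) {
                    o.write(payload);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();

            // Consumer: a plain read loop; only a small pipe buffer is held at a time.
            // Collecting into a byte array here is just so the result can be inspected.
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                received.write(chunk, 0, n);
            }
            producer.join();
            in.close();
            return received.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] result = transfer(new byte[5000]);
        System.out.println("bytes received: " + result.length);
    }
}
```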
