Problem using a Base64 encoder with InputStreamReader

Posted on 2024-09-03 15:06:41 · 627 characters · 9 views · 0 comments

I have some CLOB columns in a database that I need to put Base64-encoded binary files into. These files can be large, so I need to stream them; I can't read the whole thing in at once.

I'm using org.apache.commons.codec.binary.Base64InputStream to do the encoding, and I'm running into a problem. My code is essentially this:

FileInputStream fis = new FileInputStream(file);
Base64InputStream b64is = new Base64InputStream(fis, true, -1, null);
BufferedReader reader = new BufferedReader(new InputStreamReader(b64is));

preparedStatement.setCharacterStream(1, reader);

When I run the above code, I get the following exception during the execution of the update:

java.io.IOException: Underlying input stream returned zero bytes

It is thrown deep in the InputStreamReader code.

Why would this not work? It seems to me that the reader would attempt to read from the Base64 stream, which would read from the file stream, and everything should be happy.

Comments (3)

木落 2024-09-10 15:06:41

This appears to be a bug in Base64InputStream. You're calling it correctly.

You should report this to the Apache Commons Codec project.

Simple test case:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.codec.binary.Base64InputStream;

class tmp {
  public static void main(String[] args) throws IOException {
    // Encode the file named on the command line; a line length of -1 disables line wrapping.
    try (InputStream b64is =
             new Base64InputStream(new FileInputStream(args[0]), true, -1, null)) {
      byte[] c = new byte[1024];
      while (true) {
        int n = b64is.read(c);
        if (n < 0) break;  // end of stream
        // A read into a non-empty buffer must never return 0.
        if (n == 0) throw new IOException("returned 0!");
        for (int i = 0; i < n; i++) {
          System.out.print((char) c[i]);
        }
      }
    }
  }
}

The read(byte[]) call of InputStream is not allowed to return 0 unless the buffer itself has length zero. Here it does return 0 on any file whose length is a multiple of 3 bytes.
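Until a fixed library version is available, one caller-side workaround is to wrap the misbehaving stream in a filter that retries zero-length reads. The following is only a sketch: ZeroRetryInputStream, StutteringInputStream and ZeroRetryDemo are hypothetical names, not part of Commons Codec, and StutteringInputStream merely imitates the bug so the example runs with the JDK alone. Note the retry loop assumes the wrapped stream eventually returns data or -1; a stream that returns 0 forever would spin.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical workaround: retries bulk reads that return 0 for a non-empty buffer. */
class ZeroRetryInputStream extends FilterInputStream {
    ZeroRetryInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0; // the only case in which 0 is a legal result
        }
        int n;
        do {
            n = in.read(b, off, len); // keep asking until we get data or EOF (-1)
        } while (n == 0);
        return n;
    }
}

/** Stand-in for the buggy stream (an assumption for the demo): its first bulk read returns 0. */
class StutteringInputStream extends FilterInputStream {
    private boolean stuttered = false;

    StutteringInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (!stuttered && len > 0) {
            stuttered = true;
            return 0; // violates the InputStream contract on purpose
        }
        return in.read(b, off, len);
    }
}

class ZeroRetryDemo {
    /** Drains the stream through an InputStreamReader, as a JDBC driver might. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, StandardCharsets.US_ASCII)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "aGVsbG8=".getBytes(StandardCharsets.US_ASCII);
        InputStream broken = new StutteringInputStream(new ByteArrayInputStream(data));
        // Without the wrapper, readAll(broken) fails with
        // "Underlying input stream returned zero bytes".
        System.out.println(readAll(new ZeroRetryInputStream(broken)));
    }
}
```

In the question's code, the same wrapper would presumably sit between the Base64InputStream and the InputStreamReader.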

饮惑 2024-09-10 15:06:41

Interesting. I ran some tests here, and it does indeed throw that exception when you read the Base64InputStream through an InputStreamReader, regardless of the source of the stream, but it works flawlessly when you read it as a binary stream. As Trashgod mentioned, Base64 encoding is framed. The InputStreamReader should in fact have invoked flush() on the Base64InputStream once more to see whether it returns any more data.

I don't see any way to fix this other than implementing your own Base64InputStreamReader or Base64Reader. This is actually a bug; see Keith's answer.
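For readers who do want their own Base64Reader, here is a minimal sketch. It swaps in the JDK's java.util.Base64 (Java 8 and later) rather than Commons Codec, and the class name Base64EncodingReader and the chunk size are assumptions made for illustration. The input is consumed in chunks whose size is a multiple of 3 bytes, so padding can only appear at the very end of the output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical Reader that streams the Base64 encoding of a binary InputStream. */
class Base64EncodingReader extends Reader {
    // Multiple of 3, so every chunk except the last encodes without padding.
    private static final int CHUNK = 3 * 1024;

    private final InputStream in;
    private final byte[] raw = new byte[CHUNK];
    private char[] encoded = new char[0];
    private int pos = 0;
    private boolean eof = false;

    Base64EncodingReader(InputStream in) {
        this.in = in;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Refill until there is something to hand out; never return 0 for len > 0.
        while (pos == encoded.length) {
            if (eof) {
                return -1;
            }
            fill();
        }
        int n = Math.min(len, encoded.length - pos);
        System.arraycopy(encoded, pos, cbuf, off, n);
        pos += n;
        return n;
    }

    /** Reads a full chunk (or whatever remains before EOF) and encodes it. */
    private void fill() throws IOException {
        int filled = 0;
        while (filled < CHUNK) {
            int n = in.read(raw, filled, CHUNK - filled);
            if (n < 0) {
                eof = true;
                break;
            }
            filled += n;
        }
        byte[] enc = Base64.getEncoder().encode(Arrays.copyOf(raw, filled));
        // Base64 output is pure ASCII, so a single-byte charset maps it 1:1 to chars.
        encoded = new String(enc, StandardCharsets.ISO_8859_1).toCharArray();
        pos = 0;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Such a reader could be passed directly to preparedStatement.setCharacterStream(1, ...) in place of the BufferedReader from the question, with no InputStreamReader in between.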

As a workaround you can also just store it in a BLOB instead of a CLOB in the DB and use PreparedStatement#setBinaryStream() instead. It doesn't matter if it's stored as binary data or not. You don't want to have such large Base64 data to be indexable or searchable anyway.
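A rough sketch of the BLOB variant follows. BlobUpload and storeFile are hypothetical names, and the statement is assumed to have been prepared elsewhere against a BLOB column; the three-argument setBinaryStream overload with a long length is part of JDBC 4.0, though whether a given driver supports it is worth checking.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper: streams a file into a BLOB parameter without Base64 encoding. */
class BlobUpload {
    static int storeFile(PreparedStatement ps, int paramIndex, File file)
            throws SQLException, IOException {
        // The stream must stay open until the statement has executed,
        // because the driver may read it lazily during executeUpdate().
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            ps.setBinaryStream(paramIndex, in, file.length());
            return ps.executeUpdate();
        }
    }
}
```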


Update: since that's not an option, and since getting the Apache Commons Codec maintainers to fix the Base64InputStream bug (which I reported as CODEC-101) might take some time, you may want to consider another third-party Base64 API. I've found one here (public domain, so you can do whatever you want with it, even place it in your own package). I've tested it here and it works fine.

InputStream base64 = new Base64.InputStream(input, Base64.ENCODE);

Update 2: the Commons Codec maintainer fixed it quite quickly.

Index: src/java/org/apache/commons/codec/binary/Base64InputStream.java
===================================================================
--- src/java/org/apache/commons/codec/binary/Base64InputStream.java (revision 950817)
+++ src/java/org/apache/commons/codec/binary/Base64InputStream.java (working copy)
@@ -145,21 +145,41 @@
         } else if (len == 0) {
             return 0;
         } else {
-            if (!base64.hasData()) {
-                byte[] buf = new byte[doEncode ? 4096 : 8192];
-                int c = in.read(buf);
-                // A little optimization to avoid System.arraycopy()
-                // when possible.
-                if (c > 0 && b.length == len) {
-                    base64.setInitialBuffer(b, offset, len);
+            int readLen = 0;
+            /*
+             Rationale for while-loop on (readLen == 0):
+             -----
+             Base64.readResults() usually returns > 0 or EOF (-1).  In the
+             rare case where it returns 0, we just keep trying.
+
+             This is essentially an undocumented contract for InputStream
+             implementors that want their code to work properly with
+             java.io.InputStreamReader, since the latter hates it when
+             InputStream.read(byte[]) returns a zero.  Unfortunately our
+             readResults() call must return 0 if a large amount of the data
+             being decoded was non-base64, so this while-loop enables proper
+             interop with InputStreamReader for that scenario.
+             -----
+             This is a fix for CODEC-101
+            */
+            while (readLen == 0) {
+                if (!base64.hasData()) {
+                    byte[] buf = new byte[doEncode ? 4096 : 8192];
+                    int c = in.read(buf);
+                    // A little optimization to avoid System.arraycopy()
+                    // when possible.
+                    if (c > 0 && b.length == len) {
+                        base64.setInitialBuffer(b, offset, len);
+                    }
+                    if (doEncode) {
+                        base64.encode(buf, 0, c);
+                    } else {
+                        base64.decode(buf, 0, c);
+                    }
                 }
-                if (doEncode) {
-                    base64.encode(buf, 0, c);
-                } else {
-                    base64.decode(buf, 0, c);
-                }
+                readLen = base64.readResults(b, offset, len);
             }
-            return base64.readResults(b, offset, len);
+            return readLen;
         }
     }

I tried it here and it works fine.

情域 2024-09-10 15:06:41

"For top efficiency, consider wrapping an InputStreamReader within a BufferedReader. For example:"

BufferedReader in = new BufferedReader(new InputStreamReader(b64is));

Addendum: As Base64 is padded to a multiple of 4 characters, verify that the source isn't truncated. A flush() may be required.
