base64 decoded file is not equal to the original unencoded file

Posted on 2024-12-28 23:55:13

I have a normal PDF file, A.pdf. A third party encodes the file in Base64 and sends it to me through a web service as one long string (I have no control over the third party).

My problem is that when I decode the string with Java's org.apache.commons.codec.binary.Base64 and write the output to a file called B.pdf, I expect B.pdf to be identical to A.pdf, but B.pdf turns out slightly different from A.pdf. As a result, Acrobat does not recognize B.pdf as a valid PDF.

Does Base64 have different types of encoding or charset mechanisms? Can I detect how the string I received was encoded so that B.pdf = A.pdf?

EDIT: This is the file I want to decode; after decoding, it should open as a PDF:

my encoded file

These are the headers of the two files as opened in Notepad++:

**A.pdf**
        %PDF-1.4
        %±²³´
        %Created by Wnv/EP PDF Tools v6.1
        1 0 obj
        <<
        /PageMode /UseNone
        /ViewerPreferences 2 0 R
        /Type /Catalog

**B.pdf**
        %PDF-1.4
        %±²³´
        %Created by Wnv/EP PDF Tools v6.1
        1 0! bj
        <<
        /PageMode /UseNone
        /ViewerPreferences 2 0 R
        /]
        pe /Catalog

This is how I decode the string:

    private static void decodeStringToFile(String encodedInputStr,
            String outputFileName) throws IOException {
        BufferedReader in = null;
        BufferedOutputStream out = null;
        try {
            in = new BufferedReader(new StringReader(encodedInputStr));
            out = new BufferedOutputStream(new FileOutputStream(outputFileName));
            decodeStream(in, out);
            out.flush();
        } finally {
            if (in != null)
                in.close();
            if (out != null)
                out.close();
        }
    }

    private static void decodeStream(BufferedReader in, OutputStream out)
            throws IOException {
        while (true) {
            String s = in.readLine();
            if (s == null)
                break;
            //System.out.println(s);
            byte[] buf = Base64.decodeBase64(s);
            out.write(buf);
        }
    }

Comments (1)

阳光下慵懒的猫 2025-01-04 23:55:13
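  1. You are breaking the decoding by working line by line. Base64 encodes every three bytes of the original content as four characters, and decoders simply ignore whitespace, so a four-character group (and with it a byte of the original content) could very well be split across two lines of Base64 text. Decoding each line separately then corrupts the bytes at the line boundary. You should concatenate all the lines together and decode the file in one go.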

  2. Prefer using byte[] rather than String when supplying content to the Base64 class methods. String implies character set encoding, which may not do what you want.
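
The two points above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual fix: it swaps in the JDK's `java.util.Base64` MIME decoder in place of Commons Codec so that it runs without extra dependencies, and the class name `Base64FileDecoder` and the helper `wrap` are hypothetical names introduced only for this example. The same idea should carry over to Commons Codec by passing the whole payload in a single call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class; the name and methods are illustrative only.
final class Base64FileDecoder {

    // Decodes the entire payload in a single call. The JDK's MIME decoder
    // skips line separators and other characters outside the Base64
    // alphabet, so 4-character groups stay aligned across line breaks.
    static byte[] decodeWhole(String encoded) {
        // Base64 text is plain ASCII, so make the charset explicit.
        byte[] ascii = encoded.getBytes(StandardCharsets.US_ASCII);
        return Base64.getMimeDecoder().decode(ascii);
    }

    // Writes the decoded bytes to a file without any text conversion.
    static void decodeStringToFile(String encoded, String outputFileName)
            throws IOException {
        Files.write(Paths.get(outputFileName), decodeWhole(encoded));
    }

    // Inserts CRLF after every `width` characters to imitate awkward wrapping.
    static String wrap(String text, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i += width) {
            sb.append(text, i, Math.min(text.length(), i + width)).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in bytes for the start of a PDF; not the asker's real file.
        byte[] original = "%PDF-1.4\n1 0 obj\n<<\n/Type /Catalog\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Wrap at 10 characters, which is deliberately not a multiple of 4.
        String wrapped = wrap(Base64.getEncoder().encodeToString(original), 10);
        System.out.println(Arrays.equals(original, decodeWhole(wrapped))); // true
    }
}
```

Line-by-line decoding only goes wrong when a line's length is not a multiple of four characters, which is why the damage in B.pdf shows up at scattered positions rather than throughout the file.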
