base64 解码的文件不等于原始未编码的文件
我有一个普通的pdf文件 A.pdf ,第三方以base64对该文件进行编码,并将其作为长字符串在网络服务中发送给我(我无法控制第三方)。
我的问题是,当我使用 java org.apache.commons.codec.binary.Base64 解码字符串并将输出正确到名为 B.pdf 的文件时 我希望 B.pdf 与 A.pdf 相同,但 B.pdf 结果与 A.pdf 略有不同。因此,Acrobat 无法将 B.pdf 识别为有效的 pdf。
base64 是否有不同类型的编码\字符集机制?我可以检测我收到的字符串是如何编码的,以便 B.pdf=A.pdf 吗?
编辑-这是我想要解码的文件,解码后应以 pdf 格式打开
这是在记事本++中打开的文件的标题,
**A.pdf**
%PDF-1.4
%±²³´
%Created by Wnv/EP PDF Tools v6.1
1 0 obj
<<
/PageMode /UseNone
/ViewerPreferences 2 0 R
/Type /Catalog
**B.pdf**
%PDF-1.4
%±²³´
%Created by Wnv/EP PDF Tools v6.1
1 0! bj
<<
/PageMode /UseNone
/ViewerPreferences 2 0 R
/]
pe /Catalog
这就是我解码字符串的方式
private static void decodeStringToFile(String encodedInputStr,
String outputFileName) throws IOException {
BufferedReader in = null;
BufferedOutputStream out = null;
try {
in = new BufferedReader(new StringReader(encodedInputStr));
out = new BufferedOutputStream(new FileOutputStream(outputFileName));
decodeStream(in, out);
out.flush();
} finally {
if (in != null)
in.close();
if (out != null)
out.close();
}
}
private static void decodeStream(BufferedReader in, OutputStream out)
throws IOException {
while (true) {
String s = in.readLine();
if (s == null)
break;
//System.out.println(s);
byte[] buf = Base64.decodeBase64(s);
out.write(buf);
}
}
I have a normal pdf file A.pdf , a third party encodes the file in base64 and sends it to me in a webservice as a long string (i have no control on the third party).
My problem is that when i decode the string with java org.apache.commons.codec.binary.Base64 and right the output to a file called B.pdf
I expect B.pdf to be identical to A.pdf, but B.pdf turns out a little different then A.pdf. As a result B.pdf is not recognized as a valid pdf by acrobat.
Does base64 have different types of encoding\charset mechanisms? can i detect how the string I received is encoded so that B.pdf=A.pdf ?
EDIT- this is the file I want to decode, after decoding it should open as a pdf
this is the header of the files opened in notepad++
**A.pdf**
%PDF-1.4
%±²³´
%Created by Wnv/EP PDF Tools v6.1
1 0 obj
<<
/PageMode /UseNone
/ViewerPreferences 2 0 R
/Type /Catalog
**B.pdf**
%PDF-1.4
%±²³´
%Created by Wnv/EP PDF Tools v6.1
1 0! bj
<<
/PageMode /UseNone
/ViewerPreferences 2 0 R
/]
pe /Catalog
this is how I decode the string
private static void decodeStringToFile(String encodedInputStr,
String outputFileName) throws IOException {
BufferedReader in = null;
BufferedOutputStream out = null;
try {
in = new BufferedReader(new StringReader(encodedInputStr));
out = new BufferedOutputStream(new FileOutputStream(outputFileName));
decodeStream(in, out);
out.flush();
} finally {
if (in != null)
in.close();
if (out != null)
out.close();
}
}
private static void decodeStream(BufferedReader in, OutputStream out)
throws IOException {
while (true) {
String s = in.readLine();
if (s == null)
break;
//System.out.println(s);
byte[] buf = Base64.decodeBase64(s);
out.write(buf);
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您正在通过逐行工作来破坏解码。 Base64 解码器只是忽略空格,这意味着原始内容中的一个字节很可能被分解为两行 Base64 文本。您应该将所有行连接在一起并一次性解码文件。
向
Base64
类方法提供内容时,优先使用byte[]
而不是String
。String
意味着字符集编码,这可能不会达到您想要的效果。You are breaking your decoding by working line-by-line. Base64 decoders simply ignore whitespace, which means that a byte in the original content could very well be broken into two Base64 text lines. You should concatenate all the lines together and decode the file in one go.
Prefer using
byte[]
rather thanString
when supplying content to theBase64
class methods.String
implies character set encoding, which may not do what you want.