Unicode Base64 encoding with Java
I am trying to encode a UTF-8 string to Base64 and decode it again.
In theory this is not a problem, but decoding never seems to output the correct characters, only ? characters.
String original = "خهعسيبنتا";
B64encoder benco = new B64encoder();
String enc = benco.encode(original);
try
{
String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println("Original: " + original);
prtHx("ara", original.getBytes());
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes());
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes());
} catch (UnsupportedEncodingException e)
{
e.printStackTrace();
}
The output to the console is as follows:
Original: خهعسيبنتا
ara = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
Encoded: Pz8/Pz8/Pz8/
enc = 50, 7A, 38, 2F, 50, 7A, 38, 2F, 50, 7A, 38, 2F
Decoded: ?????????
dec = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
prtHx simply writes the hex value of the bytes to the output.
Am I doing something obviously wrong here?
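The prtHx helper itself is not shown in the question. A minimal sketch of what such a helper could look like, assuming it prints a label followed by comma-separated uppercase hex bytes as in the output above (the class name HexDumpSketch and the toHex method are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class HexDumpSketch {
    // Hypothetical helper: formats each byte as two uppercase hex digits.
    // The "& 0xFF" mask keeps negative byte values from being sign-extended.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Assumed shape of prtHx: a label, then the hex dump of the bytes.
    static void prtHx(String label, byte[] bytes) {
        System.out.println(label + " = " + toHex(bytes));
    }

    public static void main(String[] args) {
        // prints "enc = 50, 7A, 38, 2F"
        prtHx("enc", "Pz8/".getBytes(StandardCharsets.US_ASCII));
    }
}
```

Note that the dump only shows whatever bytes it is handed, so the charset passed to getBytes decides what appears in the output.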
Andreas pointed to the correct solution by highlighting that the getBytes() method uses the platform default encoding (Cp1252), even though the source file itself is UTF-8. By using getBytes("UTF-8") I was able to see that the bytes encoded and decoded were actually different.
Further investigation showed that the encode method used getBytes(). Changing this did the trick nicely.
try
{
String enc = benco.encode(original);
String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println("Original: " + original);
prtHx("ori", original.getBytes("UTF-8"));
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes("UTF-8"));
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes("UTF-8"));
} catch (UnsupportedEncodingException e)
{
e.printStackTrace();
}
System encoding Cp1252
Original: خهعسيبنتا
ori = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
Encoded: 2K7Zh9i52LPZitio2YbYqtin
enc = 32, 4B, 37, 5A, 68, 39, 69, 35, 32, 4C, 50, 5A, 69, 74, 69, 6F, 32, 59, 62, 59, 71, 74, 69, 6E
Decoded: خهعسيبنتا
dec = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
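The source of B64encoder is not shown, so the exact change cannot be reproduced here. As a rough sketch of the same idea, assuming Java 8 or later, the JDK's java.util.Base64 can stand in for it, with the charset named explicitly on both sides (the class name Base64Utf8Sketch and its two methods are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Utf8Sketch {
    // Encode: turn the text into UTF-8 bytes explicitly, never via getBytes().
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Decode: interpret the decoded bytes as UTF-8, mirroring encode().
    static String decode(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The same nine Arabic letters as in the question, written as
        // Unicode escapes so the source file encoding does not matter.
        String original = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";
        String enc = encode(original);
        System.out.println("Encoded: " + enc); // prints "Encoded: 2K7Zh9i52LPZitio2YbYqtin"
        System.out.println("Round trip ok: " + decode(enc).equals(original)); // prints "Round trip ok: true"
    }
}
```

Using the StandardCharsets constants instead of the "UTF-8" string also removes the need to catch UnsupportedEncodingException.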
Thanks.
Comments (1)
String#getBytes() encodes the characters using the platform's default charset. The actual encoding of the String literal "خهعسيبنتا" is "defined" in the Java source file (you choose a character encoding when you create or save the file). This could be the reason why ara is encoded to 0x3f bytes. Give this a try:
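A minimal sketch of that suggestion, assuming the intent is to pass the charset to getBytes explicitly. ISO-8859-1 is used here as a stand-in for Cp1252, since it is guaranteed to exist on every JVM and likewise has no Arabic letters. The class name CharsetDemoSketch and its members are hypothetical, and the literal is written with Unicode escapes so that the result does not depend on how the source file was saved:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemoSketch {
    // The nine Arabic letters from the question, as Unicode escapes.
    static final String ORIGINAL =
            "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";

    // Counts bytes equal to 0x3F, the ASCII code of '?'.
    static int countQuestionMarks(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if (b == 0x3F) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // What a bare getBytes() would use on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // A single-byte Western charset cannot represent Arabic letters,
        // so each character is replaced by '?' (0x3F).
        byte[] lossy = ORIGINAL.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1: " + lossy.length + " bytes, "
                + countQuestionMarks(lossy) + " of them 0x3F");

        // UTF-8 keeps every character, using two bytes per Arabic letter.
        byte[] utf8 = ORIGINAL.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8: " + utf8.length + " bytes, "
                + countQuestionMarks(utf8) + " of them 0x3F");
    }
}
```

If the literal is kept as raw Arabic text in the source file instead, the compiler also has to read that file with the right encoding, for example via javac's -encoding option.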