Unicode Base64 encoding with Java

Posted on 2024-11-02 04:39:26


I am trying to encode a UTF-8 string to Base64 and decode it again.
In theory this is not a problem, but the decoded output never shows the correct characters, only ? characters.


        String original = "خهعسيبنتا";
        B64encoder benco = new B64encoder();
        String enc = benco.encode(original);
        try
        {
            String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println("Original: " + original);
            prtHx("ara", original.getBytes());
            out.println("Encoded: " + enc);
            prtHx("enc", enc.getBytes());
            out.println("Decoded: " + dec);
            prtHx("dec", dec.getBytes());
        } catch (UnsupportedEncodingException e)
        {
            e.printStackTrace();
        }

The output to the console is as follows:

Original: خهعسيبنتا
ara = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
Encoded: Pz8/Pz8/Pz8/
enc = 50, 7A, 38, 2F, 50, 7A, 38, 2F, 50, 7A, 38, 2F
Decoded: ?????????
dec = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F

prtHx simply writes the hex value of the bytes to the output.
Am I doing something obviously wrong here?
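
The post does not show prtHx itself. A minimal sketch of such a helper, assuming it prints a label followed by comma-separated two-digit hex values (the class name HexDump and the method formatHex are hypothetical, not from the post), might look like this:

```java
public class HexDump {
    // Hypothetical stand-in for the post's prtHx helper: builds
    // "label = XX, XX, ..." from the given bytes.
    static String formatHex(String label, byte[] bytes) {
        StringBuilder sb = new StringBuilder(label).append(" = ");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Mask with 0xFF so negative byte values print as 80..FF.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Prints the formatted line, mirroring how prtHx is called in the post.
    static void prtHx(String label, byte[] bytes) {
        System.out.println(formatHex(label, bytes));
    }

    public static void main(String[] args) {
        prtHx("enc", new byte[] {0x50, 0x7A, 0x38, 0x2F}); // prints "enc = 50, 7A, 38, 2F"
        prtHx("ori", new byte[] {(byte) 0xD8, (byte) 0xAE}); // prints "ori = D8, AE"
    }
}
```

Dumping the raw bytes like this separates problems in the data from problems in how the console renders it.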

Andreas pointed to the correct solution by highlighting that the getBytes() method uses the platform default encoding (Cp1252), even though the source file itself is UTF-8. By using getBytes("UTF-8") I was able to see that the bytes being encoded and decoded were actually different.
Further investigation showed that the encode method itself used getBytes(). Changing this did the trick nicely.


        try
        {
            String enc = benco.encode(original);
            String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println("Original: " + original);
            prtHx("ori", original.getBytes("UTF-8"));
            out.println("Encoded: " + enc);
            prtHx("enc", enc.getBytes("UTF-8"));
            out.println("Decoded: " + dec);
            prtHx("dec", dec.getBytes("UTF-8"));

        } catch (UnsupportedEncodingException e)
        {
            e.printStackTrace();
        }

System encoding Cp1252
Original: خهعسيبنتا
ori = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
Encoded: 2K7Zh9i52LPZitio2YbYqtin
enc = 32, 4B, 37, 5A, 68, 39, 69, 35, 32, 4C, 50, 5A, 69, 74, 69, 6F, 32, 59, 62, 59, 71, 74, 69, 6E
Decoded: خهعسيبنتا
dec = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
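
For comparison, here is a minimal sketch of the same round trip using java.util.Base64 from the standard library (Java 8 and later) instead of the B64encoder class from the post, whose source is not shown. The class name Utf8Base64RoundTrip is made up for illustration. The charset is passed explicitly in both directions, so the platform default never comes into play:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Utf8Base64RoundTrip {
    // Encode the string's UTF-8 bytes (not the platform-default bytes) as Base64.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Decode Base64 back to bytes and interpret them as UTF-8.
    static String decode(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The same nine Arabic letters as in the post, written as Unicode
        // escapes so the literal does not depend on the source file encoding.
        String original = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";
        String enc = encode(original);
        String dec = decode(enc);
        System.out.println("Encoded: " + enc); // prints "Encoded: 2K7Zh9i52LPZitio2YbYqtin"
        System.out.println("Round trip intact: " + original.equals(dec)); // prints "Round trip intact: true"
    }
}
```

The encoded text matches the Encoded line in the output above, which suggests the corrected B64encoder and the standard encoder agree on this input.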

Thanks.


-黛色若梦 2024-11-09 04:39:26


String#getBytes() encodes the characters using the platform's default charset. The actual encoding of the String literal "خهعسيبنتا" is "defined" by the Java source file (you choose a character encoding when you create or save the file).

This could be the reason why ara is encoded as 0x3F bytes: Cp1252 has no mapping for Arabic letters, so each one is replaced with ? (0x3F).

Give this a try:

out.println("Original: " + original);
prtHx("ara", original.getBytes("UTF-8"));
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes("UTF-8"));
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes("UTF-8"));
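
The effect can be reproduced on any machine with a small sketch. It assumes ISO-8859-1 as a stand-in for Cp1252, since it is guaranteed to be available on every JVM and, like Cp1252, cannot represent Arabic letters. The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetPitfall {
    // Assumption: ISO-8859-1 stands in for Cp1252. Unmappable characters
    // are replaced with the charset's replacement byte, '?' (0x3F).
    static byte[] lossyBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // UTF-8 can represent every Unicode character, so nothing is lost.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // True if every byte is 0x3F, i.e. the data was already destroyed.
    static boolean allQuestionMarks(byte[] bytes) {
        for (byte b : bytes) {
            if (b != 0x3F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String original = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";
        // What the no-argument getBytes() would use on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Lossy length: " + lossyBytes(original).length); // prints "Lossy length: 9"
        System.out.println("All 0x3F: " + allQuestionMarks(lossyBytes(original))); // prints "All 0x3F: true"
        System.out.println("UTF-8 length: " + utf8Bytes(original).length); // prints "UTF-8 length: 18"
    }
}
```

Once the characters have been replaced with 0x3F, no later decoding step can recover them, which is why the charset has to be named at the first getBytes call. On Java 18 and later the default charset is UTF-8 unless configured otherwise, so the no-argument getBytes() would likely not show this symptom there, but naming the charset explicitly remains the portable choice.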
