GZIPInputStream 和字符集

发布于 2024-11-29 09:31:20 字数 2386 浏览 0 评论 0原文

我有一个包含拉丁文、西里尔文和中文字符的文本。我尝试使用 GZIPOutputStream 压缩字符串（通过 bytes[]），并使用 GZIPInputStream 解压缩。但我无法将所有角色转换回原始角色。有些显示为 ?。

我认为 UTF-16 可以完成这项工作。

有什么帮助吗？

问候

这是我的代码：

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

public class CompressUncompressStrings {

    public static void main(String[] args) throws UnsupportedEncodingException {

        String sTestString="äöüäöü 长安";
        System.out.println(sTestString);
        byte bcompressed[]=compress(sTestString.getBytes("UTF-16"));
        //byte bcompressed[]=compress(sTestString.getBytes());
        String sDecompressed=decompress(bcompressed);
        System.out.println(sDecompressed);
    }
    public static byte[] compress(byte[] content){
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        try{
            GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
            gzipOutputStream.write(content);
            gzipOutputStream.close();
        } catch(IOException e){
            throw new RuntimeException(e);
        }
        return byteArrayOutputStream.toByteArray();
    }
    public static String decompress(byte[] contentBytes){

        String sReturn="";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try{
            GZIPInputStream gzipInputStream =new GZIPInputStream(new ByteArrayInputStream(contentBytes));
             ByteArrayOutputStream baos = new ByteArrayOutputStream();
             for (int value = 0; value != -1;) {
                 value = gzipInputStream.read();
                 if (value != -1) {
                     baos.write(value);
                 }
             }
             gzipInputStream.close();
             baos.close();
             sReturn=new String(baos.toByteArray(), "UTF-16");
             return sReturn;
                 // Ende Neu

        } catch(IOException e){
            throw new RuntimeException(e);
        }
    }
}

原文

I have a Text with Latin, Cyrillic and Chinese Characters containing.
I try to compress a String (over bytes[]) with GZIPOutputStream and decompress it with GZIPInputStream. But I do not manage to convert all Characters back to the original Characters. Some appear as ?.

I thought that UTF-16 will do the job.

Any help?

Regards

Here's my code:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

public class CompressUncompressStrings {

    public static void main(String[] args) throws UnsupportedEncodingException {

        String sTestString="äöüäöü 长安";
        System.out.println(sTestString);
        byte bcompressed[]=compress(sTestString.getBytes("UTF-16"));
        //byte bcompressed[]=compress(sTestString.getBytes());
        String sDecompressed=decompress(bcompressed);
        System.out.println(sDecompressed);
    }
    public static byte[] compress(byte[] content){
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        try{
            GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
            gzipOutputStream.write(content);
            gzipOutputStream.close();
        } catch(IOException e){
            throw new RuntimeException(e);
        }
        return byteArrayOutputStream.toByteArray();
    }
    public static String decompress(byte[] contentBytes){

        String sReturn="";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try{
            GZIPInputStream gzipInputStream =new GZIPInputStream(new ByteArrayInputStream(contentBytes));
             ByteArrayOutputStream baos = new ByteArrayOutputStream();
             for (int value = 0; value != -1;) {
                 value = gzipInputStream.read();
                 if (value != -1) {
                     baos.write(value);
                 }
             }
             gzipInputStream.close();
             baos.close();
             sReturn=new String(baos.toByteArray(), "UTF-16");
             return sReturn;
                 // Ende Neu

        } catch(IOException e){
            throw new RuntimeException(e);
        }
    }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

情域 2024-12-06 09:31:20

我怀疑只是控制台有问题。我尝试了上面的代码，虽然它没有正确打印出任何字符，但当我测试字符串的往返时，它很好：

System.out.println(sDecompressed.equals(sTestString)); // Prints true

这在你的机器上有什么作用？

I suspect it's just the console that's having a problem. I tried the above code, and although it didn't print out any of the characters properly, when I tested the round-tripping of the string, it was fine:

System.out.println(sDecompressed.equals(sTestString)); // Prints true

What does that do on your machine?

回复收藏 0 原文

〆一缕阳光ご 2024-12-06 09:31:20

在控制台输出上显示非 ASCII 字符并不容易。假设您使用 Windows 作为操作系统（因为默认情况下命令行不支持 Unicode），您可以更改活动代码页号（使用 chcp 命令）。我不知道它是如何通过代码完成的，但我建议在命令行上运行代码。

此 chcp 值 65001 发生更改，以告诉 Windows 在其控制台上使用 UTF-8（您可以查看讨论此处）。

我希望这有帮助。

回复收藏 0 原文

~没有更多了~

关于作者

人生百味

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

GZIPInputStream 和字符集

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

小瓶盖

wxsp_Ukbq8xGR

1638627670

仅一夜美梦

夜访吸血鬼

近卫軍团

友情链接

GZIPInputStream 和字符集

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

小瓶盖

wxsp_Ukbq8xGR

1638627670

仅一夜美梦

夜访吸血鬼

近卫軍团

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。