Java 中的霍夫曼编码期间无法压缩文件

发布于 2024-12-10 14:23:50 字数 267 浏览 0 评论 0原文

我使用优先级队列在 Java 中实现了霍夫曼编码算法,其中我从根到叶遍历树,并根据符号在输入中出现的次数获得编码示例 #=000011。一切都很好,树构建得很好,编码正如预期的那样:但是我得到的输出文件比原始文件更大。我目前正在附加“0”和“0” '1' 到遍历树的左节点和右节点的字符串。可能我最终得到的每个字符都使用所有 8 位,这对压缩没有帮助。我猜测需要将这些位转换为字符值。这样这些字符使用的位数少于 8 个,因此我得到了原始文件的压缩版本。您能否告诉我如何通过在 Java 中操作字符和减少位来实现压缩?谢谢

I have implemented the Huffman Encoding Algorithm in Java using Priority Queues where I traverse the Tree from Root to Leaf and get encoding example as #=000011 based on the number of times the symbol appears in the input. Everything is fine, the tree is being built fine, encoding is just as expected: But the output file I am getting is bigger size than the original file. I am currently appending '0' & '1' to a String on traversing left node and right node of the tree. Probably what I end up with uses all 8 bits for each characters and it does not help in compression. I am guessing there is some conversion of these bits into character values which is required. So that these characters use fewer bits than 8 and hence I get a compressed version of the original file. Could you please let me know how to achieve a compression by manipulating characters and reducing bits in Java? Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

宫墨修音 2024-12-17 14:23:50

您可能正在使用 StringBuilder 并附加“0”或“1”,或者只是使用 + 运算符将“0”或“1”连接到字符串末尾。或者您正在使用某种 OutputStream 并写入它。

您要做的就是写入实际的位。我建议在写入之前先制作一个完整字节。一个字节如下所示:

0x05

表示二进制字符串 0000 0011

您可以通过创建 byte 类型、添加和移位来实现这些:

public void writeToFile(String binaryString, OutputStream os){
    int pos = 0;
    while(pos < binaryString.length()){
        byte nextByte = 0x00;
        for(int i=0;i<8 && pos+i < binaryString.length(); i++){
            nextByte << 1;
            nextByte += binaryString.charAt(pos+i)=='0'?0x0:0x1;
        }
        os.write(nextByte);
        pos+=8;
    }
}

当然,一次写入一个字节的效率很低,而且最重要的是,OutputStream 接口只接受字节数组 (字节[])。因此,您最好将字节存储在数组中(或者更简单,列表),然后以更大的块写入它们。

如果不允许使用字节写入(为什么不可以?ObjectOutputStream 支持写入字节数组!),那么您可以使用 Base64 来编码二进制字符串。但请记住,Base64 会使您的数据使用量增加 33%。

将字节数组转换为 Base64 的一种简单方法是使用现有的编码器。添加以下导入后:

import sun.misc.BASE64Encoder;

您可以实例化编码器并将字节数组转换为字符串:

byte[] bytes = getBytesFromHuffmanEncoding();
BASE64Encoder encoder = new BASE64Encoder();
String encodedString = encoder.encode(bytes);

You're probably using a StringBuilder and appending "0" or "1", or simply the + operator to concatenate "0" or "1" to the end of your string. Or you're using some sort of OutputStream and writing to it.

What you want to do is to write the actual bits. I'd suggest making a whole byte first before writing. A byte looks like this:

0x05

Which would represent the binary string 0000 0011.

You can make these by making a byte type, adding and shifting:

public void writeToFile(String binaryString, OutputStream os){
    int pos = 0;
    while(pos < binaryString.length()){
        byte nextByte = 0x00;
        for(int i=0;i<8 && pos+i < binaryString.length(); i++){
            nextByte << 1;
            nextByte += binaryString.charAt(pos+i)=='0'?0x0:0x1;
        }
        os.write(nextByte);
        pos+=8;
    }
}

Of course, it's inefficient to write one byte at a time, and on top of that the OutputStream interface only accepts byte arrays (byte[]). So you'd be better off storing the bytes in an array (or even easier, a List), then writing them at bigger chunks.

If you are not allowed to use byte writes (why the heck not? ObjectOutputStream supports writing byte arrays!), then you can use Base64 to encode your binary string. But remember that Base64 inflates your data usage by 33%.

An easy way to convert a byte array to base64 is by using an existing encoder. After adding the following import:

import sun.misc.BASE64Encoder;

You can instantiate the encoder and turn your byte array into a string:

byte[] bytes = getBytesFromHuffmanEncoding();
BASE64Encoder encoder = new BASE64Encoder();
String encodedString = encoder.encode(bytes);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文