保存哈夫曼代码时出现问题?

发布于 2024-11-19 08:30:26 字数 60 浏览 2 评论 0原文

我想将霍夫曼代码保存到文件中。我该怎么做? 我将霍夫曼代码保存到字符串中,但生成的文件的大小比原始文件大。

I want to save Huffman codes into a file. How can I do this?
I am saving Huffman codes into a string but size of generated file is bigger than Original file.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

陪你搞怪i 2024-11-26 08:30:26

一种非常简单的方法是一次写一位,如下所示:

unsigned char acc; // Accumulator of bit waiting to be written
int bitcount;      // How many bits are aready present in the accumulator

// write a single bit (0/1)
void writebit(int bit)
{
    acc |= (bit << bitcount);
    if (++bitcount == 8)
    {
        writebyte(acc);
        acc = 0;
        bitcount = 0;
    }
}

读回一位,过程是对称的,

unsigned char acc;   // bits waiting to be extracted
int bitcount;        // how many bits are still available in acc

int readbit()
{
   if (bitcount == 0)
   {
       bitcount = 8;
       acc = readbyte();
   }
   --bitcount;
   return (acc >> (7 - bitcount)) & 1;
}

当然这只是最简单的方法,但我会等到你担心代码速度之前再担心首先能够正确保存和加载您的编码数据。

示例:

假设您有以下霍夫曼编码符号

A - 0
B - 10
C - 110
D - 111

并且您想要对序列进行编码

A B A A C D A D B B

,那么您将按顺序调用

writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(0);                           // A
writebit(0);                           // A
writebit(1); writebit(1); writebit(0); // C
writebit(1); writebit(1); writebit(1); // D
writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(1); writebit(0);              // B

因此写入的实际字节将是

(01100010) = 0x62
(01010111) = 0x57

(请注意,显示的代码从最低有效位开始,即您应该读取该位如果您想识别符号,请从右到左输入括号内的序列)。

A very simple approach is to write one bit at a time with something like the following:

unsigned char acc; // Accumulator of bit waiting to be written
int bitcount;      // How many bits are aready present in the accumulator

// write a single bit (0/1)
void writebit(int bit)
{
    acc |= (bit << bitcount);
    if (++bitcount == 8)
    {
        writebyte(acc);
        acc = 0;
        bitcount = 0;
    }
}

to read back a sigle bit the procedure is symmetrical

unsigned char acc;   // bits waiting to be extracted
int bitcount;        // how many bits are still available in acc

int readbit()
{
   if (bitcount == 0)
   {
       bitcount = 8;
       acc = readbyte();
   }
   --bitcount;
   return (acc >> (7 - bitcount)) & 1;
}

of course this is just the simplest approach, but I'd wait before worrying about code speed until you are first able to save and load correctly your encoded data.

Example:

Suppose you have the following Huffman coded symbols

A - 0
B - 10
C - 110
D - 111

and that you want to encode the sequence

A B A A C D A D B B

then you would call in order

writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(0);                           // A
writebit(0);                           // A
writebit(1); writebit(1); writebit(0); // C
writebit(1); writebit(1); writebit(1); // D
writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(1); writebit(0);              // B

The actual bytes written would therefore be

(01100010) = 0x62
(01010111) = 0x57

(Note that the code shown starts from the least significant bit, i.e. you should read the bit sequences inside the parenthesis from right to left if you want to recognize the symbols).

鼻尖触碰 2024-11-26 08:30:26

我相信你保存的是一串 1 和 0。真正的霍夫曼代码需要以二进制形式保存,然后再进行解析。如果您只是将输出保存为字符串,那么您就违背了哈夫曼代码的目的,每个 0 和 1 都是 8 位而不是 1。

I believe what you are saving is a string of 1's and 0's. A true huffman code needs to be saved in binary and then parsed later on. If you are merely saving the output as a string you are defeating the purpose of a huffman code, each 0 and 1 is 8bits instead of 1.

黎歌 2024-11-26 08:30:26

您可能正在为每个模式/字母保存整个字节。

假设 e 是最常见的字母。它将有一个位模式 0。

假设 z 是最不常见的字母,它将有一些以 1 开头的模式。让我们将其指定为 1111 111。

您要写入的文件将是这样的:

0111 1111

您是可能会这样写:

0000 0000 0111 1111。

您需要利用按位运算来执行此操作。

You are probably saving the entire byte for each pattern/letter.

Let's say e is the most common letter. It will have a bit pattern of 0.

Let's say z is the least common letter it will have some pattern starting with 1. Let's just assign it to be 1111 111.

The file you want to be writing will be this:

0111 1111

You are PROBABLY writing this:

0000 0000 0111 1111.

You need to take advantage of bitwise operations to do this.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文