保存哈夫曼代码时出现问题?
我想将霍夫曼代码保存到文件中。我该怎么做? 我将霍夫曼代码保存到字符串中,但生成的文件的大小比原始文件大。
I want to save Huffman codes into a file. How can I do this?
I am saving Huffman codes into a string but size of generated file is bigger than Original file.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
一种非常简单的方法是一次写一位,如下所示:
读回一位,过程是对称的,
当然这只是最简单的方法,但我会等到你担心代码速度之前再担心首先能够正确保存和加载您的编码数据。
示例:
假设您有以下霍夫曼编码符号
并且您想要对序列进行编码
,那么您将按顺序调用
因此写入的实际字节将是
(请注意,显示的代码从最低有效位开始,即您应该读取该位如果您想识别符号,请从右到左输入括号内的序列)。
A very simple approach is to write one bit at a time with something like the following:
to read back a sigle bit the procedure is symmetrical
of course this is just the simplest approach, but I'd wait before worrying about code speed until you are first able to save and load correctly your encoded data.
Example:
Suppose you have the following Huffman coded symbols
and that you want to encode the sequence
then you would call in order
The actual bytes written would therefore be
(Note that the code shown starts from the least significant bit, i.e. you should read the bit sequences inside the parenthesis from right to left if you want to recognize the symbols).
我相信你保存的是一串 1 和 0。真正的霍夫曼代码需要以二进制形式保存,然后再进行解析。如果您只是将输出保存为字符串,那么您就违背了哈夫曼代码的目的,每个 0 和 1 都是 8 位而不是 1。
I believe what you are saving is a string of 1's and 0's. A true huffman code needs to be saved in binary and then parsed later on. If you are merely saving the output as a string you are defeating the purpose of a huffman code, each 0 and 1 is 8bits instead of 1.
您可能正在为每个模式/字母保存整个字节。
假设 e 是最常见的字母。它将有一个位模式 0。
假设 z 是最不常见的字母,它将有一些以 1 开头的模式。让我们将其指定为 1111 111。
您要写入的文件将是这样的:
0111 1111
您是可能会这样写:
0000 0000 0111 1111。
您需要利用按位运算来执行此操作。
You are probably saving the entire byte for each pattern/letter.
Let's say e is the most common letter. It will have a bit pattern of 0.
Let's say z is the least common letter it will have some pattern starting with 1. Let's just assign it to be 1111 111.
The file you want to be writing will be this:
0111 1111
You are PROBABLY writing this:
0000 0000 0111 1111.
You need to take advantage of bitwise operations to do this.