将霍夫曼树与编码比特流一起存储为文件的基本技术是什么?
如何将霍夫曼编码位流存储为二进制文件?
How can I store a Huffman Encoded bit stream as a binary file?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
如何将霍夫曼编码位流存储为二进制文件?
How can I store a Huffman Encoded bit stream as a binary file?
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(1)
要存储编码,您需要具备三件事:
解决这些问题的方法有很多种。您可以将每个字符的位模式显式存储在表中,或者可以对压缩的所有流使用相同的编码表。至于如何检测流的结束,一种选择可能是使用伪 EOF 字符来终止该流。为此,在构建霍夫曼编码树时,向其中添加一个具有多重性的新字符,该字符将充当描绘流末尾的哨兵。在写入结果时,您可以在末尾写入此字符,以便您可以知道流在哪里结束,即使它没有使用完全适合字节的位数。
为了存储实际内容,我建议将编码表示缓冲到位向量中,该位向量会自动刷新到八位倍数的文件流中。当然,这不是唯一的方法,因此请选择最有效的方法。
To store the encoding, you'll need to have three things:
There are many ways to solve each of these problems. You could explicitly store the bit patterns for each character in a table, or could just use the same encoding table for all streams you compress. As for how to detect the end of the stream, one option might be to use a pseudo-EOF character to terminate this stream. To do this, as you build up the Huffman encoding tree, add to it a new character with multiplicity one that will serve as a sentinel delineating the end of the stream. On writing the result, you write this character at the end so that you can tell where the stream ends even if it doesn't use a number of bits that fits perfectly into a byte.
To store the actual contents I'd suggest buffering the encoded representation into a bit vector that automatically flushes into a file stream on a multiple of eight bits. Of course, this isn't the only way to do this, so go with whatever works best.