穷人哈夫曼压缩
我试图更好地理解霍夫曼解码器的工作原理。我有一个代码表,但由于二进制字符串的歧义,我很难理解解码器的工作原理。
(我在为大学最后一年做准备时学习了这一点)
我的表格:
Data Hcode
0, 0
1, 1
2, 10
3, 11
17, 100
18, 101
19, 110
29, 111
如果我有一个像 010011 这样的霍夫曼代码字符串,我可以返回许多不同的数据组合,那么我如何区分?
我理解 BST 表示中的霍夫曼逻辑,并且您遵循一条到达给定叶子的路径,该路径类似于 (0-255(ascii)) 之间给定值的代码,但我仍然不知道如何区分返回数据:0 ,1,0 或 data: 0,17
我真的必须对数据 0 和 1 强制执行 2 位代码吗? (00 和 01)
我希望我能尽我所能解释 XD
如果您想知道我是如何生成该表的 - 您会杀了我,因为我没有使用树逻辑来生成它。虽然我按频率对数据(随机字节)进行排序,但我通过将元素位置编号转换为二进制来生成 Hcode(这就是为什么我将这篇文章称为“Poor Mans Huffman”)。
非常感谢您的任何建议。
Im trying to better understand how the huffman decoder works. Ive got a code table but im having a hard time trying to understand how the decoder would work because of ambiguity in the binary string.
(im learning this in prep for my final year at uni)
my table:
Data Hcode
0, 0
1, 1
2, 10
3, 11
17, 100
18, 101
19, 110
29, 111
If i have a huffman code string like 010011 i can return many different combinations of data so how can i discriminate?
i understand the huffman logic in BST representation and you follow a path to a given leaf which the path resembles the code for that given value between (0-255(ascii)) but i still dont know how you can discriminate between returning data: 0,1,0 or data: 0,17
do i really have to enforce 2 bit codes on data 0 and 1? (00 and 01)
i hope ive explained the best i can XD
If your wondering how I generated the table - your gonna kill me because i didnt use tree logic to generate it. Althought i sorted the data (random bytes) on frequency - i generated the Hcodes by converting the element position number into binary (hency why i called this post Poor Mans Huffman).
Many Thanks for any advice.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
代码表错误。霍夫曼颂歌应该是无前缀。为了随后毫无歧义地对它们进行解码,这是必要的。
如果您使用二叉树来创建代码,这将自动确保“前缀自由”。请参阅:http://en.wikipedia.org/wiki/Huffman_coding
现在,我要去杀了你...;)
The code table is wrong. Huffman odes are supposed to be prefix free. This is neccessary in order to decode them afterwards without ambiguities.
If you would use a binary tree for creating the codes, this would automatically ensure the "prefix freeness". See: http://en.wikipedia.org/wiki/Huffman_coding
And now, I am going to kill you ... ;)
不仅代码表错误,代码的长度也错误。如果你有两个一位代码,那么你已经用完了所有的代码空间,并且不能再有其他代码了。您所显示的不仅不是霍夫曼代码,也不是前缀代码——实际上它根本不是代码。
Not only is the code table wrong, the lengths of the codes are also wrong. If you have two one-bit codes, you have already used up all of the code space, and can have no other codes. What you have shown is not only not a Huffman code and not a prefix code -- it is in fact not a code at all.