霍夫曼树中的歧义
我想为频率为 5、4、3、2 的四个符号 a、b、c、d 创建一棵霍夫曼树。
为简单起见,我将忽略符号标签本身,仅关注频率标签。
第一步是创建 4 个单顶点树,其根标记为频率 5、4、3、2。
接下来,合并根标记为 2,3 的两个单顶点树,得到根标记为 3+2=5、子树标记为 2,3 的三顶点树。
下一步是将标记为 4 的树与另外两棵树中的一棵合并,两棵树的根都标记为频率 5。
我们选择哪一棵?
这两种选择导致了截然不同的树和霍夫曼代码。具体来说,一个选择导致码字长度为1,2,3,3的代码,而另一种选择导致码字长度2,2,2,2。
在这种特殊情况下,前一种代码不可能是正确的,因为它的预期码字长度更长......这将违反著名的霍夫曼代码的最优性。
构建始终产生最佳霍夫曼代码的树的一般处方是什么?
I want to create a Huffman tree for four symbols a,b,c,d occurring with frequencies 5,4,3,2.
For simplicity, I'll ignore the symbol labels themselves and focus only on the frequency labels.
The first step, is to create 4 single vertex trees whose roots are labelled with the frequencies 5,4,3,2.
Next, merge the two single vertex trees whose roots are labelled 2,3 to get a three vertex tree whose root is labelled 3+2=5 and whose children are labelled 2,3.
The next step is to merge the tree labelled 4 with one of the two other trees, both of whose roots are labeled with frequency 5.
Which one to we choose?
The two choices lead to very different trees and Huffman codes. Specifically, one choice leads to a code with codeword lengths 1,2,3,3 and the other choice leads to codeword lengths 2,2,2,2.
In this particular case, the former code can't possibly be correct because its expected codeword length is longer ... this would violate the famous optimality of Huffman codes.
What is a general prescription for constructing trees that always yields an optimal Huffman code?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
马克·阿德勒对此进行了澄清。郑重声明,
上述两种代码的预期码字长度恰好是 2 位。
对于第一个代码:
$5/14 * 1 + 4/14 * 2 + 3/14 * 3 + 2/14 * 3 = 2$
对于第二个代码:
$5/14 * 2 + 4/14 * 2 + 3/14 * 2 + 2/14 * 2 = 2$
我想无论我以哪种方式打破平局,生成的代码都是最佳的!
谢谢马克!
Mark Adler clarified this. For the record,
The expected codeword length for both codes proposed above are exactly 2 bits.
For the first code:
$5/14 * 1 + 4/14 * 2 + 3/14 * 3 + 2/14 * 3 = 2$
For the second code:
$5/14 * 2 + 4/14 * 2 + 3/14 * 2 + 2/14 * 2 = 2$
I guess that whichever way I break ties, the resulting code is optimal!
Thanks Mark!