长度受限哈夫曼码的包合并算法

发布于 2024-11-03 07:11:22 字数 653 浏览 5 评论 0原文

下面的解释来自维基百科关于使用包合并的长度限制霍夫曼代码。我不明白，我对此有一些疑问。

我们如何包装？
我们如何合并？
我们如何识别符号位串的长度？

设L为任何码字允许的最大长度。令 p₁, …, p_n 为符号的频率要编码的字母表。我们首先对符号进行排序，使得 p_i ≤ p_i+1。为每个符号创建 L 个硬币，面额为 2⁻¹, …, 2^−L，每种都是钱币值p_i。使用包合并算法选择具有最小钱币价值的硬币集合，其面额总计为 n − 1。令 h_i 为数字已选择钱币价值 p_i 的硬币。最佳长度受限的霍夫曼编码将使用长度为 h_i 的位串对符号 i 进行编码。"< /p>

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

温柔嚣张 2024-11-10 07:11:22

也许这只是构建霍夫曼代码的另一种方法。您是否看过http://cbloomrants .blogspot.com/2010/07/07-02-10-length-limited-huffman-codes.html？ IMO 包合并算法并没有构建霍夫曼树。您想要寻找哥伦布密码。

回复收藏 0 原文

夏九 2024-11-10 07:11:22

是的，这只是构建对码字长度有限制的霍夫曼代码的一种方法。

霍夫曼码将字母表中的每个字母编码为可以唯一确定的二进制字符串。例如，如果您的字母表是 {A, B, C}，并且 A 比 B 和 C 更常见，则以下编码可以很好地工作：

A - 0
B - 10
C - 11

诸如 0010110 的编码字符串可以被唯一编码，因为编码位串的长度已经由哈夫曼代码定义了（这里——每个以 0 开头的字符串的长度为 1，每个以 1 开头的字符串的长度为 2）。因此字符串解码为 0|0|10|11|0 = AABCA。

现在构造霍夫曼码的“问题”是如何选择编码位串，使得结果编码平均尽可能短。在您的问题中，有一个额外的限制，即任何代码字的长度不能超过L。总体思路是使用较短的字符串来编码更常见的符号。

包合并算法的细节并不重要，因为关键是您使用算法来选择“面额总计 n - 1 的具有最小钱币价值的硬币组”。如果您有面额为 2⁻¹, 2⁻², ... 的硬币，并且您想用它们构建总价值 100 美分，您可以考虑这个过程首先有一枚价值 100 的硬币，然后将其分成两个 50 美分 (2⁻¹)，然后根据需要继续将硬币分成两半，例如50 美分 + 25 美分 + 12.5 美分 + 12.5 美分。这对应于二叉树的构造；每当你分裂一枚硬币时，你都会在二叉树中创建一个内部节点，并在更深的一层上添加两个叶子。

现在最小化钱币价值的想法是，那些链接到“更高频率”符号的“硬币”使用起来更昂贵，因此您希望更少地分割这些硬币，对应于具有更短的代码。

细节留给读者作为练习。

Yes, it's just a way to build a Huffman code that has limit on codeword length.

A Huffman code encodes every letter of the alphabet with a binary string that can be uniquely determined. For example, if your alphabet is {A, B, C} and A is more common than B and C, the following encoding can work well:

A - 0
B - 10
C - 11

An encoded string such as 0010110 can be uniquely encoded, because the length of the encoding bit string is defined already by the Huffman code (here --- every string that begins with 0 has length 1, and every string that beings with 1 has length 2). So the string decodes to 0|0|10|11|0 = AABCA.

Now the "problem" in constructing Huffman codes is how to select the encoding bit strings so that the resulting encodings are on the average as short as possible. In your problem there is an additional constraint that the length of any code word cannot exceed L. The general idea is to use shorter strings to encode more common symbols.

The details of the package-merge algorithm aren't important, as the key is that you use an algorithm to select "set of coins of minimum numismatic value whose denominations total n - 1". If you have coins with denominations 2⁻¹, 2⁻², ..., and you want to build total value of 100 cents out of them, you can think of this process as having first a coin of value 100, and then splitting it to two 50 cents (2⁻¹), and then continuing to split your coins in half as long as you want, e.g. 50 cents + 25 cents + 12.5 cents + 12.5 cents. This corresponds to the construction of a binary tree; whenever you split a coin, you create an internal node in a binary tree and add two leaves on one level deeper.

Now the idea of minimizing the numismatic value is that those "coins" that are linked to "higher frequency" symbols are more expensive to use, so you want to split those coins less, corresponding to having shorter codes.

The details are left as an exercise to the reader.

回复收藏 0 原文

~没有更多了~