8 位序列上的霍夫曼编码证明
数据文件包含 8 位字符序列,因此所有 256 个字符大致相同:最大字符频率小于最小字符频率的两倍。证明这种情况下的霍夫曼编码并不比使用普通的 8 位定长编码更有效。
A data file contains a sequence of 8-bit characters such that all 256 characters are about as common: the maximum character frequency is less than twice the minimum character frequency. Prove that Huffman coding in this case is not more efficient than using an ordinary 8-bit fixed-length code.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
证明是直接的。假设 wlog 字符按频率升序排序。我们知道 f(1) 和 f(2) 将首先连接到 f'(1) 中,并且由于 f(2) >= f(1) 且 2*f(1) > f(256),直到 f(256) 与某些东西连接之后才会连接。出于同样的原因,f(3)和f(4)将连接成f'(2),其中f'(2)>=f'(1)>。 f(256)。继续这样,我们得到 f(253) 和 f(254) 连接成 f'(127) >= ... >= f'(1) > f(256)。最后,将f(255)和f(256)连接成f'(128)>=f'(127)>=...>=f'(1)。我们现在认识到,由于 f(256) < 2*f(1) <= f'(1) 且 f'(128) <= 2*f(256)、f'(128) <= 2*f(256) <= 4*f(1) <= 2*f'(1)。因此,f'(128) < 2*f'(1),与霍夫曼算法第一轮的条件相同。
由于该条件在本轮中成立,因此很容易认为它在所有轮次中都同样成立。霍夫曼将执行 8 轮,直到所有节点都连接到一个,即根 (128, 64, 32, 16, 8, 4, 2, 1),此时算法将终止。由于在每个阶段,每个节点都会连接到另一个节点,而该节点已接受霍夫曼算法的相同处理,因此树的每个分支将具有相同的长度:8。
这有点非正式,更多的是一个草图,而不是确实是一个证明,但它应该足够你写一些更正式的东西了。
The proof is direct. Assume w.l.o.g. that the characters are sorted in ascending order of frequency. We know that f(1) and f(2) will be joined first into f'(1), and since f(2) >= f(1) and 2*f(1) > f(256), this won't be joined until after f(256) is joined with something. By the same token, f(3) and f(4) will be joined into f'(2) with f'(2) >= f'(1) > f(256). Continuing thusly, we get f(253) and f(254) joined into f'(127) >= ... >= f'(1) > f(256). Finally, f(255) and f(256) are joined into f'(128) >= f'(127) >= ... >= f'(1). We now recognize that since f(256) < 2*f(1) <= f'(1) and f'(128) <= 2*f(256), f'(128) <= 2*f(256) < 4*f(1) <= 2*f'(1). Ergo, f'(128) < 2*f'(1), the same condition that held for the first round of the Huffman algorithm.
Since the condition holds on this round, it is straightforward to argue that it will similarly hold on all rounds. Huffman will perform 8 rounds until all nodes are joined to one, the root (128, 64, 32, 16, 8, 4, 2, 1), at which point the algorithm will terminate. Since at each stage each node is joined to another one which has, to that point, received the same treatment by the Huffman algorithm, each branch of the tree will have the same length: 8.
This is somewhat informal, more of a sketch than a proof, really, but it should be more than enough for you to write something more formal.