贪婪地编码的霍夫曼树的时间复杂性

发布于 2025-02-11 18:59:26 字数 1347 浏览 1 评论 0原文

编辑以澄清这个问题

我将在Uni上使用两种众所周知的压缩算法（Huffman Coding和Lempel-Ziv 77）进行实验室项目。我对霍夫曼编码的实施类似于贪婪的方法，在该方法中，树是通过以下步骤构建的：

1. Calculate frequencies for all unique characters and place them in a minimum heap
2. While there are more than two nodes in heap:
   2.1 Take a value from the minimum heap and place it as the left child
   2.2 Take a value from the minimum heap and place it as the right child
   2.3 Create a parent node with a frequency of 'frequency of the left child + frequency 
of the right child'
  2.4 Place the parent node to the minimum heap and continue
3. Finalize the tree with a root node (children are the last two nodes in the minimum heap)

我能够在两种算法的所有其他步骤的时间复杂性上找到良好的来源，除了 霍夫曼编码的减压阶段。我目前的理解是，即使霍夫曼树的实现是不平衡的，在最坏的情况下，长度为n-1（n等于唯一字符的计数），遍历时代也与不同的频率平衡人物。

在不平衡的霍夫曼树中，具有较高频率的叶子节点更为常见，并且具有较短的遍历路径。这可以平衡总时间成本，而我的直觉指出，总时间将接近O（k log n）的时间复杂性，其中k是未压缩内容的长度，而n huffman树中的独特字符。

如果我有可靠的来源可以参考减压阶段的时间复杂性支持我的直觉或反驳它，我会感到更加舒适。如果有人在涵盖这个特定问题的书籍或不罚款的书籍上有很好的提示，我将非常感谢您的帮助！我在项目中付出了很多努力，即使我没有及时找到答案，我也不会认为这个细节会造成很大的凹痕。我主要是对这个主题超级兴奋，并希望尽可能多地学习。

以防万一有人在一个类似的问题上偶然发现这个问题，这是我的项目。请记住，这是一个学生项目，在审查它的同时保持批判性思维是一件好事。

原文

Edited to clarify the question

I'm about to turn in a laboratory project at Uni on two well known compression algorithms (Huffman coding and Lempel-Ziv 77). My implementation of Huffman coding is similar to the greedy approach, in which the tree is built with the following steps:

1. Calculate frequencies for all unique characters and place them in a minimum heap
2. While there are more than two nodes in heap:
   2.1 Take a value from the minimum heap and place it as the left child
   2.2 Take a value from the minimum heap and place it as the right child
   2.3 Create a parent node with a frequency of 'frequency of the left child + frequency 
of the right child'
  2.4 Place the parent node to the minimum heap and continue
3. Finalize the tree with a root node (children are the last two nodes in the minimum heap)

I've been able to find good sources on the time complexities of all other steps for both algorithms, except for the decompression phase of Huffman coding. My current understanding is that even though this implementation of Huffman tree is unbalanced and in the worst case has the longest path of length n-1 (in which n equals the count of unique characters), the traversal times are balanced by the frequencies of different characters.

In an unbalanced Huffman tree the leaf nodes with higher frequencies are more common and have a shorter traversal paths. This balances the total time cost and my intuition states that the total time would approach a time complexity of O(k log n), in which k is the length of the uncompressed content and n the unique characters in the Huffman tree.

I'd feel much more comfortable if I had reliable source to reference regarding the time complexity of the decompression phase that either support my intuition or counters it. If anyone has good tips on books or not-super-difficult-to-read-articles that cover this particular question, I'd very much appreciate the help! I've put quite a lot of effort into my project and I don't think this one detail is going to make a big dent, even if I don't find the answer in time. I'm mainly just super-fascinated by the topic and want to learn as much as possible.

And just in case anyone ever stumbles into this question on a similar question, here's my project. Keep in mind that it's a student project and it's good to keep a critical mind while reviewing it.

分享到QQ

分享到微博