Extended Huffman codes
I have this homework: finding the code words for the symbols in any given alphabet. It says I have to use binary Huffman on groups of three symbols. What does that mean exactly? Do I use regular Huffman on [alphabet]^3? If so, how do I then tell the difference between the 3 symbols in a group?
Comments (2)
I can't quite tell, because your description of the problem isn't all that detailed, but I would guess they mean that instead of encoding each symbol in your alphabet individually, you are supposed to treat each triple of symbols as a group.
So, for instance, if your alphabet consists of `a`, `b`, and `c`, instead of generating an encoding for each of those individually, you would create an encoding for `aaa`, `aab`, `aac`, etc. Each one of these strings would be treated as a separate symbol in the Huffman algorithm; you can tell them apart simply by doing string comparison on them. If you need to accept input of arbitrary length, you will also need to include in your alphabet symbols that are strings of length 1 or 2. For instance, if you're encoding the string `aabacab`, you would need to break that down into the symbols `aab`, `aca`, and `b`.
Does that help answer your question? I wasn't quite sure what you're looking for, so please feel free to edit your question or reply in a comment if this hasn't cleared anything up.
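To make the grouping idea concrete, here is a minimal Python sketch. It rests on assumptions that are not part of the assignment as quoted: block weights are counted from the message itself (a homework problem may instead give symbol probabilities, in which case a triple's weight would be the product of its symbols' probabilities), and a short tail block is simply treated as one more symbol. The names `split_blocks`, `huffman_code`, `encode`, and `decode` are made up for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def split_blocks(text, size=3):
    """Cut text into consecutive blocks; the last one may be shorter."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def huffman_code(freqs):
    """Build a binary Huffman code for a {symbol: weight} mapping."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be representable.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so dicts are never compared on ties
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


def encode(text, size=3):
    """Return (bit string, code table) with blocks of `size` as symbols."""
    blocks = split_blocks(text, size)
    code = huffman_code(Counter(blocks))  # assumption: weights from the message
    return "".join(code[b] for b in blocks), code


def decode(bits, code):
    """Invert encode(); works because a Huffman code is prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Calling `encode("aabacab")` splits the input into `aab`, `aca`, and `b`, and `decode` recovers the original string from the resulting bits and code table. The three symbols inside a block never need to be told apart at the bit level: the whole block is looked up as one dictionary key.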
Food for thought: what about shorter strings, and permutations of "block boundaries"? What about 1- and 2-character strings? Do you just count off 3, 6, 9, 12, ... chars into your input text and then null-pad any uneven length at the end?
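The fixed-boundary, pad-the-end option could look roughly like the sketch below. It assumes the pad character (NUL here) never occurs in the real alphabet, or that the original length is stored alongside the encoded data so the padding can be cut off again; `pad_to_blocks` and `fixed_blocks` are hypothetical helper names.

```python
def pad_to_blocks(text, size=3, pad="\0"):
    """Append pad characters until len(text) is a multiple of size."""
    remainder = len(text) % size
    if remainder:
        text += pad * (size - remainder)
    return text


def fixed_blocks(text, size=3, pad="\0"):
    """Split text at positions 0, size, 2*size, ... after padding the tail."""
    padded = pad_to_blocks(text, size, pad)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def unpad(text, original_length):
    """Drop the padding again; assumes the original length was kept."""
    return text[:original_length]
```

With this approach every symbol handed to Huffman has exactly three characters, at the cost of enlarging the alphabet with a pad character and introducing at most one padded block per message.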
If the chunks can be of variable size, then it gets really interesting to find the best fit. I suspect it degenerates into a traveling salesman kind of problem, but maybe there's a neat "theorem" or other tool out there for this kind of thing.
Perhaps try all permutations of 3 chars, saving the most frequently used, then try to come up with a good fit for the 1- and 2-char-long gaps? Hmm, sounds like it might be really slow, but doable using some kind of recursive divide-and-conquer approach: pull out the long strings of block length N, then recurse into encoding the gaps as length N - 1.
More questions than answers, I'm afraid.