Huffman coding with two characters encoded as one symbol
I need Huffman coding code (ideally in Python or Java) that encodes text not one character at a time (a = 10, b = 11) but two characters at a time (ab = 11, ag = 10). Is this possible? If so, where can I find it? Maybe it is somewhere on the internet and I just can't find it.
Comments (3)
Huffman coding doesn't care about characters; it cares about symbols. It is usually applied to an alphabet of single characters, but it generalizes very easily to strings of characters. Basically, you would just take an existing implementation and allow the symbols to be strings rather than characters. Each leaf node would then correspond to a string (for example a two-character pair) instead of a single character.
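As a rough illustration of that idea, here is a minimal sketch in Python using only the standard library. The function names (`split_pairs`, `huffman_code`, `encode`, `decode`) are made up for this example, and it assumes the text is simply cut into fixed, non-overlapping two-character chunks, with a leftover single character treated as its own symbol. The builder itself never looks inside a symbol, so it works the same way for single characters or pairs.

```python
import heapq
from collections import Counter
from itertools import count


def split_pairs(text):
    """Cut text into non-overlapping two-character symbols.

    If the length is odd, the last symbol is a single character.
    """
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def huffman_code(symbols):
    """Build a prefix code {symbol: bit string} from any hashable symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of the symbols into one bit string."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Read bits until they match a code word, emit its symbol, repeat."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra abracadabra"
    symbols = split_pairs(text)
    code = huffman_code(symbols)
    bits = encode(symbols, code)
    print(code)
    print(len(bits), "bits; round trip ok:", decode(bits, code) == text)
```

The bit string is kept as a Python `str` of "0" and "1" for readability; a real encoder would pack the bits into bytes and also store the code table (or the symbol frequencies) so that the decoder can rebuild it.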
There's a Huffman encoder example distributed with the Python bitarray module, if that's any use to you.
There is probably some code somewhere, but this sounds like a parsing and tokenising question. The first question I would want answered is how many unique pairs you are dealing with. Huffman encoding works best with a small number of distinct tokens, for example the 101 characters on your keyboard. But if each of your two characters can be anything, the number of possible symbols grows massively: 101 single characters allow up to 101 × 101 = 10,201 different pairs.
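One way to answer the "how many unique pairs" question for a particular input is simply to count them. The sketch below is only an illustration: `pair_stats` is a hypothetical helper, and it assumes the same fixed, non-overlapping two-character split (a trailing single character is ignored).

```python
from collections import Counter


def pair_stats(text):
    """Return (distinct characters, possible pairs, distinct pairs seen)."""
    chars = set(text)
    # Non-overlapping pairs; a trailing single character is skipped.
    pairs = Counter(text[i:i + 2] for i in range(0, len(text) - 1, 2))
    return len(chars), len(chars) ** 2, len(pairs)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    n_chars, n_possible, n_seen = pair_stats(sample)
    print(n_chars, "characters,", n_possible, "possible pairs,", n_seen, "pairs seen")
```

The number of pairs that actually occur is what determines the size of the code table, and in real text it is usually far below the theoretical maximum.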