Entropy coding of a binary stream
I want to compress a binary stream. I know that after each '1' there is a higher probability of finding a '0', and after each '0' there is a higher probability of finding a '1'. How should I encode it? I was thinking about Rice codes, but I haven't gotten very far... Thanks in advance for any reply.
Comments (1)
Have you tried simple Huffman coding? It may not save much, but if one of the pairs '10' and '01' is much more probable than '00' or '11', you can remap it to '0' and the other three pairs to '10', '110' and '111'.
Of course, this isn't the best choice, since it splits your stream into 2-bit chunks and only optimizes one case. It can be refined by measuring the probabilities of larger blocks, such as 4 or 8 bits; in the 8-bit case, for example, 10101010 and 01010101 will occur more often than 00000000 and 11111111.
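To make the block idea concrete, here is a rough Python sketch that builds a Huffman code over k-bit blocks from measured frequencies. The function names (`huffman_code`, `encode_blocks`, `decode_blocks`) and the use of '0'/'1' strings instead of packed bits are my own choices for illustration, and it assumes the input length is a multiple of k.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(freqs):
    """Build a prefix code {symbol: bitstring} from a {symbol: count} mapping."""
    if len(freqs) == 1:
        # A single symbol still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]

def encode_blocks(bits, k=2):
    """Split a '0'/'1' string into k-bit blocks and Huffman-code the blocks."""
    blocks = [bits[i:i + k] for i in range(0, len(bits), k)]
    code = huffman_code(Counter(blocks))
    return "".join(code[b] for b in blocks), code

def decode_blocks(encoded, code):
    """Invert encode_blocks using the same code table."""
    inverse = {v: s for s, v in code.items()}
    out, cur = [], ""
    for bit in encoded:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)
```

Calling `encode_blocks(bits, k=8)` gives the 8-bit variant. In a real format the code table (or the block frequencies) would have to be stored alongside the data, which this sketch ignores.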
You might get even better results with arithmetic coding, or with any compression scheme that uses a model based on the bit probabilities.
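This is not an arithmetic coder, but a small Python sketch can estimate what one could achieve: it sums -log2 p(bit | previous bit) under an adaptive model that keeps counts per previous bit, which is the size an ideal arithmetic coder driven by that model would approach. The function name `ideal_code_length`, the starting counts of 1, and the initial context of '0' are assumptions made for the example.

```python
import math

def ideal_code_length(bits):
    """Estimated size in bits under an adaptive 'previous bit' context model."""
    # counts[context][bit], starting at 1 so no probability is ever zero
    counts = {"0": [1, 1], "1": [1, 1]}
    prev = "0"  # assumed context before the first bit
    total = 0.0
    for b in bits:
        c = counts[prev]
        x = int(b)
        # Cost of coding this bit with the model's current probability.
        total += -math.log2(c[x] / (c[0] + c[1]))
        c[x] += 1  # update the model after coding, as a decoder could too
        prev = b
    return total
```

Comparing `ideal_code_length(bits)` with `len(bits)` on a sample of your stream shows roughly how much the "previous bit" dependency is worth before you commit to implementing a full coder.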
Another simple approach would be to invert every second bit. Since the probabilities you mention favor long alternating runs like 0101010, this turns them into constant runs like 1111111 (or 0000000), which general-purpose compression algorithms usually handle better. How well this works depends on how big the "probability gap" really is.
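The inversion itself is only a few lines. A minimal Python sketch, with function names of my own choosing: one version works on a '0'/'1' string, the other on packed bytes by XOR-ing each byte with the mask 0x55 (binary 01010101). Both are their own inverse, so the decoder applies the same function again after decompressing.

```python
def flip_every_second_bit(bits):
    """Invert the bits at odd positions (0-based) of a '0'/'1' string."""
    return "".join(
        b if i % 2 == 0 else "10"[int(b)]  # '0' -> '1', '1' -> '0'
        for i, b in enumerate(bits)
    )

def flip_every_second_bit_bytes(data):
    """Same idea on packed bytes: XOR each byte with 0b01010101."""
    return bytes(byte ^ 0x55 for byte in data)
```

The output can then be fed to whatever general-purpose compressor you already use.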