Binary run-length encoding
I have a web form, for the contents of which I would like to generate a short representation in Base64. The form, among other things, contains a list of 264 binary values, the greater part of which are going to be 0 at any single time. (They represent regions on a geographical map). Even in Base64, this 264-bit number generates a long, intimidating string. I want to implement run-length encoding, as efficiently as possible. Can you help me with this? I've googled binary RLE, but have found nothing of use.
What I've tried so far: running RLE on the binary string using decimal counts and "A" as a separator denoting a change between 0 and 1, then converting the result from base 11 to base 64. For example:
00000000001111111000000010000000000000000000000001111111110001111010101000000000000000000000000000000000000111111111110111000000000000111111100000001000000000000000000000000111111111000111101010100000000000000000000000000000000000011111111111011100
becomes
10A5A5AA22A7A1A2AAAAAAA34A9AA1A10A5A5AA22A7A1A2AAAAAAA34A9AA1A
which in turn becomes
CNnbr/FxkgbbOw0LNAKgk65P8SdvaTG+t74o
or, in base 62,
6imo7zq1pqr2mqglTHzXwJRAksm7fvHZHWQK
It's better, but I still can't help wondering whether I'm doing something wrong. Is using the digit "A" as a separator the best way to do this?
And another update:
Thanks to @comingstorm, I have shortened the compressed string some more.
ILHHASCAASBYwwccDASYgAEgWDI=
As I mentioned in the comments, real use cases would generally result in an even shorter string.
Comments (3)
Since you're coding bits, you probably want to use a bit-based RLE instead of a byte-based one. In this context, you should consider Elias gamma coding (or some variant thereof) to efficiently encode your run lengths.
A reasonable first approximation for your encoding format might be:
Since you know how many bits are in your uncompressed string, you don't need a termination code; you can just add any necessary binary padding as arbitrary bits.
Note that it is always possible for the run-length "compression" to expand your bit string; if you're concerned about this, you can add another initial bit to indicate whether your data is in compressed or uncompressed format, limiting your compression overhead to 1 bit.
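To make the idea concrete, here is a minimal Python sketch of a bit-based RLE with Elias gamma coded run lengths. The exact layout is an assumption, not necessarily the format intended above: one leading bit holds the value of the first run, and every run length follows as a gamma code. The names `gamma_encode`, `rle_encode`, `rle_decode` and `to_base64` are made up for illustration.

```python
import base64

def gamma_encode(n):
    """Elias gamma code for n >= 1: (len - 1) zeros, then n in binary."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def rle_encode(bits):
    """Compress a '0'/'1' string into a '0'/'1' string (assumed layout)."""
    if not bits:
        return ""
    out = [bits[0]]              # leading bit: value of the first run
    run = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run += 1
        else:
            out.append(gamma_encode(run))
            run = 1
    out.append(gamma_encode(run))  # final run
    return "".join(out)

def rle_decode(code, total):
    """Decode until `total` bits are recovered; trailing padding is never read."""
    if total == 0:
        return ""
    value = code[0]
    pos = 1
    out = []
    produced = 0
    while produced < total:
        zeros = 0
        while code[pos] == "0":  # unary prefix gives the width of the number
            zeros += 1
            pos += 1
        n = int(code[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        out.append(value * n)
        produced += n
        value = "1" if value == "0" else "0"  # runs alternate
    return "".join(out)

def to_base64(bitstr):
    """Pad the bit string to whole bytes with zeros and Base64 encode it."""
    if not bitstr:
        return ""
    padded = bitstr + "0" * (-len(bitstr) % 8)
    raw = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return base64.b64encode(raw).decode("ascii")

bits = "0" * 100 + "1" * 3 + "0" * 161   # a sparse 264-bit example
code = rle_encode(bits)
print(len(code), to_base64(code))
```

Because the decoder is told the uncompressed length (264 here), it stops before it reaches the zero padding, which is why no termination code is needed. For the sparse example above the compressed form is 32 bits, which is 8 Base64 characters including padding.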
264 bits is just 33 bytes, which is only 44 characters in Base64. I think this (very small) amount of information is hardly compressible. The sparse representation nulvinge refers to stores only the nonzero elements and their values; since you have just 0/1, in your case that means storing only the indices of the nonzero bits. But with 264 possible bits you need 9 bits per index, which means that with more than 29 nonzero entries you already need more space than the original (30 × 9 = 270 bits, versus 264).
Maybe your question is formulated wrong, but I don't see how 264 bits can lead to an intimidating Base64 string. How do you generate it? Maybe you are translating not the 264 bits but 264 ASCII characters (with the values 0 and 1); that would explain your long result string.
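The numbers are easy to check with a short Python sketch. The helper name `pack_bits` and the sample bit string are assumptions chosen for illustration; only the standard library is used.

```python
import base64

N_BITS = 264

def pack_bits(bits):
    """Pack a '0'/'1' string into bytes, zero-padding the last byte."""
    padded = bits + "0" * (-len(bits) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, "big")

bits = "0" * 100 + "1" * 3 + "0" * 161        # any 264-bit string works here

raw = pack_bits(bits)                          # real bits: 33 bytes
packed_b64 = base64.b64encode(raw).decode("ascii")

# Encoding the characters '0' and '1' themselves instead of the bits
naive_b64 = base64.b64encode(bits.encode("ascii")).decode("ascii")

index_bits = (N_BITS - 1).bit_length()         # bits needed per index
break_even = N_BITS // index_bits              # most indices that still fit in 264 bits

print(len(raw), len(packed_b64), len(naive_b64), index_bits, break_even)
```

This prints `33 44 352 9 29`: packing the bits gives 44 Base64 characters, while encoding the text of the digits gives 352, which would match the "long, intimidating string" suspicion.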
An alternative that I think is more what you want is a sparse matrix:
http://en.wikipedia.org/wiki/Sparse_matrix
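For a flat list of 264 bits, a sparse representation boils down to a list of the positions that are set. A minimal Python sketch, assuming fixed 9-bit indices (the function names `sparse_encode` and `sparse_decode` are hypothetical):

```python
def sparse_encode(bits, index_bits=9):
    """Concatenate the fixed-width binary index of every '1' in `bits`."""
    return "".join(
        format(i, f"0{index_bits}b") for i, b in enumerate(bits) if b == "1"
    )

def sparse_decode(code, total, index_bits=9):
    """Rebuild the bit string; a trailing partial chunk (byte padding) is ignored."""
    out = ["0"] * total
    for pos in range(0, len(code) - index_bits + 1, index_bits):
        out[int(code[pos:pos + index_bits], 2)] = "1"
    return "".join(out)

bits = "0" * 100 + "1" * 3 + "0" * 161
code = sparse_encode(bits)
print(len(code))
```

Padding the result to whole bytes adds at most 7 bits, which is less than one 9-bit index, so the decoder can safely drop the incomplete final chunk. This only pays off while few bits are set: each set bit costs 9 bits no matter how the ones are clustered, whereas run-length coding benefits from long runs of either value.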