Binary run-length encoding

Posted on 2024-12-07 09:03:31

I have a web form whose contents I would like to represent as a short Base64 string. Among other things, the form contains a list of 264 binary values, most of which will be 0 at any given time. (They represent regions on a geographical map.) Even in Base64, this 264-bit number produces a long, intimidating string. I want to implement run-length encoding as efficiently as possible. Can you help me with this? I've googled binary RLE but found nothing useful.

What I've tried so far: running RLE on the binary string, using decimal counts and "A" as a separator marking each change between 0 and 1, then converting the result from base 11 to base 64. For example:

00000000001111111000000010000000000000000000000001111111110001111010101000000000000000000000000000000000000111111111110111000000000000111111100000001000000000000000000000000111111111000111101010100000000000000000000000000000000000011111111111011100

becomes

10A5A5AA22A7A1A2AAAAAAA34A9AA1A10A5A5AA22A7A1A2AAAAAAA34A9AA1A

which in turn becomes

CNnbr/FxkgbbOw0LNAKgk65P8SdvaTG+t74o

or, in base 62,

6imo7zq1pqr2mqglTHzXwJRAksm7fvHZHWQK

It's better, but I still can't help wondering whether I'm doing something wrong. Is using the digit "A" as a separator the best way to do this?

And another update:

Thanks to @comingstorm, I have shortened the compressed string some more.

ILHHASCAASBYwwccDASYgAEgWDI=

As I mentioned in the comments, real use cases would generally result in an even shorter string.

Comments (3)

大海や 2024-12-14 09:03:31

Since you're coding bits, you probably want to use a bit-based RLE instead of a byte-based one. In this context, you should consider Elias gamma coding (or some variant thereof) to efficiently encode your run lengths.

A reasonable first approximation for your encoding format might be:

  • first bit = same as the first bit of the uncompressed string (to set initial polarity)
  • remaining bits: Elias coded lengths of successive bit runs (alternating 1 and 0)

Since you know how many bits are in your uncompressed string, you don't need a termination code; you can just add any necessary binary padding as arbitrary bits.

Note that it is always possible for the run-length "compression" to expand your bit string; if you're concerned about this, you can add another initial bit to indicate whether your data is in compressed or uncompressed format, limiting your compression overhead to 1 bit.
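
A minimal Python sketch of the format outlined above, offered as an illustration rather than a reference implementation. The function names (`gamma_encode`, `rle_encode`, `rle_decode`, `bits_to_base64`, `base64_to_bits`) are hypothetical, the input is assumed to be a string of `'0'`/`'1'` characters, and the padding bits are assumed to be zeros. The optional "compressed or uncompressed" flag bit is not included.

```python
import base64

def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma only codes positive integers")
    binary = bin(n)[2:]                      # always starts with '1'
    return "0" * (len(binary) - 1) + binary  # N-1 zeros, then N binary digits

def rle_encode(bits: str) -> str:
    """First bit = initial polarity, then gamma-coded lengths of alternating runs."""
    if not bits:
        return ""
    out = [bits[0]]
    run_length = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run_length += 1
        else:
            out.append(gamma_encode(run_length))
            run_length = 1
    out.append(gamma_encode(run_length))     # flush the final run
    return "".join(out)

def rle_decode(code: str, total_bits: int) -> str:
    """Inverse of rle_encode; stops after total_bits, so padding is ignored."""
    if total_bits == 0:
        return ""
    polarity = code[0]
    pos = 1
    out = []
    produced = 0
    while produced < total_bits:
        zeros = 0
        while code[pos] == "0":              # count the leading zeros
            zeros += 1
            pos += 1
        run_length = int(code[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        out.append(polarity * run_length)
        produced += run_length
        polarity = "1" if polarity == "0" else "0"
    return "".join(out)

def bits_to_base64(code: str) -> str:
    """Pad the bit string with zeros to a whole number of bytes, then Base64 it."""
    if not code:
        return ""
    padded = code + "0" * (-len(code) % 8)
    raw = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return base64.b64encode(raw).decode("ascii")

def base64_to_bits(text: str) -> str:
    """Turn a Base64 string back into a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in base64.b64decode(text))
```

As a rough sense of scale: a 264-bit input consisting of runs of 10, 7, 7, 1 and 239 bits encodes to 34 bits under this sketch, which pads to 5 bytes and therefore 8 Base64 characters. The decoder relies on knowing `total_bits` (264 here) in advance, as the answer suggests.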

美人如玉 2024-12-14 09:03:31

264 bits is just 33 bytes, which in Base64 is just 44 characters. I think this (very small) amount of information is hardly compressible. The sparse representation nulvinge refers to stores only the nonzero elements and their values; since you have just 0/1, in your case that means only the indices of the nonzero bits. But with 264 possible bits you need 9 bits per index, which means that with more than 29 nonzero entries you already need more space than the original.
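
The break-even arithmetic can be checked with a short Python sketch. The helper names (`index_width`, `break_even_entries`, `sparse_encode`, `sparse_decode`) are made up for illustration, and the sketch assumes the decoder is told the exact length of the encoded bit string; a real format would also need an entry count or a terminator, which costs a few more bits.

```python
def index_width(total_bits: int) -> int:
    """Bits needed to address any position in 0 .. total_bits - 1."""
    return max(1, (total_bits - 1).bit_length())

def break_even_entries(total_bits: int) -> int:
    """Largest number of indices that still fits within total_bits."""
    return total_bits // index_width(total_bits)

def sparse_encode(bits: str) -> str:
    """Concatenate the fixed-width index of every '1' in the input."""
    width = index_width(len(bits))
    return "".join(format(i, f"0{width}b") for i, b in enumerate(bits) if b == "1")

def sparse_decode(code: str, total_bits: int) -> str:
    """Rebuild the bit string from a sequence of fixed-width indices."""
    width = index_width(total_bits)
    out = ["0"] * total_bits
    for pos in range(0, len(code) - width + 1, width):
        out[int(code[pos:pos + width], 2)] = "1"
    return "".join(out)

print(index_width(264))         # 9
print(break_even_entries(264))  # 29
```

With 264 positions, 29 indices cost 261 bits and 30 indices cost 270 bits, which matches the threshold stated above.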

Maybe your question is worded wrongly, but I don't see how 264 bits can lead to an intimidating Base64 string. How do you generate it? Maybe you are encoding not the 264 bits but 264 ASCII characters (with the values 0 and 1), which would explain your long result string.
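
To make the difference concrete, here is a small Python sketch comparing the two cases. The function names `pack_bits_base64` and `ascii_base64` are hypothetical, and the all-zero input is only a placeholder for the form data.

```python
import base64

def pack_bits_base64(bits: str) -> str:
    """Pack a '0'/'1' string into bytes (8 bits each), then Base64-encode."""
    padded = bits + "0" * (-len(bits) % 8)
    raw = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return base64.b64encode(raw).decode("ascii")

def ascii_base64(bits: str) -> str:
    """Base64-encode the '0'/'1' characters themselves, one byte per character."""
    return base64.b64encode(bits.encode("ascii")).decode("ascii")

bits = "0" * 264                    # placeholder for the 264 form values
print(len(pack_bits_base64(bits)))  # 44
print(len(ascii_base64(bits)))      # 352
```

Packing the bits first gives 33 bytes and 44 Base64 characters, while encoding the text form gives 264 bytes and 352 characters.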

懵少女 2024-12-14 09:03:31

An alternative that I think is more what you want is a sparse matrix:
http://en.wikipedia.org/wiki/Sparse_matrix
