Encoding binary data as ASCII in Java
I have a bitset of binary data that I wish to encode compactly as an ASCII string. I intend to initially compress the data using run-length encoding to give a sequence of integers; e.g.
111110001000000000000111
becomes:
5o3z1o12z3o
(i.e. 5 ones, 3 zeros, 1 one, 12 zeros, 3 ones).
However, I wish to then compress this further into a compact ASCII string (i.e. a string using the full range of ASCII characters rather than the digits plus 'o' and 'z'). Can anyone recommend a suitable approach and / or 3rd party library to do this in Java?
Comments (1)
If your goal is compression, just gzip the stream. It's going to do better than your run-length encoding.
Then if you need it to be text for some reason, like to safely pass through old mail gateways, I'd also turn to a standard encoding like Base64, rather than make up your own.
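For what it's worth, here is a minimal sketch of the gzip + Base64 route using only the JDK (`java.util.zip` and `java.util.Base64`, the latter available since Java 8). The class and method names are made up for illustration. One caveat to be aware of: `BitSet.toByteArray()` drops trailing zero bits, so if the exact bit length matters you would need to store it separately.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.BitSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip the raw bytes, then Base64 them into ASCII text.
final class GzipBase64Sketch {

    static String encode(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream flushes the gzip trailer into the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    static byte[] decode(String text) {
        byte[] compressed = Base64.getDecoder().decode(text);
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            // Copy until the stream reports end of data.
            while ((read = gzip.read(chunk)) != -1) {
                raw.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0, 5);   // 5 ones
        bits.set(8);      // 1 one after 3 zeros
        bits.set(21, 24); // 3 ones after 12 zeros
        String text = encode(bits.toByteArray());
        BitSet restored = BitSet.valueOf(decode(text));
        System.out.println(text + " -> round trip ok: " + restored.equals(bits));
    }
}
```

For very short inputs like the 24-bit example, the gzip header and trailer will likely outweigh any savings, so this pays off mainly on larger bitsets.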
But if you want to roll your own: first I'd note that you don't need the 'o' and 'z'. You already know those values, since the runs alternate. Assume the sequence starts with a run of 0s; if the data actually starts with a 1, emit an initial run length of 0 to show that there are zero 0s.
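To illustrate the alternating-runs idea, here is a rough sketch that works on a string of '0' and '1' characters, as in the question's example. The class and method names are hypothetical, and taking the input as a `String` is just an assumption to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: run lengths without 'o'/'z' markers.
// The first run always counts 0s, and runs alternate from there.
final class AlternatingRuns {

    static List<Integer> toRuns(String bits) {
        List<Integer> runs = new ArrayList<>();
        char expected = '0'; // by convention the first run is a run of 0s
        int count = 0;
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("not a bit: " + c);
            }
            if (c == expected) {
                count++;
            } else {
                // Run ended; this emits 0 if the data starts with a 1.
                runs.add(count);
                expected = c;
                count = 1;
            }
        }
        if (!bits.isEmpty()) {
            runs.add(count); // final run
        }
        return runs;
    }

    static String fromRuns(List<Integer> runs) {
        StringBuilder bits = new StringBuilder();
        char current = '0';
        for (int run : runs) {
            for (int i = 0; i < run; i++) {
                bits.append(current);
            }
            current = (current == '0') ? '1' : '0';
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        List<Integer> runs = toRuns("111110001000000000000111");
        System.out.println(runs); // [0, 5, 3, 1, 12, 3]
        System.out.println(fromRuns(runs));
    }
}
```

The leading 0 in the output is the "zero 0s" marker, since the example data starts with a 1.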
Encoding the numbers textually is possible but probably inefficient. Look into a variable-length encoding for integer values, then encode those bytes. Then 'escape' them into ASCII somehow.
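As one possible reading of "variable-length encoding, then escape into ASCII", the sketch below packs each run length as a base-128 varint (7 data bits per byte, high bit set when more bytes follow) and then Base64-encodes the result. The varint scheme is my choice for illustration, not something prescribed above, and the names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper: varint-pack run lengths, then Base64 them.
final class VarintRuns {

    static String encode(List<Integer> runs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int run : runs) {
            if (run < 0) {
                throw new IllegalArgumentException("negative run length: " + run);
            }
            int value = run;
            // Emit 7 bits at a time, low bits first; 0x80 marks "more to come".
            while ((value & ~0x7F) != 0) {
                out.write((value & 0x7F) | 0x80);
                value >>>= 7;
            }
            out.write(value);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    static List<Integer> decode(String text) {
        byte[] bytes = Base64.getDecoder().decode(text);
        List<Integer> runs = new ArrayList<>();
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7; // continuation byte: keep accumulating
            } else {
                runs.add(value);
                value = 0;
                shift = 0;
            }
        }
        if (shift != 0) {
            throw new IllegalArgumentException("truncated varint at end of input");
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Integer> runs = Arrays.asList(0, 5, 3, 1, 12, 3);
        String text = encode(runs);
        System.out.println(text + " -> " + decode(text));
    }
}
```

Runs shorter than 128 cost one byte each here, so data with many short runs may come out larger than simply Base64-encoding the raw bits.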
But then we're back to Base64-like encoding, and the first suggestion to gzip + Base64 is probably easier than all of this.