Most efficient way to encode 2 positions between 0 and 64?
I have 64-bit values that I want to compress by exploiting the fact that only a portion somewhere in the middle contains data, and everything before and after it is zeroes.
Say the actual data is l bits long and padded with n 0s in front and m 0s at the end such that n + l + m = 64. Instead of transmitting / storing 64 bits, I can transmit l bits plus whatever I need to encode the position of the data in the 64-bit interval.
For example, say I was storing l, m and the data bits, then I would restore the original 64-bit pattern by reading l, reading l bits of data, reading m and shifting the data m bits to the left.
The smallest overhead I could come up with is two times 6 bits for storing any two of l, n and m (each can be between 0 and 64). Is it possible to reduce that number?
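To make the setup concrete, here is a minimal sketch of how n, l and m could be derived from a value and how the value is restored from l, m and the data bits. The function names `split_value` and `restore`, and the convention of reporting a zero value as n = 64, are assumptions made for illustration only.

```python
def split_value(x: int) -> tuple[int, int, int]:
    """Return (n, l, m): leading zeros, data length, trailing zeros of a 64-bit value."""
    if x == 0:
        return 64, 0, 0  # assumed convention: count all padding as leading zeros
    n = 64 - x.bit_length()            # zeros in front of the data
    m = (x & -x).bit_length() - 1      # zeros after the data
    return n, 64 - n - m, m

def restore(l: int, m: int, data: int) -> int:
    """Rebuild the 64-bit pattern by shifting the l data bits m places to the left."""
    return data << m

x = 0b1011 << 20
n, l, m = split_value(x)
data = x >> m                          # the l data bits
assert restore(l, m, data) == x
```

Only `l`, `m` and `data` have to be stored here; `n` follows from n + l + m = 64.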
Comments (5)
Your analysis sounds right for single values. But if you're transmitting lots of such values together, a generic compression algorithm like gzip will probably do better, since it can eliminate the strings of zeroes quite well and also exploit redundancies in the data.
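A quick way to try this is Python's standard `zlib` module, which implements the same DEFLATE algorithm that gzip uses. The sample values below are made up for illustration; real savings depend entirely on your data.

```python
import struct
import zlib

# Hypothetical sample: 1000 values with a short data run surrounded by zeros.
values = [(i % 251 + 1) << 20 for i in range(1000)]

# Pack each value as 8 big-endian bytes, then compress the whole batch at once.
raw = b"".join(struct.pack(">Q", v) for v in values)
packed = zlib.compress(raw, 9)

print(len(raw), len(packed))
assert zlib.decompress(packed) == raw
```

Compressing the batch as one stream is what lets the compressor exploit both the zero runs and the repetition between values.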
As you have stated the problem, no, you cannot do better than the solution you have proposed.
However, if the distribution of the zeros in the numbers is skewed, you may be able to get better compression on average by using Huffman codes or a similar technique to represent the counts. Another possibility is to use delta coding if the zero distribution is strongly correlated from one 64-bit value to the next.
In either case, you will need to use a variable number of bits to represent the numbers of zeros. And if your assumptions about skew or correlation turn out to be false, you may end up using more bits on average than if you had done it the simple way.
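One way to check whether the counts are skewed enough to be worth a variable-length code is to measure their Shannon entropy, which is a lower bound on the average number of bits per count that a code such as Huffman can reach. This is a minimal sketch; `entropy_bits` and the sample values are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(symbols) -> float:
    """Shannon entropy, in bits per symbol, of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Hypothetical sample: three values share the same number of leading zeros.
values = [0xFF << 24, 0xAB << 24, 0x81 << 24, 0xC3 << 16]
leading = [64 - v.bit_length() for v in values]

print(leading, entropy_bits(leading))
```

If the entropy of your real counts is close to 6 bits, a fixed 6-bit field is already about as good as it gets; if it is much lower, a variable-length code may pay off.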
l can be anywhere from 0 to 64, so don't send l; send n and m instead, since they can both be zero and neither needs to go up to 64 (they only need to be able to add up to 64).
The l data bits must start and end with a 1, so those two end bits do not need to be transmitted.
send 6 bits for n
send up to 6 bits for m (see below)
calculate l = 64 - (n + m)
if l = 0, the number is 0, don't send anything else
if l = 1, the number is 1 * 2^m, don't send anything else
if l = 2, the number is 3 * 2^m, don't send anything else
send the middle l - 2 bits.
Maximum overhead = 10 bits (for l >= 2: up to 12 bits for n and m, minus the 2 end bits that are not sent).
The reduction in the bits for m is because
if n > 32 then you know m < 32, so only needs 5 bits
if n > 48 then you know m < 16, so only needs 4 bits
if n > 56 then you know m < 8, so only needs 3 bits
if n > 60 then you know m < 4, so only needs 2 bits
if n = 63 then you know m < 2, so only needs 1 bit
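The steps above can be sketched as follows, using strings of '0' and '1' as the bit stream for readability. The function names are made up, and encoding a zero value as n = 63, m = 1 is an assumption of this sketch (any pair with n + m = 64 would signal l = 0).

```python
def m_width(n: int) -> int:
    """Number of bits used for m, given n, following the thresholds listed above."""
    if n == 63:
        return 1
    if n > 60:
        return 2
    if n > 56:
        return 3
    if n > 48:
        return 4
    if n > 32:
        return 5
    return 6

def encode(x: int) -> str:
    if x == 0:
        n, m = 63, 1                       # assumed convention for zero: n + m = 64
    else:
        n = 64 - x.bit_length()            # leading zeros
        m = (x & -x).bit_length() - 1      # trailing zeros
    l = 64 - n - m
    bits = format(n, "06b") + format(m, f"0{m_width(n)}b")
    if l > 2:                              # first and last data bits are implied 1s
        middle = (x >> (m + 1)) & ((1 << (l - 2)) - 1)
        bits += format(middle, f"0{l - 2}b")
    return bits

def decode(bits: str) -> int:
    n = int(bits[:6], 2)
    w = m_width(n)
    m = int(bits[6:6 + w], 2)
    l = 64 - n - m
    if l == 0:
        return 0
    if l == 1:
        return 1 << m
    middle = int(bits[6 + w:6 + w + l - 2], 2) if l > 2 else 0
    return ((1 << (l - 1)) | (middle << 1) | 1) << m

for x in (0, 1, 3, 0b1011 << 20, 1 << 63, (1 << 64) - 1):
    assert decode(encode(x)) == x
```

In this sketch a full 64-bit value costs 74 bits (12 position bits plus 62 middle bits), while a zero value costs 7 bits.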
Your solution seems pretty good.
Huffman coding is another way to compress your values, especially if some values occur very frequently.
It's not very difficult to implement, but it might be overkill if you don't have much data to transmit.
There are 64 possible start positions n of the sequence of ones, and the length of the sequence l can be no longer than 64 - n. So there are

$$1 + \sum_{n=0}^{63} (64 - n)$$

sequences in total. The added one is for a sequence of all zeros. Doing some math yields the following.

$$1 + \sum_{n=0}^{63} (64 - n) = 1 + \frac{64 \cdot 65}{2} = 2081$$

Representing 2081 possible values requires log2(2081) = 11.023 bits. Your suggestion to encode the information using two 6-bit numbers requiring 12 bits in total is hence optimal (under the assumption of equal distributions of all possible values).
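The count can be checked with a few lines of Python; the variable names are only illustrative.

```python
import math

# One sequence per (start position n, length 1..64-n), plus one for all zeros.
count = 1 + sum(64 - n for n in range(64))
bits_needed = math.log2(count)

print(count, round(bits_needed, 3), math.ceil(bits_needed))
```

Since a fixed-width code needs a whole number of bits, the 11.023 bits round up to 12.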