为什么我可以“折叠”将一系列整数缩小为一半大小而不丢失信息?
我试图理解 一篇关于浮动无损压缩的论文点数字并陷入一个特定步骤,即作者将某个范围的有符号整数映射到一半大小的范围,丢失了我认为需要的信息。我有一种感觉,作者正在使用一些标准技术,这些技术对他们的观众来说是如此明显,以至于他们懒得解释,但对我来说却完全不透明。
被“折叠”的值是两个 23 位正整数(预测浮点值和实际浮点值的尾数)之间的差值,介于 1 - 223 和 223 之间 - 1. 作者将具有最高值(负数和正数)的数字“向内”移动,因此结果范围是大小的一半,并且每个数字(0 除外)映射到原始范围中的两个可能值。这让我想知道应该如何反转该过程来确定原始值。用作者自己的话说:
我们计算最短模 223 的有符号校正器和指定最紧间隔的数字
k
(1-2k , 2k) 该校正器属于其中。接下来,这个范围在 0 到 22 之间的数字k
被压缩[...]。最后,校正器的k + 1
个有效位被压缩。
其伪代码如下:
void comp mantissa(int expo, int a, int p) {
// c will be within [1-2^23 ... 2^23 -1]
int c = a - p;
// wrap c into [1-2^22 ... 2^22 ]
if (c <= -(1<<22)) c += 1<<23;
else if (c > (1<<22)) c -= 1<<23;
// find tightest [1-2^k ... 2^k ] containing c
int k = 0;
// loop could be replaced with faster code
int c1 = (c < 0 ? -c : c);
while (c1) { c1 = c1 >> 1; k++ }
// adjust k for case that c is exactly 2k
if (k && (c == 1<<(k-1))) k--;
// .. further code omitted for brevity
}
忽略实际的压缩方法,输出由 c
和 k
组成。我不明白的是:当“将 c 包装到”时,如何从 c
和 k
恢复原始的 c
上面的部分只是将潜在范围的一半映射到另一半?我在纸上用 4 位而不是 23 位尝试了这一点,但我就是不明白。
I'm trying to understand a paper on lossless compression of floating point numbers and get stuck on one particular step where the authors map a signed integer from a certain range into a range of half the size, losing information that I would think is required. I have a feeling that the authors are using some standard technique which is so obvious to their audience that they don't bother to explain, but which is completely opaque to me.
The value that's being "folded" is the difference between two 23-bit positive integers (the mantissas of a predicted and an actual floating point value) which lies between 1 - 223 and 223 - 1. The authors move the numbers with the highest values (negative and positive) "inwards", so the resulting range is half the size and each number (except 0) maps to two possible values from the original range. This makes me wonder how the process should ever be reversed to determine the original value. In the authors` own words:
We compute the signed corrector that is the shortest modulo 223 and the number
k
that specifies the tightest interval (1-2k , 2k) into which this corrector falls. Next, this numberk
, which ranges between 0 to 22 is compressed [...]. Finally thek + 1
significant bits of the corrector are compressed.
The pseudocode for this is given as:
void comp mantissa(int expo, int a, int p) {
// c will be within [1-2^23 ... 2^23 -1]
int c = a - p;
// wrap c into [1-2^22 ... 2^22 ]
if (c <= -(1<<22)) c += 1<<23;
else if (c > (1<<22)) c -= 1<<23;
// find tightest [1-2^k ... 2^k ] containing c
int k = 0;
// loop could be replaced with faster code
int c1 = (c < 0 ? -c : c);
while (c1) { c1 = c1 >> 1; k++ }
// adjust k for case that c is exactly 2k
if (k && (c == 1<<(k-1))) k--;
// .. further code omitted for brevity
}
Ignoring the actual compression method, the output consists of c
and k
. What I don't get is: How can I restore the original c
from c
and k
when the "wrap c into" part above just maps half of the potential range onto the other half? I tried this on paper with 4 instead of 23 bits and I just don't get it.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
当作者说他们正在考虑有效数“模 2^23”时,这意味着数字将存储在 23 位整数中,因此相差 2^23 倍数的数字将“相同”,因为位模式是相同。 http://mathworld.wolfram.com/ModularArithmetic.html)
(参见 " c=ap 之后的代码仅对 c 加或减 2^23,当您稍后通过计算 a = c+p 来反转它时,您会得到正确的值,因为 2^23 并不重要。
这是二进制的示例...
然后,由于 c<=-(1<<22),因此发生换行...
然后对其进行编码。然后,您可以从c和p返回a:
但由于它存储在23位整数中,因此这相当于:
这是原始的a。
When the author says they are considering the significands "modulo 2^23", that means the numbers will be stored in 23-bit integers, so numbers that differ by multiples of 2^23 will be "the same" since the bit pattern is the same. (See http://mathworld.wolfram.com/ModularArithmetic.html)
Since the "wrapping" code after c=a-p only adds or subtracts 2^23 to c, when you later reverse this by computing a = c+p you get the right value as the 2^23 doesn't matter.
Here's an example in binary...
then, since c<=-(1<<22), the wrapping occurs...
Which is then encoded. Then later, you can get back a from c and p:
but since this is stored in a 23-bit integer, this is equivalent to:
which is the original a.