为什么我可以“折叠”将一系列整数缩小为一半大小而不丢失信息？

发布于 2024-08-04 04:27:41 字数 1426 浏览 4 评论 0原文

我试图理解一篇关于浮动无损压缩的论文点数字并陷入一个特定步骤，即作者将某个范围的有符号整数映射到一半大小的范围，丢失了我认为需要的信息。我有一种感觉，作者正在使用一些标准技术，这些技术对他们的观众来说是如此明显，以至于他们懒得解释，但对我来说却完全不透明。

被“折叠”的值是两个 23 位正整数（预测浮点值和实际浮点值的尾数）之间的差值，介于 1 - 2²³ 和 2^{23 之间} - 1. 作者将具有最高值（负数和正数）的数字“向内”移动，因此结果范围是大小的一半，并且每个数字（0 除外）映射到原始范围中的两个可能值。这让我想知道应该如何反转该过程来确定原始值。用作者自己的话说：

我们计算最短模 2²³ 的有符号校正器和指定最紧间隔的数字 k (1-2^k , 2^k) 该校正器属于其中。接下来，这个范围在 0 到 22 之间的数字k被压缩[...]。最后，校正器的 k + 1 个有效位被压缩。

其伪代码如下：

void comp mantissa(int expo, int a, int p) {
  // c will be within [1-2^23 ... 2^23 -1]
  int c = a - p;
  // wrap c into [1-2^22 ... 2^22 ]
  if (c <= -(1<<22)) c += 1<<23;
  else if (c > (1<<22)) c -= 1<<23;
  // find tightest [1-2^k ... 2^k ] containing c
  int k = 0;
  // loop could be replaced with faster code
  int c1 = (c < 0 ? -c : c);
  while (c1) { c1 = c1 >> 1; k++ }
  // adjust k for case that c is exactly 2k
  if (k && (c == 1<<(k-1))) k--;

  // .. further code omitted for brevity
}

忽略实际的压缩方法，输出由 c 和 k 组成。我不明白的是：当“将 c 包装到”时，如何从 c 和 k 恢复原始的 c上面的部分只是将潜在范围的一半映射到另一半？我在纸上用 4 位而不是 23 位尝试了这一点，但我就是不明白。

原文

I'm trying to understand a paper on lossless compression of floating point numbers and get stuck on one particular step where the authors map a signed integer from a certain range into a range of half the size, losing information that I would think is required. I have a feeling that the authors are using some standard technique which is so obvious to their audience that they don't bother to explain, but which is completely opaque to me.

The value that's being "folded" is the difference between two 23-bit positive integers (the mantissas of a predicted and an actual floating point value) which lies between 1 - 2²³ and 2²³ - 1. The authors move the numbers with the highest values (negative and positive) "inwards", so the resulting range is half the size and each number (except 0) maps to two possible values from the original range. This makes me wonder how the process should ever be reversed to determine the original value. In the authors` own words:

We compute the signed corrector that is the shortest modulo 2²³ and the number k that specifies the tightest interval (1-2^k , 2^k) into which this corrector falls. Next, this number k, which ranges between 0 to 22 is compressed [...]. Finally the k + 1 significant bits of the corrector are compressed.

The pseudocode for this is given as:

void comp mantissa(int expo, int a, int p) {
  // c will be within [1-2^23 ... 2^23 -1]
  int c = a - p;
  // wrap c into [1-2^22 ... 2^22 ]
  if (c <= -(1<<22)) c += 1<<23;
  else if (c > (1<<22)) c -= 1<<23;
  // find tightest [1-2^k ... 2^k ] containing c
  int k = 0;
  // loop could be replaced with faster code
  int c1 = (c < 0 ? -c : c);
  while (c1) { c1 = c1 >> 1; k++ }
  // adjust k for case that c is exactly 2k
  if (k && (c == 1<<(k-1))) k--;

  // .. further code omitted for brevity
}

Ignoring the actual compression method, the output consists of c and k. What I don't get is: How can I restore the original c from c and k when the "wrap c into" part above just maps half of the potential range onto the other half? I tried this on paper with 4 instead of 23 bits and I just don't get it.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

勿忘初心 2024-08-11 04:27:41

当作者说他们正在考虑有效数“模 2^23”时，这意味着数字将存储在 23 位整数中，因此相差 2^23 倍数的数字将“相同”，因为位模式是相同。 http://mathworld.wolfram.com/ModularArithmetic.html）

（参见 " c=ap 之后的代码仅对 c 加或减 2^23，当您稍后通过计算 a = c+p 来反转它时，您会得到正确的值，因为 2^23 并不重要。

这是二进制的示例...

a =             00000000000000000000001
p =             10000000000000000000100
c = a-p =      -10000000000000000000011

然后，由于 c<=-(1<<22)，因此发生换行...

c = c+(1<<23) = 11111111111111111111101

然后对其进行编码。然后，您可以从c和p返回a：

a = c+p =      100000000000000000000001

但由于它存储在23位整数中，因此这相当于：

a =             00000000000000000000001

这是原始的a。

When the author says they are considering the significands "modulo 2^23", that means the numbers will be stored in 23-bit integers, so numbers that differ by multiples of 2^23 will be "the same" since the bit pattern is the same. (See http://mathworld.wolfram.com/ModularArithmetic.html)

Since the "wrapping" code after c=a-p only adds or subtracts 2^23 to c, when you later reverse this by computing a = c+p you get the right value as the 2^23 doesn't matter.

Here's an example in binary...

a =             00000000000000000000001
p =             10000000000000000000100
c = a-p =      -10000000000000000000011

then, since c<=-(1<<22), the wrapping occurs...

c = c+(1<<23) = 11111111111111111111101

Which is then encoded. Then later, you can get back a from c and p:

a = c+p =      100000000000000000000001

but since this is stored in a 23-bit integer, this is equivalent to:

a =             00000000000000000000001

which is the original a.

回复收藏 0 原文

~没有更多了~

关于作者

誰認得朕

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

为什么我可以“折叠”将一系列整数缩小为一半大小而不丢失信息？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

玍銹的英雄夢

我不会写诗

十六岁半

浸婚纱

qq_kJ6XkX

旧伤还要旧人安

友情链接

为什么我可以“折叠”将一系列整数缩小为一半大小而不丢失信息？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

玍銹的英雄夢

我不会写诗

十六岁半

浸婚纱

qq_kJ6XkX

旧伤还要旧人安

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。