I don't get Golomb/Rice coding: doesn't it make the input take more bits?

Posted 2024-07-16 19:51:25

Or maybe what I don't get is unary coding:

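In Golomb or Rice coding, you split a number N into two parts by dividing it by another number M. You then encode the integer result of that division in unary and the remainder in binary.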

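In the Wikipedia example, they use 42 as N and 10 as M. We end up with a quotient q of 4 (in unary: 11110) and a remainder r of 2 (in binary: 010). The resulting message is 11110,010, or 8 bits (the comma can be skipped). The simple binary representation of 42 is 101010, which is only 6 bits.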
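Here is a minimal Python sketch of this split. It assumes the unary convention of q one-bits followed by a terminating zero. It also assumes truncated binary for the remainder, which is why a remainder of 2 with M = 10 takes 3 bits rather than 4. The function names are hypothetical, not from any library.

```python
def unary(q: int) -> str:
    """Unary code: q one-bits followed by a terminating zero bit."""
    return "1" * q + "0"


def truncated_binary(r: int, m: int) -> str:
    """Truncated binary code for a remainder 0 <= r < m."""
    if m == 1:
        return ""                       # only one possible remainder, no bits needed
    b = (m - 1).bit_length()            # ceil(log2(m)) for m >= 2
    cutoff = (1 << b) - m               # remainders below this use b - 1 bits
    if r < cutoff:
        return format(r, f"0{b - 1}b")
    return format(r + cutoff, f"0{b}b")


def golomb_encode(n: int, m: int) -> str:
    """Golomb code of a non-negative integer n with divisor m."""
    q, r = divmod(n, m)
    return unary(q) + truncated_binary(r, m)


code = golomb_encode(42, 10)
print(code, len(code))                  # 11110010 8
print(bin(42)[2:], (42).bit_length())   # 101010 6
```

When M is a power of two (Rice coding), the cutoff is 0 and every remainder takes exactly log2(M) bits.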

To me, this seems to be because the unary representation of q always has to use more bits than binary would.
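That observation holds for any single quotient: unary spends q + 1 bits, while plain binary needs only about log2(q). A short Python tabulation (helper names are hypothetical) shows how the two lengths compare for small q.

```python
def unary_len(q: int) -> int:
    """Bits in the unary code for q: q ones plus a terminating zero."""
    return q + 1


def binary_len(q: int) -> int:
    """Bits in the shortest plain binary form of q (one bit for zero)."""
    return max(1, q.bit_length())


for q in range(8):
    print(q, unary_len(q), binary_len(q))
```

For q = 0 both take one bit, and for q = 1 unary costs one extra bit. The gap only becomes large as q grows.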

Obviously, I'm missing some important point here. What is it?

仙女山的月亮 2024-07-23 19:51:25

The important point is that Golomb codes are not meant to be shorter than the shortest binary encoding of any one particular number. Rather, as a specific kind of variable-length code, they reduce the average length per encoded value compared to a fixed-width encoding. This works when the values come from a large range but the most common ones are small, so that most of the time only a small fraction of that range is used. Note also that a minimal binary form such as 101010 is not self-delimiting. In a stream of numbers, a decoder could not tell where one ends and the next begins, so the fair comparison is against a fixed-width code.
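To make the average-length argument concrete, this Python sketch compares a fixed-width code for 0..1000 with a Golomb code using M = 10. The source distribution, with P(n) proportional to 0.9**n, is a hypothetical stand-in for "mostly small values", not data from the question. Truncated binary is assumed for the remainder.

```python
def golomb_len(n: int, m: int) -> int:
    """Bits in the Golomb code of n: unary quotient plus truncated-binary remainder."""
    q, r = divmod(n, m)
    if m == 1:
        return q + 1
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    return q + 1 + (b - 1 if r < cutoff else b)


def average_len(probs: dict[int, float], m: int) -> float:
    """Expected code length in bits under the given value distribution."""
    return sum(p * golomb_len(n, m) for n, p in probs.items())


# Hypothetical source: values 0..1000 with P(n) proportional to 0.9**n,
# so small values dominate but large ones remain possible.
weights = {n: 0.9 ** n for n in range(1001)}
total = sum(weights.values())
probs = {n: w / total for n, w in weights.items()}

fixed_width = (1000).bit_length()        # 10 bits for every value
print(fixed_width, round(average_len(probs, 10), 2))
```

With this hypothetical distribution, the average lands well under the 10 bits that fixed-width binary always spends, even though individual large values cost far more.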

As an example, if you were to transmit integers in the range from 0 to 1000, but a large majority of the actual values were in the range between 0 and 10, in a fixed-width encoding, most of the transmitted codes would have leading 0s that contain no information:

To cover all values between 0 and 1000, you need a 10-bit wide encoding in fixed-width binary. Now, as most of your values would be below 10, at least the first 6 bits of most numbers would be 0 and would carry little information.
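A quick check of these bit counts in Python: 10 bits is the smallest fixed width that covers 1000, and every value below 10 carries at least 6 leading zeros at that width.

```python
WIDTH = (1000).bit_length()   # 10 bits: enough for 0..1023, hence all of 0..1000

for n in (0, 3, 9, 42, 1000):
    bits = format(n, f"0{WIDTH}b")        # fixed-width binary, zero-padded
    leading_zeros = WIDTH - n.bit_length()
    print(n, bits, leading_zeros)
```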

To rectify this with Golomb codes, you split each number by dividing it by 10 and encode the quotient and the remainder separately. For most values, apart from a one-bit quotient, all that has to be transmitted is the remainder. The remainder needs at most 4 bits, and with truncated binary, remainders 0 to 5 need only 3. The quotient is transmitted in unary: a single 0 bit for all values below 10, 10 for 10..19, 110 for 20..29, and so on.
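The split for M = 10 can be sketched in Python as follows. It assumes truncated binary for the remainder with b = 4, so the six remainders 0..5 get 3-bit codes and 6..9 get 4-bit codes. Constant and function names are illustrative only.

```python
M = 10
B = (M - 1).bit_length()          # 4: plain binary width for remainders 0..9
CUTOFF = (1 << B) - M             # 6: remainders below this get B - 1 bits


def quotient_code(n: int) -> str:
    """Unary code of n // M: one '1' per full block of M, then a '0'."""
    return "1" * (n // M) + "0"


def remainder_bits(n: int) -> int:
    """Bits needed for n % M under truncated binary."""
    return B - 1 if n % M < CUTOFF else B


for n in (0, 5, 9, 10, 19, 20, 29):
    print(n, quotient_code(n), remainder_bits(n))
```

Under these assumptions, every value below 10 costs 4 or 5 bits in total, against 10 bits in fixed width.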

Now, for most of your values, you have reduced the message size to at most 5 bits, and you can still transmit all values unambiguously without separators.
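Decoding shows why no separators are needed. The terminating 0 of the unary part marks where the remainder starts. The first b - 1 remainder bits then tell the decoder whether one more bit follows. Below is a minimal Python round trip, assuming q one-bits then a zero for the quotient and truncated binary for the remainder. The function names are hypothetical.

```python
def golomb_encode(n: int, m: int) -> str:
    """Unary quotient (q ones, then a zero) followed by a truncated-binary remainder."""
    q, r = divmod(n, m)
    code = "1" * q + "0"
    if m == 1:
        return code
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    if r < cutoff:
        return code + format(r, f"0{b - 1}b")
    return code + format(r + cutoff, f"0{b}b")


def golomb_decode_stream(bits: str, m: int) -> list[int]:
    """Decode back-to-back Golomb codes that have no separators between them."""
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":             # unary part: count the ones...
            q += 1
            i += 1
        i += 1                            # ...and consume the terminating zero
        r = 0
        if m > 1:
            r = int(bits[i:i + b - 1], 2) if b > 1 else 0
            i += b - 1
            if r >= cutoff:               # long remainder code: one more bit follows
                r = 2 * r + int(bits[i]) - cutoff
                i += 1
        values.append(q * m + r)
    return values


data = [3, 0, 42, 9, 15]
stream = "".join(golomb_encode(n, 10) for n in data)
print(golomb_decode_stream(stream, 10))   # [3, 0, 42, 9, 15]
```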

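This comes at a rather high cost for larger values. For example, values in the range 990..999 need 100 bits for the quotient alone. That is why the code suits distributions in which large values are exponentially rare: with a suitable M, Golomb codes are optimal prefix codes for geometrically distributed values.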
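The cost side can be quantified from the code length: (n // M) + 1 unary bits, plus 3 or 4 remainder bits for M = 10. This Python sketch (hypothetical helper, truncated binary assumed) computes the length for 995. It also finds the first value whose Golomb code is longer than the 10-bit fixed-width code.

```python
def golomb_len(n: int, m: int) -> int:
    """Bits in the Golomb code of n: (n // m + 1) unary bits plus the remainder."""
    q, r = divmod(n, m)
    if m == 1:
        return q + 1
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    return q + 1 + (b - 1 if r < cutoff else b)


FIXED = (1000).bit_length()   # 10 bits per value in fixed-width binary

print(golomb_len(995, 10))    # 103: 100 quotient bits + 3 remainder bits
first_worse = min(n for n in range(1001) if golomb_len(n, 10) > FIXED)
print(first_worse)            # 66
```

Under these assumptions, every value from 66 upward costs more than the fixed-width code, so the scheme only pays off if such values are rare.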

The long runs of 1 bits in the quotients of larger values can be addressed with subsequent run-length encoding. However, if the quotients consume too much space in the resulting message, this could indicate that other codes might be more appropriate than Golomb/Rice.
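The answer does not say which run-length scheme it has in mind. As a hypothetical illustration, collapsing a bit string into (bit, run length) pairs turns the 99 ones of a large quotient into a single pair. How those pairs are themselves encoded is left open here.

```python
from itertools import groupby


def run_lengths(bits: str) -> list[tuple[str, int]]:
    """Collapse a bit string into (bit, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


unary_quotient = "1" * 99 + "0"      # unary quotient of 995 with M = 10
print(run_lengths(unary_quotient))   # [('1', 99), ('0', 1)]
```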
