What is the maximum theoretically possible compression rate?

Posted 2024-09-09 08:35:36

This is a theoretical question, so expect that many details here are not computable in practice or even in theory.

Let's say I have a string s that I want to compress. The result should be a self-extracting binary (can be x86 assembler, but it can also be some other hypothetical Turing-complete low level language) which outputs s.

Now, we can easily iterate through all possible such binaries and programs, ordered by size. Let B_s be the sub-list of those binaries that output s (of course B_s is uncomputable).

As every set of positive integers must have a minimum, there must be a smallest program b_min_s in B_s.

For what languages (i.e. sets of strings) do we know something about the size of b_min_s? Maybe only an estimate. (I can construct some trivial examples where I can always calculate B_s and also b_min_s, but I am interested in more interesting languages.)
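One computable handle on the question: any off-the-shelf compressor paired with a fixed decompressor stub is itself a member of B_s, so its total size is an upper bound on |b_min_s|. A minimal sketch, where the stub size is a hypothetical constant standing in for a real decompressor binary in some fixed low-level language:

```python
import zlib

# Assumed size, in bytes, of a fixed zlib-decompressor stub in some
# hypothetical low-level language -- a placeholder, not a measured binary.
STUB_SIZE = 100

def upper_bound(s: bytes) -> int:
    """Size of one concrete self-extracting program that outputs s:
    decompressor stub plus zlib-compressed payload. This is an upper
    bound on |b_min_s|; the true minimum is uncomputable."""
    return STUB_SIZE + len(zlib.compress(s, 9))

s = b"abab" * 1000
print(len(s), upper_bound(s))  # for highly redundant s the bound is far below len(s)
```

The bound is trivial but already shows the shape of every practical result in this area: we can exhibit members of B_s, never certify the minimum.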

Comments (4)

鸠书 2024-09-16 08:35:36

This is Kolmogorov complexity, and you are correct that it's not computable. If it were, you could create a paradoxical program of length n that printed a string with Kolmogorov complexity m > n.

Clearly, you can bound b_min_s for given inputs. However, as far as I know, most of the efforts to do so have been existence proofs. For instance, there is an ongoing competition (the Hutter Prize) to compress English Wikipedia.
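The bounding mentioned above can be made concrete: each general-purpose compressor yields one element of B_s (modulo a fixed decompressor stub, ignored in this sketch), so the minimum over several compressors is a computable upper bound on |b_min_s| that only ever improves as better compressors appear:

```python
import bz2
import lzma
import zlib

def best_known_bound(s: bytes) -> int:
    """Best upper bound on |b_min_s| among a few standard compressors
    (decompressor-stub sizes ignored). The true minimum is uncomputable;
    all we can do is tighten the bound."""
    candidates = [zlib.compress(s, 9), bz2.compress(s, 9), lzma.compress(s)]
    return min(len(c) for c in candidates)

s = ("To be, or not to be, that is the question. " * 50).encode()
print(best_known_bound(s), len(s))
```

This mirrors the Wikipedia competition: every new winning entry is just a smaller exhibited member of B_s, not a proof of minimality.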

花海 2024-09-16 08:35:36

Claude Shannon estimated the information density of the English language to be somewhere between 0.6 and 1.3 bits per character in his 1951 paper Prediction and Entropy of Printed English (Bell Sys. Tech. J., vol. 30, pp. 50–64).
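For comparison, a naive order-0 estimate, which ignores all context between characters, is easy to compute and lands well above Shannon's figure; the gap is exactly the redundancy his prediction experiments exposed:

```python
from collections import Counter
from math import log2

def entropy_per_char(text: str) -> float:
    """Order-0 (single-character frequency) entropy in bits per character.
    Typical English text comes out around 4 bits/char by this measure,
    versus Shannon's 0.6-1.3 bits/char once context is exploited."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * log2(c / n) for c in counts.values())

sample = "the quick brown fox jumps over the lazy dog and runs away"
print(round(entropy_per_char(sample), 2))
```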

一个人的夜不怕黑 2024-09-16 08:35:36

The maximal (average) compression rate possible is 1:1.
The number of possible inputs is equal to the number of outputs.
It has to be, to be able to map the output back to the input.
To be able to store the output you need a container of the same size as the minimal container for the input, giving a 1:1 compression rate.

明月夜 2024-09-16 08:35:36

Basically, you need enough information to rebuild your original information. I guess the other answers are more helpful for your theoretical discussion, but just keep this in mind.
