柯尔莫哥洛夫复杂度近似算法
我正在寻找一种算法,可以计算给定输入字符串的柯尔莫哥洛夫复杂度的近似值。因此,如果 K 是字符串 S 的柯尔莫哥洛夫复杂度,并且 t 代表时间,那么该函数的行为将类似于以下所示: limit(t->inf)[K_approx(t,S)] = K。
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
理论上,当运行时间接近无穷大时,程序可以收敛于其输入字符串的柯尔莫哥洛夫复杂度。它可以通过并行运行每个可能的程序(输入字符串的长度或更短)来工作。当找到给定长度的节目时,该长度被识别为目前已知的最小长度,被打印,并且不再尝试>=该长度的节目。该算法(很可能)将永远运行,打印越来越短的长度,在给定无限时间的情况下收敛于精确的柯尔莫哥洛夫复杂度。
当然,运行指数数量的程序是非常困难的。更有效的算法是在 StackOverflow 上发布代码高尔夫 。一些缺点:
In theory, a program could converge on the Kolmogorov complexity of its input string as the running time approaches infinity. It could work by running every possible program in parallel that is the length of the input string or shorter. When a program of a given length is found, that length is identified as the minimum length known for now, is printed, and no more programs >= that length are tried. This algorithm will (most likely) run forever, printing shorter and shorter lengths, converging on the exact Kolmogorov complexity given infinite time.
Of course, running an exponential number of programs is highly intractible. A more efficient algorithm is to post a code golf on StackOverflow. A few drawbacks:
我想这可能有用吗?如果有人发现错误,请指出。
I think this might work? If somebody sees an error, please point it out.
看来雷·所罗门诺夫在这个领域做了很多工作。
雷·所罗门诺夫的出版物
归纳推理理论 - 解决模式识别和人工智能问题的统一方法。
算法概率能解决归纳问题吗?
It looks like Ray Solomonoff did a lot of work in this field.
Publications of Ray Solomonoff
Inductive Inference Theory - A Unified Approach to Problems in Pattern Recognition and Artificial Intelligence.
Does Algorithmic Probability Solve the Problem of Induction?
柯尔莫哥洛夫复杂度的维基百科页面有一个标题为“柯尔莫哥洛夫复杂度的不可计算性”的小节,位于“基本结果”部分。这并不是一个可以有效计算甚至近似的基本度量。
毫无疑问,有更好的方法可以实现您想要的目标。如果您想要衡量随机性,您可以尝试二元熵函数。标准算法之一的可压缩性也可能符合要求。
The wikipedia page for Kolmogorov complexity has a subsection entitled "Incomputability of Kolmogorov complexity", under the "Basic results" section. This is not intended to be a basic measure that you can compute, or even approximate productively.
There are better ways of achieving what you want, without a doubt. If a measure of randomness is what you want, you could try the binary entropy function. Compressibility by one of the standard algorithms might also fit the bill.
我注意到的第一个问题是“柯尔莫哥洛夫复杂性”没有明确定义。这在某种程度上取决于如何表示程序的选择。因此,您需要做的第一件事是修复程序的某些编码(例如,Joey Adams 的程序用 J 编写的规范)。
一旦你有了编码,你正在寻找的算法就非常简单了。请参阅乔伊的回答。
但情况比必须运行指数级数量的程序还要糟糕。这些程序中的每一个都可以像您想象的那样运行(从技术上讲:作为函数输入大小的运行时间可能比任何递归函数增长得更快)。更重要的是,有些最短的程序可能运行时间最长。因此,虽然并行方法会随着时间趋向无穷大而接近正确值,但其速度会慢得难以想象。
您可以提前停止程序,认为此时的近似值已经足够好。但是,您通常不知道该近似值有多好。事实上,有些定理表明你永远无法知道。
所以简短的答案是“简单,只需使用乔伊的算法”,但无论从实用性的角度来看,答案是“你没有机会”。正如 rwong 所建议的,您最好只使用重型压缩算法。
The first issue that I notice is that "the Kolmogorov Complexity" isn't well defined. It depends to some degree on the choice of how to represent programs. So, the first thing you would need to do is fix some encoding of programs (for example, Joey Adams' specification that programs be written in J).
Once you have the encoding, the algorithm you are looking for is quite simple. See Joey's answer for that.
But the situation is even worse than having to run exponentially many programs. Each of those programs could run as long as you could possibly imagine (technically: running time as a function input size could grow faster than any recursive function). What's more, it could be the case that some of the shortest programs are the ones that run the longest. So while the parallel approach will approach the correct value as time goes to infinity, it will do so unimaginably slowly.
You could stop the program prematurely, figuring that the approximation at that point is good enough. However, you have no idea in general how good that approximation is. In fact, there are theorems that show you can never know.
So the short answer is "easy, just use Joey's algorithm", but by any measure of practicality, the answer is, "you don't have a chance". As has been recommended by rwong, you are better off just using a heavy-duty compression algorithm.