What is information in the context of entropy?
I am trying to wrap my head around the concept of information in the context of entropy. Let me first introduce some things to make it clear what I mean by the terms I am using.
Entropy:
[1]: https://en.wikipedia.org/wiki/Entropy_(information_theory)
"In information theory, the entropy of a random variable is the average level of
"information", "surprise", or "uncertainty" inherent to the variable's possible outcomes."
H(p) = \sum_{i=1}^{n} -p_i \log_2(p_i)
So the question that came up for me was: What is information and how do we quantify it?
Now I've read many times that -log_2(p_i) (the solution x of 2^x = 1/p_i) tells us how many bits of information the event i with probability p_i carries. So for example, if I have a fair coin, the number of bits of information I get for tails (or heads) is -log_2(0.5) = 1, and the total entropy is H(p) = 0.5 * 1 + 0.5 * 1 = 1. This should give me the average amount of information (number of bits) I obtain when flipping the fair coin.
So far so good. But what if the coin isn't fair? Let's say p(heads) = 0.1 and p(tails) = 0.9. According to the definition I get H(p) = 0.468996, which tells me that on average I get only around 0.47 bits of information when flipping this coin. But why is there a difference? Intuitively, in both cases I only learn whether it's heads or tails, in other words zero or one, i.e. 1 bit. If I just want to obtain the result of the coin toss, I am not really interested in the probability of each event anyway. It is especially confusing to me that apparently the information value of heads (-log_2(0.1)) is much higher than that of tails (-log_2(0.9)).
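These numbers are easy to reproduce; here is a minimal Python sketch (the `entropy` helper is just my own naming):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum of -p * log2(p) over the outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit on average
print(-math.log2(0.1))      # heads: ~3.32 bits of "surprise"
print(-math.log2(0.9))      # tails: ~0.15 bits
print(entropy([0.1, 0.9]))  # weighted average: ~0.469 bits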
The only way I can make sense of the terminology is with the following example:
Imagine you want to find a mushroom in a forest that is split into two parts. One part is a third of the area, the other two thirds, and the mushroom's location is random (uniformly distributed). There is exactly one mushroom in the whole forest per season. If some magic machine tells you that it's in the first part, it makes sense to me that this message contains more information, since it effectively divides the area you have to search by a factor of 3. The essence is that if you would be satisfied with only knowing in which part of the forest the mushroom is, you wouldn't care how large the area is (i.e. how high the probability is); it's just: is it the first or the second part.
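Putting numbers on this example with the same formula: a uniform location means p(part 1) = 1/3 and p(part 2) = 2/3, so

-log_2(1/3) ≈ 1.585 bits (message: "it's in the small part")
-log_2(2/3) ≈ 0.585 bits (message: "it's in the large part")
H = 1/3 * 1.585 + 2/3 * 0.585 ≈ 0.918 bits on average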
Answer:
This is not a comprehensive answer, as that would usually take the form of a one-semester course on signal theory. Instead, I'll try to give you a means to see the difference with your own eyes:
Write yourself a program that produces a character string of 0 and 1 characters, using a random number generator, once for Case A (your fair coin) and once for Case B (the biased coin with p(heads) = 0.1).
Save the strings to files and compress both files with your favorite compression tool (e.g. ZIP, some run-length encoding, etc.).
Compare the lengths of the compressed files to the entropy values you computed in your question. Why does the file from Case B achieve a higher compression rate?
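A minimal Python sketch of this experiment might look like the following (zlib stands in for "your favorite compression tool"; the helper name `coin_string` and the parameters are just illustrative):

```python
import random
import zlib

def coin_string(p_heads, n=100_000, seed=42):
    """Simulate n coin flips as a string of '1' (heads) and '0' (tails)."""
    rng = random.Random(seed)
    return "".join("1" if rng.random() < p_heads else "0" for _ in range(n))

case_a = coin_string(0.5)  # fair coin, entropy 1 bit per flip
case_b = coin_string(0.1)  # biased coin, entropy ~0.469 bits per flip

for name, s in [("Case A", case_a), ("Case B", case_b)]:
    compressed = zlib.compress(s.encode("ascii"), level=9)
    bits_per_flip = 8 * len(compressed) / len(s)
    print(f"{name}: {bits_per_flip:.3f} compressed bits per flip")
```

Case B should land noticeably closer to 0.469 bits per flip, while Case A cannot be compressed below about 1 bit per flip: the entropy is precisely the lower bound on lossless compression (Shannon's source coding theorem), which is what makes "0.47 bits per toss" a meaningful statement.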