What is information in the context of entropy?
I am trying to wrap my head around the concept of information in the context of entropy. Let me first introduce some things to make it clear what I mean by the terms I am using.
Entropy:
[1]: https://en.wikipedia.org/wiki/Entropy_(information_theory)
"In information theory, the entropy of a random variable is the average level of
"information", "surprise", or "uncertainty" inherent to the variable's possible outcomes."
H(p) = \sum_{i=1}^{n} -p_i \log_2(p_i)
So the question that came up for me was: What is information and how do we quantify it?
Now I've read many times that -log_2(p_i) (the solution x of 2^x = 1/p_i) tells us how many bits of information the event i with probability p_i carries. So for example, if I have a fair coin, the number of bits of information I get for tails (or heads) is -log_2(0.5) = 1, and the total entropy is H(p) = 0.5 * 1 + 0.5 * 1 = 1. This should give me the average amount of information (number of bits) I obtain when flipping the fair coin.
So far so good. But what if the coin isn't fair? Let's say p(heads) = 0.1 and p(tails) = 0.9. According to the definition I get H(p) = 0.468996, which tells me that on average I get only around 0.47 bits of information when flipping this coin. But why is there a difference? Intuitively, in both cases I only learn whether it's heads or tails, in other words zero or one, i.e. 1 bit. If I just want to obtain the result of the coin toss, I am not really interested in the probability of each event anyway. It is especially confusing to me that apparently the information value of heads (-log_2(0.1)) is much higher than that of tails (-log_2(0.9)).
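These numbers are easy to reproduce; here is a minimal Python sketch (the `entropy` helper is just my own naming):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum of -p * log2(p) over the outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit on average
print(-math.log2(0.1))      # heads: ~3.32 bits of "surprise"
print(-math.log2(0.9))      # tails: ~0.15 bits
print(entropy([0.1, 0.9]))  # weighted average: ~0.469 bits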
The only way I can make sense of the terminology is with the following example:
Imagine you want to find a mushroom in a forest that is split into two parts. One part is a third of the area, the other two thirds, and the mushroom's location is random (uniformly distributed). There is exactly one mushroom in the whole forest per season. If some magic machine tells you that it's in the first part, it makes sense to me that this message contains more information, since it effectively divides the area you have to search by a factor of 3. The essence is that if you would be satisfied with only knowing in which part of the forest the mushroom is, you wouldn't care how large the area is (i.e. how high the probability is); it's just: is it the first or the second part.
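Putting numbers on this example with the same formula: a uniform location means p(part 1) = 1/3 and p(part 2) = 2/3, so

-log_2(1/3) ≈ 1.585 bits (message: "it's in the small part")
-log_2(2/3) ≈ 0.585 bits (message: "it's in the large part")
H = 1/3 * 1.585 + 2/3 * 0.585 ≈ 0.918 bits on average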
Answer:
This is not a comprehensive answer, as that would usually take the form of a one-semester course on signal theory. Instead, I'll try to give you a means to see the difference with your own eyes:
Write yourself a program that produces a character string of 0 and 1 characters, using a random number generator, once for Case A (your fair coin) and once for Case B (the biased coin with p(heads) = 0.1).
Save the strings to files and compress both files with your favorite compression tool (e.g. ZIP, some run-length encoding, etc.).
Compare the lengths of the compressed files to the entropy values you computed in your question. Why does the file from Case B achieve a higher compression rate?
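A minimal Python sketch of this experiment might look like the following (zlib stands in for "your favorite compression tool"; the helper name `coin_string` and the parameters are just illustrative):

```python
import random
import zlib

def coin_string(p_heads, n=100_000, seed=42):
    """Simulate n coin flips as a string of '1' (heads) and '0' (tails)."""
    rng = random.Random(seed)
    return "".join("1" if rng.random() < p_heads else "0" for _ in range(n))

case_a = coin_string(0.5)  # fair coin, entropy 1 bit per flip
case_b = coin_string(0.1)  # biased coin, entropy ~0.469 bits per flip

for name, s in [("Case A", case_a), ("Case B", case_b)]:
    compressed = zlib.compress(s.encode("ascii"), level=9)
    bits_per_flip = 8 * len(compressed) / len(s)
    print(f"{name}: {bits_per_flip:.3f} compressed bits per flip")
```

Case B should land noticeably closer to 0.469 bits per flip, while Case A cannot be compressed below about 1 bit per flip: the entropy is precisely the lower bound on lossless compression (Shannon's source coding theorem), which is what makes "0.47 bits per toss" a meaningful statement.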