Shannon entropy formula. Help my confusion
My understanding of the entropy formula is that it's used to compute the minimum number of bits required to represent some data. It's usually worded differently when defined, but that understanding is what I've relied on until now.
Here's my problem. Suppose I have a sequence of 100 '1's followed by 100 '0's = 200 bits. The alphabet is {0,1} and the base of the entropy is 2. The probability of symbol "0" is 0.5 and of "1" is 0.5. So the entropy is 1, i.e. 1 bit to represent each bit.
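For concreteness, here's a quick sketch of that calculation (just an illustration I put together; the helper function is my own, not from any library):

```python
from math import log2

def entropy_bits_per_symbol(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

# 100 '1's followed by 100 '0's: each symbol occurs with probability 0.5
message = "1" * 100 + "0" * 100
p_one = message.count("1") / len(message)    # 0.5
p_zero = message.count("0") / len(message)   # 0.5
print(entropy_bits_per_symbol([p_one, p_zero]))  # 1.0
```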
However, you can run-length encode it as something like 100 / 1 / 100 / 0, where each pair is the number of bits to output followed by the bit. It seems like I have a representation smaller than the data, especially if you increase the 100 to a much larger number.
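A rough sketch of the run-length idea I have in mind (the exact (count, bit) output format is just an assumption for the example):

```python
from itertools import groupby

def run_length_encode(bits):
    """Collapse runs of identical symbols into (run_length, symbol) pairs."""
    return [(len(list(group)), symbol) for symbol, group in groupby(bits)]

print(run_length_encode("1" * 100 + "0" * 100))  # [(100, '1'), (100, '0')]
```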
I'm using http://en.wikipedia.org/wiki/Information_entropy as a reference at the moment.
Where did I go wrong? Is it the probabilities assigned to the symbols? I don't think those are wrong. Or did I get the connection between compression and entropy wrong? Anything else?
Thanks.
Edit
Following some of the answers, my followup is: would you apply the entropy formula to a particular instance of a message to try to find out its information content? Would it be valid to take the message "aaab" and say its entropy is ~0.811? If yes, then what's the entropy of 1...10...0, where the 1s and 0s are each repeated n times, using the entropy formula? Is the answer 1?
Yes, I understand that you are creating a random variable from your input symbols and guessing at the probability mass function based on your message. What I'm trying to confirm is that the entropy formula does not take into account the position of the symbols in the message.
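A quick sketch of what I mean by applying the formula to a message instance, estimating the probabilities from the symbol counts (again just my own illustration):

```python
from collections import Counter
from math import log2

def empirical_entropy(message):
    """Estimate p for each symbol from its frequency in the message, then apply H = -sum(p * log2(p))."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(empirical_entropy("aaab"))             # ~0.811
n = 100
print(empirical_entropy("1" * n + "0" * n))  # 1.0
print(empirical_entropy("10" * n))           # also 1.0 -- positions don't change the counts
```

Under this frequency-based estimate, only the symbol counts enter the formula, so any rearrangement of the same symbols gives the same value.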
4 Answers
You're pretty close, but this last question is where the mistake was. If you're able to compress something into a form that is smaller than its original representation, it means that the original representation had at least some redundancy. Each bit in the message really wasn't conveying 1 bit of information.
Because redundant data does not contribute to the information content of a message, it also does not increase its entropy. Imagine, for example, a "random bit generator" that only ever returns the value "0". This conveys no information at all! (Strictly speaking, the symbol that never appears contributes a 0·log 0 term to the entropy formula; by the usual convention that term is 0, so the entropy of such a source works out to exactly 0 bits.)
By contrast, had you simulated a large number of random coin flips, it would be very hard to reduce the size of this message by much. Each bit would be contributing close to 1 bit of entropy.
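One way to see this empirically (a rough sketch of my own, using Python's zlib as a stand-in general-purpose compressor, not anything specific to this answer): a highly repetitive byte string shrinks dramatically, while random bytes barely shrink at all.

```python
import os
import zlib

repetitive = b"\x01" * 100 + b"\x00" * 100   # highly redundant, analogous to the 100-ones/100-zeros example
random_ish = os.urandom(200)                  # 200 bytes of OS-supplied randomness

print(len(repetitive), len(zlib.compress(repetitive)))  # 200 -> far fewer bytes
print(len(random_ish), len(zlib.compress(random_ish)))  # 200 -> about 200 or slightly more (no redundancy to remove)
```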
When you compress data, you extract that redundancy. In exchange, you pay a one-time entropy price by having to devise a scheme that knows how to compress and decompress this data; that itself takes some information.
To summarize, the fact that you could devise a scheme to make the encoding of the data smaller than the original data tells you something important. Namely, it says that your original data contained very little information.
Further reading
For a more thorough treatment of this, including exactly how you'd calculate the entropy for any arbitrary sequence of digits with a few examples, check out this short whitepaper.
Have a look at Kolmogorov complexity
And in your particular case, don't restrict yourself to the alphabet {0,1}. For your example, use {0...0, 1...1} (a hundred 0's and a hundred 1's).
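A rough sketch of what that buys you (my own illustration, estimating probabilities from symbol counts as in the question): with the runs themselves as symbols, the message is only two symbols long, so the whole thing carries about 2 bits, ignoring the cost of describing the alphabet.

```python
from collections import Counter
from math import log2

# Treat each whole run as a single symbol: the message is just two symbols long.
message = ["1" * 100, "0" * 100]
counts = Counter(message)
n = len(message)
h = -sum((c / n) * log2(c / n) for c in counts.values())

print(h)      # 1.0 bit per symbol
print(h * n)  # 2.0 bits for the whole message (the alphabet itself still has to be described somewhere)
```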
Your encoding works in this example, but it is possible to conceive of an equally valid case: 010101010101..., which would be encoded as 1 / 0 / 1 / 1 / ...
Entropy is measured across all possible messages that can be constructed from the given alphabet, and not just over pathological examples!
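As a quick illustration of that point (my own sketch, using the same (count, bit) run-length idea from the question): on an alternating string every run has length 1, so the "compressed" form is longer than the input.

```python
from itertools import groupby

alternating = "01" * 100  # 200 bits, but every run has length 1
runs = [(len(list(group)), bit) for bit, group in groupby(alternating)]

print(len(alternating))  # 200 bits in
print(len(runs))         # 200 (count, bit) pairs out -- run-length coding made it bigger, not smaller
```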
John Feminella got it right, but I think there is more to say.
Shannon entropy is based on probability, and probability is always in the eye of the beholder.
You said that 1 and 0 were equally likely (0.5). If that is so, then the string of 100 1s followed by 100 0s has a probability of 0.5^200, and -log2 of that is 200 bits, as you expect. However, that string's contribution to the entropy (in Shannon terms) is its information content times its probability, or 200 * 0.5^200, still a really small number.
This is important because if you use run-length coding to compress strings, this particular string will get a short code, but averaged over all 2^200 strings it will not do well. With luck, it will average out to about 200 bits, but not less.
On the other hand, if you look at your original string and say it is so striking that whoever generated it is likely to generate more like it, then you are really saying its probability is larger than 0.5^200, so you are making a different assumption about the original probability structure of the generator of the string, namely that it has lower entropy than 200 bits.
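A quick numeric check of those figures (my own sketch; the fair-coin model is the one assumed above):

```python
from math import log2

p = 0.5 ** 200                 # probability of this exact 200-bit string under the fair-coin model
surprisal = -log2(p)           # information content of the string
contribution = p * surprisal   # this string's term in the entropy sum -sum(p * log2(p))

print(surprisal)      # 200.0 bits
print(contribution)   # ~1.2e-58 -- a really small number, as noted above
```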
Personally, I find this subject really interesting, especially when you look into Kolmogorov (Algorithmic) information. In that case, you define the information content of a string as the length of the smallest program that could generate it. This leads to all sorts of insights into software engineering and language design.
I hope that helps, and thanks for your question.