How do I compute the approximate entropy of a bit string?

Is there a standard way to do this?

Googling -- "approximate entropy" bits -- uncovers multiple academic papers but I'd like to just find a chunk of pseudocode defining the approximate entropy for a given bit string of arbitrary length.

(In case this is easier said than done and it depends on the application, my application involves 16,320 bits of encrypted data (cyphertext). But encrypted as a puzzle and not meant to be impossible to crack. I thought I'd first check the entropy but couldn't easily find a good definition of such. So it seemed like a question that ought to be on StackOverflow! Ideas for where to begin with de-cyphering 16k random-seeming bits are also welcome...)

See also this related question:
What is the computer science definition of entropy?

天涯离梦残月幽梦 2024-09-11 06:59:16

Entropy is not a property of the string you got, but of the strings you could have obtained instead. In other words, it qualifies the process by which the string was generated.

In the simple case, you get one string among a set of N possible strings, where each string has the same probability of being chosen as every other, i.e. 1/N. In that situation, the string is said to have an entropy of N. The entropy is often expressed in bits, which is a logarithmic scale: an entropy of "n bits" is an entropy equal to 2^n.

For instance: I like to generate my passwords as two lowercase letters, then two digits, then two lowercase letters, and finally two digits (e.g. va85mw24). Letters and digits are chosen randomly, uniformly, and independently of each other. This process may produce 26*26*10*10*26*26*10*10 = 4569760000 distinct passwords, and all these passwords have equal chances to be selected. The entropy of such a password is then 4569760000, which means about 32.1 bits.
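As a quick check of that arithmetic (a small illustrative snippet added here, not part of the original answer), the entropy in bits is just the base-2 logarithm of the number of equally likely strings:

import math

n_passwords = 26 * 26 * 10 * 10 * 26 * 26 * 10 * 10   # 4,569,760,000 possibilities
print(math.log2(n_passwords))                          # about 32.09 bits of entropy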

2024-09-11 06:59:16

Shannon's entropy equation is the standard method of calculation. Here is a simple implementation in Python, shamelessly copied from the Revelation codebase, and thus GPL licensed:

import math


def entropy(string):
    "Calculates the Shannon entropy of a string"

    # get probability of chars in string
    prob = [ float(string.count(c)) / len(string) for c in dict.fromkeys(list(string)) ]

    # calculate the entropy
    entropy = - sum([ p * math.log(p) / math.log(2.0) for p in prob ])

    return entropy


def entropy_ideal(length):
    "Calculates the ideal Shannon entropy of a string with given length"

    prob = 1.0 / length

    return -1.0 * length * prob * math.log(prob) / math.log(2.0)

Note that this implementation assumes that your input bit-stream is best represented as bytes. This may or may not be the case for your problem domain. What you really want is your bitstream converted into a string of numbers. Just how you decide on what those numbers are is domain specific. If your numbers really are just one and zeros, then convert your bitstream into an array of ones and zeros. The conversion method you choose will affect the results you get, however.
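For example, here is one illustrative way (a sketch added here, not from the original answer) to expand raw ciphertext bytes into a string of '0' and '1' characters and feed it to the entropy() function above; the per-symbol maximum is 1 bit for the bit view and 8 bits for the byte view:

# Illustrative only: `data` stands in for the real 16,320-bit (2,040-byte) ciphertext.
data = bytes([0b10110010, 0b01101001, 0b11110000, 0b00001111])

# Bit view: expand each byte into eight '0'/'1' characters (max entropy 1.0 bit/symbol).
bit_string = ''.join(format(byte, '08b') for byte in data)
print(entropy(bit_string))

# Byte view: treat each byte as one symbol (max entropy 8.0 bits/symbol).
print(entropy(data.decode('latin-1')))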

篱下浅笙歌 2024-09-11 06:59:16

I believe the answer is the Kolmogorov Complexity of the string.
Not only is this not answerable with a chunk of pseudocode, but Kolmogorov complexity is not even a computable function!

One thing you can do in practice is compress the bit string with the best available data compression algorithm.
The more it compresses the lower the entropy.
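As a concrete illustration of that idea (a sketch added here, not part of the original answer), the ratio of compressed size to original size under zlib gives a rough upper bound on the information content of the bits:

import random
import zlib

def compression_ratio(bits):
    """Rough upper bound on information content: compressed size / original size.

    `bits` is a string of '0'/'1' characters; it is packed into real bytes first,
    since compressing the ASCII characters '0' and '1' would waste 7 bits per symbol.
    """
    packed = int(bits, 2).to_bytes((len(bits) + 7) // 8, 'big')
    return len(zlib.compress(packed, 9)) / len(packed)

# A highly regular sequence compresses well (low ratio, low entropy) ...
print(compression_ratio('01' * 8160))
# ... while random-looking bits barely compress at all (ratio near or above 1).
print(compression_ratio(''.join(random.choice('01') for _ in range(16320))))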

为人所爱 2024-09-11 06:59:16

The NIST Random Number Generator evaluation toolkit has a way of calculating "Approximate Entropy." Here's the short description:

Approximate Entropy Test Description: The focus of this test is the
frequency of each and every overlapping m-bit pattern. The purpose of
the test is to compare the frequency of overlapping blocks of two
consecutive/adjacent lengths (m and m+1) against the expected result
for a random sequence.

And a more thorough explanation is available from the PDF on this page:

http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
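For reference, here is a rough sketch of the test statistic based only on the description quoted above (one reading of it, not the official NIST code): count every overlapping m-bit and (m+1)-bit pattern, wrapping around the end of the sequence, and compare the two distributions.

import math

def nist_approximate_entropy(bits, m=2):
    """Sketch of the Approximate Entropy statistic from NIST SP 800-22.

    `bits` is a string of '0'/'1' characters and `m` is the block length.
    For a truly random sequence the result should be close to ln(2) ~ 0.693;
    the official test turns the deviation into a chi-square statistic and a p-value.
    """
    n = len(bits)

    def phi(block_len):
        # Wrap the sequence so every one of the n positions starts an overlapping block.
        padded = bits + bits[:block_len - 1]
        counts = {}
        for i in range(n):
            block = padded[i:i + block_len]
            counts[block] = counts.get(block, 0) + 1
        # Sum of p * ln(p) over the observed block frequencies.
        return sum((c / n) * math.log(c / n) for c in counts.values())

    return phi(m) - phi(m + 1)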

-残月青衣踏尘吟 2024-09-11 06:59:16

There is no single answer. Entropy is always relative to some model. When someone talks about a password having limited entropy, they mean "relative to the ability of an intelligent attacker to predict", and it's always an upper bound.

Your problem is, you're trying to measure entropy in order to help you find a model, and that's impossible; what an entropy measurement can tell you is how good a model is.

Having said that, there are some fairly generic models that you can try; they're called compression algorithms. If gzip can compress your data well, you have found at least one model that can predict it well. And gzip is, for example, mostly insensitive to simple substitution: it can handle "wkh" occurring frequently in the text as easily as it can handle "the".

半窗疏影 2024-09-11 06:59:16

Using the Shannon entropy of a word with this formula: H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of each distinct symbol (formula image: https://i.sstatic.net/GBBJG.jpg)

Here's an O(n) algorithm that calculates it:

import math
from collections import Counter


def entropy(s):
    """Shannon entropy, in bits per symbol, of the sequence s."""
    l = float(len(s))
    # Counter(s).values() yields the count of each distinct symbol in s.
    return -sum(map(lambda a: (a/l)*math.log2(a/l), Counter(s).values()))
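A quick sanity check (a usage example added here, not part of the original answer): a balanced two-symbol string should come out at exactly one bit per symbol.

print(entropy("0011"))   # -> 1.0 bit per symbol
print(entropy("0000"))   # -> -0.0, i.e. zero: a constant string carries no information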

阪姬 2024-09-11 06:59:16

Here's an implementation in Python (I also added it to the Wiki page):

import numpy as np

def ApEn(U, m, r):
    """Approximate entropy of sequence U with block length m and tolerance r."""

    def _maxdist(x_i, x_j):
        # Chebyshev distance between two blocks.
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        # All overlapping blocks of length m.
        x = [[U[j] for j in range(i, i + m)] for i in range(N - m + 1)]
        # C[i]: fraction of blocks within tolerance r of block i (including itself).
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)

    return _phi(m) - _phi(m + 1)

Example:

>>> U = np.array([85, 80, 89] * 17)
>>> ApEn(U, 2, 3)
1.0996541105257052e-05

The above example is consistent with the example given on Wikipedia.
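To relate this back to the question's bit string (a note added here, not part of the original answer): the values are only 0 and 1, so any tolerance r below 1 means that only identical m-bit blocks count as matches. Note that the implementation above is quadratic in the sequence length, so 16,320 bits will take a while.

bits = np.array([1, 0, 1, 1, 0, 1, 0, 0] * 64)   # hypothetical stand-in for the ciphertext bits
print(ApEn(bits, 2, 0.5))                        # r < 1: only exact block matches count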
