Which part of a number has more entropy?

Posted on 2024-07-22 07:47:14

Given a sequence of numbers N1, N2, N3, ... from some source (not a PRNG, but, say, sensor or logging data of some kind), is it safe to assume that splitting each number like this

Nn / B = Qn rem Mn    (that is, Nn = B * Qn + Mn with 0 <= Mn < B)

will result in the sequence Q having less entropy than the sequence M?

Note: assume that B is chosen so that Q and M both have the same sized range.
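
One way to check this on a concrete data set is to measure the empirical Shannon entropy of both parts. The following is a minimal sketch, not a definitive method: the function names `shannon_entropy` and `split_entropy` and the sample readings are made up for illustration, and the numbers are assumed to be non-negative integers.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Empirical Shannon entropy, in bits, of a sequence of hashable values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def split_entropy(numbers, base):
    """Return (H(Q), H(M)) where each n = base * q + m with 0 <= m < base."""
    quotients, remainders = zip(*(divmod(n, base) for n in numbers))
    return shannon_entropy(quotients), shannon_entropy(remainders)

# Hypothetical sensor readings (not data from the question), all below
# 100 * 100, so that with B = 100 both Q and M range over 0..99.
readings = [1012, 1019, 1023, 1031, 1038, 1044, 1047, 1055]
h_q, h_m = split_entropy(readings, 100)
print(f"H(Q) = {h_q:.3f} bits, H(M) = {h_m:.3f} bits")
```

For these slowly varying readings the high part Q never changes while the low part M does, which is the situation the question has in mind. Note that this only measures the per-symbol distribution and ignores any correlation between successive values.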


This is related to the observation that many real world data sets, regardless of their source, have a logarithmic distribution of leading digits (Benford's law): numbers starting with 1 are much more common than numbers starting with 9. But this says little about the low order parts.
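
Assuming the logarithmic distribution referred to here is Benford's law, the expected frequency of each leading digit d is log10(1 + 1/d). The short sketch below tabulates those frequencies and the entropy of the leading digit they imply; the function name `benford_probability` is just an illustrative choice.

```python
import math

def benford_probability(digit):
    """Benford's law: P(d) = log10(1 + 1/d) for a leading digit d in 1..9."""
    return math.log10(1 + 1 / digit)

probabilities = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probabilities.items():
    print(f"{d}: {p:.3f}")

# Entropy of the leading digit under Benford's law, compared with the
# log2(9) bits a uniformly distributed leading digit would carry.
leading_digit_entropy = sum(p * math.log2(1 / p) for p in probabilities.values())
print(f"entropy of the leading digit: {leading_digit_entropy:.3f} bits "
      f"(uniform would be {math.log2(9):.3f})")
```

The leading digit therefore carries somewhat less than the maximum possible entropy, which is consistent with the intuition behind the question but, as noted, says nothing by itself about the low order digits.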

For a fun way to test this (and piss off your sys admin by bogging down his computer), run this in bash:

 ls -lR 2>/dev/null | grep -v -e "^\./" | sed "s/[-rdwxlp]*\W*[0-9]*\W*[a-z]*\W*[a-z]*\W*\([0-9]\).*/\1/" | sort | uniq -c

and get a histogram of the first digit of the file sizes.
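
The same experiment can be sketched in Python, which avoids parsing directory listings. This is an approximation of the one-liner above rather than an exact equivalent: it counts only regular files with a non-zero size, skips symlinks, and the function name `first_digit_histogram` is an illustrative choice.

```python
import os
from collections import Counter

def first_digit_histogram(root):
    """Count the leading digits of the sizes of regular files under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file
            if size > 0:
                counts[str(size)[0]] += 1
    return counts

if __name__ == "__main__":
    # Print "count digit" pairs, like the output of `uniq -c`.
    for digit, count in sorted(first_digit_histogram(".").items()):
        print(count, digit)
```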

Comments (1)

叹梦 2024-07-29 07:47:14

This depends on the sequence. For example, take [1 * 7 = 7, 3 * 7 = 21, 5 * 7 = 35, ... (2 * N - 1) * 7] and B = 7. Qn will be [1, 3, 5, ... 2 * N - 1] and Mn will always be 0, so here M has zero entropy and Q carries all of it. Usually the entropy of Q will be lower, since taking the quotient is like shifting some bits off, but this is not always the case.
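
The counterexample can be checked numerically. The sketch below builds the first eight odd multiples of 7 (the length 8 is an arbitrary choice) and measures the empirical entropy of the quotients and remainders for B = 7; `shannon_entropy` is a helper written for this illustration.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Empirical Shannon entropy, in bits, of a sequence of hashable values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

B = 7
numbers = [(2 * k - 1) * B for k in range(1, 9)]  # 7, 21, 35, ..., 105
quotients = [n // B for n in numbers]             # 1, 3, 5, ..., 15
remainders = [n % B for n in numbers]             # always 0

h_q = shannon_entropy(quotients)
h_m = shannon_entropy(remainders)
print(f"H(Q) = {h_q:.3f} bits, H(M) = {h_m:.3f} bits")
```

Here every quotient is distinct while the remainder is constant, so all of the entropy sits in Q, the opposite of what the question assumes.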

And of course this does not hold for data coming from a (P)RNG: the range of Qn will be the same as the range of Mn, and in both the numbers are (almost) equally distributed.
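
To illustrate the (P)RNG case, the following sketch draws uniform integers from Python's `random` module and compares the two parts. The base B = 16, the seed, and the sample count are arbitrary assumptions chosen so that Q and M both range over 0..15.

```python
import math
import random
from collections import Counter

def shannon_entropy(values):
    """Empirical Shannon entropy, in bits, of a sequence of hashable values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

B = 16                  # hypothetical base; Q and M both range over 0..15
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.randrange(B * B) for _ in range(100_000)]

h_q = shannon_entropy(n // B for n in samples)
h_m = shannon_entropy(n % B for n in samples)
print(f"H(Q) = {h_q:.4f} bits, H(M) = {h_m:.4f} bits, "
      f"maximum = {math.log2(B):.4f}")
```

Both values should come out very close to the 4-bit maximum, since neither part of a uniformly distributed number is more predictable than the other.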
