Which part of a number has more entropy?

Posted 2024-07-22 07:47:14

Given a sequence of numbers N₁, N₂, N₃, ... from some source, not a PRNG but, say, sensor or logging data of some kind, is it safe to assume that processing it like this

N_n / B = Q_n rem M_n    (that is, Q_n = floor(N_n / B) and M_n = N_n mod B)

will result in the sequence Q having less entropy than the sequence M?

Note: assume that B is chosen so that Q and M both have the same size of range.

This is related to the observation that most real-world data sets, regardless of their source, have logarithmically distributed leading digits (Benford's law): numbers starting with 1 are much more common than numbers starting with 9. But this says little about the low-order parts.
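To make the question concrete, here is a minimal Python sketch that splits a sequence into Q and M and compares their empirical Shannon entropies. It is only an illustration under stated assumptions: the data is a synthetic log-uniform sample rather than real sensor or log data, B = 256 is an arbitrary choice that gives Q and M the same range, and the entropy estimate treats each value as an independent symbol, so it ignores any correlation between successive values. The helper names `shannon_entropy` and `split` are made up for this example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy in bits, treating values as i.i.d. symbols."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split(numbers, base):
    """Return (quotients, remainders) of each number divided by base."""
    quotients = [n // base for n in numbers]
    remainders = [n % base for n in numbers]
    return quotients, remainders


# Assumption: a synthetic log-uniform sample on [1, B*B), standing in for
# "real world" data. With B = 256, Q and M both range over 0..255.
B = 256
SAMPLES = 100_000
numbers = [int(math.exp(i / SAMPLES * math.log(B * B))) for i in range(SAMPLES)]

Q, M = split(numbers, B)
h_q = shannon_entropy(Q)
h_m = shannon_entropy(M)
print(f"H(Q) = {h_q:.2f} bits, H(M) = {h_m:.2f} bits")
```

For this particular synthetic input the high-order part Q comes out with noticeably less entropy than M, because half of the sample falls below B and so has Q = 0. That says nothing about other sources; swapping in real data for `numbers` is the actual test.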

For a fun way to test this (and piss off your sysadmin by bogging down his computer), run this in bash:

 ls -lR 2>/dev/null | grep -v -e "^\./" | sed "s/[-rdwxlp]*\W*[0-9]*\W*[a-z]*\W*[a-z]*\W*\([0-9]\).*/\1/" | sort | uniq -c

and you get a histogram of the first digit of the file sizes.

叹梦 2024-07-29 07:47:14

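It depends on the sequence. For example, take [1 * 7 = 7, 3 * 7 = 21, 5 * 7 = 35, ..., (2 * N - 1) * 7] with B = 7. Then Q_n will be [1, 3, 5, ..., 2 * N - 1] and M_n will always be 0. Usually Q will have less entropy, since taking the quotient is like shifting some bits away, but that is not always the case.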
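The counterexample above can be checked with a short Python sketch. It builds the sequence (2k - 1) * 7 for k = 1..50 (the length 50 is an arbitrary choice for illustration), divides by B = 7, and estimates the entropy of each part from symbol frequencies. The helper `shannon_entropy` is a name made up for this example. Note that Q and M do not share the same range here, so this sequence falls outside the assumption in the question's note.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy in bits, treating values as i.i.d. symbols."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


B = 7
TERMS = 50  # arbitrary length, chosen only for illustration

# 7, 21, 35, ...: every element is a multiple of B.
numbers = [(2 * k - 1) * B for k in range(1, TERMS + 1)]

Q = [n // B for n in numbers]  # 1, 3, 5, ...: all distinct
M = [n % B for n in numbers]   # always 0

h_q = shannon_entropy(Q)
h_m = shannon_entropy(M)
print(f"H(Q) = {h_q:.2f} bits, H(M) = {h_m:.2f} bits")
```

Since every quotient is distinct and every remainder is 0, all of the information sits in Q and none in M, which is the opposite of the ordering the question asks about.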

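Of course, this does not hold for data from a (P)RNG in particular, since the range of Q_n will be the same as the range of M_n, and in both of them the numbers are (almost) uniformly distributed.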
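The uniform case can be sketched in Python as well. Rather than calling an actual random number generator, this example uses an idealised uniform source: every value in [0, B*B) exactly once. B = 16 is an arbitrary choice and `shannon_entropy` is a helper name made up for this example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy in bits, treating values as i.i.d. symbols."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


B = 16  # arbitrary base; Q and M both range over 0..15

# Idealised uniform source (an assumption, not a real PRNG):
# each value in [0, B*B) appears exactly once.
numbers = list(range(B * B))

Q = [n // B for n in numbers]
M = [n % B for n in numbers]

h_q = shannon_entropy(Q)
h_m = shannon_entropy(M)
print(f"H(Q) = {h_q:.2f} bits, H(M) = {h_m:.2f} bits")
```

Each value of Q and of M occurs exactly B times, so both parts reach the maximum of log2(B) bits and neither has less entropy than the other.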
