Which part of a number has more entropy?
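Given the sequence of numbers $N_1, N_2, N_3, \ldots$ from some source (not a PRNG, but say sensor or logging data of some kind), is it safe to assume that processing it like this

$$N_n / B = Q_n \text{ rem } M_n,$$

that is, $Q_n = \lfloor N_n / B \rfloor$ and $M_n = N_n \bmod B$, will result in the sequence $Q$ having less entropy than the sequence $M$?

Note: assume that $B$ is such that both $Q$ and $M$ have the same sized range.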
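To try this on real data, the split and an empirical entropy estimate fit in a few lines. This is a minimal Python sketch under stated assumptions. `split_sequence`, `shannon_entropy` and `balanced_base` are hypothetical helper names. The entropy is the plug-in estimate from observed symbol frequencies, which is one reasonable choice among several. $B$ is taken as the smallest integer with $B^2 > \max N_n$, so both $Q_n$ and $M_n$ fall in $[0, B)$.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Plug-in (empirical) Shannon entropy, in bits, of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def split_sequence(values, b):
    """Split each N_n into quotient Q_n = N_n // b and remainder M_n = N_n % b."""
    q = [n // b for n in values]
    m = [n % b for n in values]
    return q, m


def balanced_base(max_value):
    """Smallest b with b * b > max_value, so both Q_n and M_n lie in [0, b)."""
    return math.isqrt(max_value) + 1


# Hypothetical readings, used only to demonstrate the calls.
readings = [12, 15, 17, 103, 110, 118, 240, 245, 251]
b = balanced_base(max(readings))
q, m = split_sequence(readings, b)
print(f"B = {b}, H(Q) = {shannon_entropy(q):.3f} bits, H(M) = {shannon_entropy(m):.3f} bits")
```

The `readings` list is made up purely to show the calls; substitute the actual sensor or log values. The plug-in estimate is biased low for small samples, so compare $Q$ and $M$ over the same number of samples.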
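This is related to the observation that most real-world data sets, regardless of their source, have a logarithmic distribution (Benford's law): numbers starting with 1 are much more common than numbers starting with 9. But this says little about the low-order parts.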
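This sketch assumes the "logarithmic distribution" meant here is Benford's law. Under that law, the expected first-digit probabilities are $P(d) = \log_{10}(1 + 1/d)$, roughly 0.301 for 1 and 0.046 for 9. The Python sketch below tabulates them. It then computes the entropy of the leading digit, which is necessarily below the $\log_2 9$ bits a uniformly distributed digit 1..9 would carry. `benford_first_digit_probabilities` is a hypothetical helper name.

```python
import math


def benford_first_digit_probabilities():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_first_digit_probabilities()
for digit, p in probs.items():
    print(f"{digit}: {p:.3f}")

# Entropy of the leading digit under Benford's law versus a uniform digit 1..9.
first_digit_entropy = sum(p * math.log2(1 / p) for p in probs.values())
print(f"H(first digit) = {first_digit_entropy:.3f} bits (uniform: {math.log2(9):.3f})")
```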
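For a fun way to test this (and annoy your sysadmin by bogging down his machine), run this in bash:

    # `ll` is usually only an interactive alias, so call `ls -lR` directly.
    # Keep entries whose mode field starts with -, d, l or p (skipping "total"
    # and directory-header lines) and print the first digit of the size column.
    ls -lR 2>/dev/null | awk '$1 ~ /^[-dlp]/ { print substr($5, 1, 1) }' | sort | uniq -c

and you get a histogram of the first digit of the file sizes.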
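Parsing `ls` output is fragile, for example with owner names containing spaces or with localized output. A hedged Python alternative follows. It is a sketch, not a drop-in replacement: `first_digit_histogram` is a hypothetical name, and unlike the shell pipeline it counts regular files only, skipping directories, symlinks and pipes.

```python
import os
from collections import Counter


def first_digit_histogram(root="."):
    """Count the leading decimal digit of the size of every regular file under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks, sockets, device files, etc.
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished entries, like 2>/dev/null in the shell version
            counts[str(size)[0]] += 1
    return counts


if __name__ == "__main__":
    for digit, count in sorted(first_digit_histogram(".").items()):
        print(f"{count:>8} {digit}")
```

Run as a script, it prints count and digit in a layout similar to `uniq -c`.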
Answer:
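It depends on the sequence. For example, take [1 * 7 = 7, 3 * 7 = 21, 5 * 7 = 35, ..., (2 * N - 1) * 7] with B = 7. Then $Q_n$ will be [1, 3, 5, ..., 2 * N - 1] and $M_n$ will always be 0. Usually $Q$ will have less entropy, since dividing by $B$ is like shifting off some low-order bits, but that is not always the case.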
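The counterexample can be checked numerically. This minimal Python sketch assumes a sequence length of 8 (`count`, a hypothetical name standing in for the answer's N). It reads the terms as odd multiples of 7, matching the (2 * N - 1) * 7 pattern.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Plug-in (empirical) Shannon entropy, in bits, of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


count = 8  # hypothetical sequence length (the answer's N)
b = 7
values = [(2 * n - 1) * b for n in range(1, count + 1)]  # 7, 21, 35, ..., 105
q = [v // b for v in values]  # 1, 3, 5, ..., 15: all distinct
m = [v % b for v in values]   # always 0
print(shannon_entropy(q), shannon_entropy(m))  # prints 3.0 0.0
```

Here all of the entropy sits in the quotient and none in the remainder, the opposite of what the question expects.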
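And of course this especially won't hold for data coming from a (P)RNG: the range of $Q_n$ is the same as the range of $M_n$, and in both the numbers are (almost) uniformly distributed, so neither part has noticeably less entropy than the other.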
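A quick simulation illustrates the (P)RNG case. This Python sketch assumes $B = 2^k$ with $k = 4$ and values drawn uniformly from $[0, B^2)$, so $Q$ and $M$ have the same range as the question's note requires. The seed and sample size are arbitrary choices.

```python
import math
import random
from collections import Counter


def shannon_entropy(symbols):
    """Plug-in (empirical) Shannon entropy, in bits, of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


rng = random.Random(0)  # seeded so the run is repeatable
k = 4
b = 2 ** k  # B = 16, so N in [0, B*B) gives Q and M the same range
values = [rng.randrange(b * b) for _ in range(100_000)]
q = [v // b for v in values]  # high-order bits, same as v >> k
m = [v % b for v in values]   # low-order bits, same as v & (b - 1)
print(shannon_entropy(q), shannon_entropy(m))  # both should be close to k = 4 bits
```

With a power-of-two $B$ the split is literally "shifting bits off", and for uniform input both halves carry essentially the full $k$ bits.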