重复播种随机数生成器是合理的哈希函数吗?
我希望生成大量随机数据,这些数据对于给定的key
是可重现的,其中包含数字列表:
[a, b, c, d, e, ...]
以下是让 RNG 进入生成随机状态的良好或明智的方法吗数据,对于每个 n 元组 [a, b, c, ..., n]
,该数据与“相邻”n 元组 的输出不相关>[a+1,b, c, ..., n]
、[a, b+1, c, ..., n]
等。
srand(a);
srand(rand() * b);
srand(rand() * c);
...
srand(rand() * n);
# generate random data:
for (int i=0; i < 100; +i)
printf("%d", rand());
我认为这个问题归结为以下几点:是 < code>rand_hash 是一个好的二元组(a, b)
哈希函数吗?
int rand_hash(int a, int b) {
srand(a);
srand(rand() * b);
return rand();
}
注意:我不想暗示 srand 和 rand 是 RNG 的任何特定实现。为了便于论证,假设我们使用的是良好的 Mersenne Twister 代码。
编辑:如果不清楚,“合理的哈希函数”我的意思如下。在 2 元组 [a, b]
的限制情况下,rand_hash
的输出应该在 int
范围内一致,并且(通常)a
或 b
的变化幅度与返回值的变化幅度之间不应该存在相关性。
I wish to generate a large amount of random data, which is reproducible for a given key
, comprising a list of numbers:
[a, b, c, d, e, ...]
Is the following a good or sensible way to get a RNG into a state to generate random data, in such a way that for each n-tuple [a, b, c, ..., n]
, that data is uncorrelated with the output for the "adjacent" n-tuples [a+1, b, c, ..., n]
, [a, b+1, c, ..., n]
, etc.
srand(a);
srand(rand() * b);
srand(rand() * c);
...
srand(rand() * n);
# generate random data:
for (int i=0; i < 100; +i)
printf("%d", rand());
I think this question boils down to the following: is rand_hash
a good hash function for the 2-tuple (a, b)
?
int rand_hash(int a, int b) {
srand(a);
srand(rand() * b);
return rand();
}
NB: I don't wish to imply that srand
and rand
are any particular implementation of an RNG. Assume for the sake of argument that we're using a good Mersenne Twister code.
Edit: If it isn't clear, by "reasonable hash function" I mean the following. In the restricted case of a 2-tuple [a, b]
, then the output of rand_hash
should be uniform over the range of int
, and (typically) there should be no correlation between the magnitude in the change of a
or b
and the magnitude of the change in the return value.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
不,这不是一个合理的方法。
rand
的实现是什么。随机数生成器被设计为在几个生成的 m 个数字的周期内提供近似均匀分布的数字。它们的设计目的不是在一组(32 位)种子上提供均匀分布的数字。在您假设的 mersenne_twister 情况下,随机数生成器的状态远大于您提供给 srand 的整数(具体来说,624*sizeof(int) ) >)。 RNG 确保其输出随机且均匀的大部分能力都来自该附加状态,而您却将其拿走了。 (种子只能是 2^32 状态之一)rand
是一个黑匣子,没有人会说明天的实现与今天的实现相匹配)。srand
的相同输入,哈希函数的输出返回相同的内容。因此,您已经有了一个哈希值——srand
的输入。 RNG 对于 srand 的给定输入生成相同的输出。因此,您可能获得的哈希值数量不会大于仅返回您已经计算出的哈希值。如果 srand 中的初始哈希值对于哈希表来说分布很差,则适当地缩放哈希值,使其在表中表现良好。对于
rand
的某些实现,其性能非常差。考虑一个线性同余生成器(这在 C 库中更常见,因为它的状态为sizeof(int)
-- 例如 BSD 生成器)。 LCG 遵循 xNext = a*xCurrent + b 的形式。考虑:请注意,这种(常见)类型的生成器生成的哈希值很容易与您的输入值相关。
No, this is not a reasonable approach.
rand
is. Random number generators are designed to provide approximately uniformly distributed numbers over a period of several generated mnumbers. They are not designed to provide uniformly distributed numbers over the set of (32 bit) seeds. In your hypotheticalmersenne_twister
case, the random number generator has state much larger than the integer you supply tosrand
(specifically,624*sizeof(int)
). Most of the power the RNG has to ensure its output is random and uniform are from that additional state, and you took that away. (The seed can be only one of 2^32 states)rand
is a black box, nobody says that tomorrow's implementation matches today's).srand
. Therefore, you already have a hash -- the input tosrand
. The RNG generates the same output for a given input tosrand
. Therefore the number of hashes you may obtain is no greater than just returning the hash you would have already calculated. If your initial hash into srand is of poor distribution for a hash table, then scale the hash appropriately such that it performs well in your table.For some implementations of
rand
, this performs extremely poorly. Consider a linear congruential generator (which is more common with C libraries because it has state ofsizeof(int)
-- e.g. the BSD generator ). A LCG follows the formxNext = a*xCurrent + b
. Consider:Note that this (common) type of generator produces hash values easily correlated to your input values.
使用
boost::hash_combine
http 怎么样? ://www.boost.org/doc/libs/1_33_1/doc/html/hash_combine.html 创建您的初始种子?多次使用 srand 总是会在我的脑海中触发危险信号。What about using
boost::hash_combine
http://www.boost.org/doc/libs/1_33_1/doc/html/hash_combine.html to create your initial seed? Usingsrand
more than once always triggers red flags in my mind.潜在问题:
如果另一个线程在哈希函数中间调用
rand()
怎么办?Potential problem:
What if another thread calls
rand()
in the middle of your hash function?