Possible sources for random number seeds
Two points: first, the example is in Fortran, but I think the question applies to any language; second, the built-in random number generators are pseudo-random rather than truly random, and better generators exist, but we're not interested in using them for what we're doing.
Most discussions on random seeds acknowledge that if the program doesn't seed the generator at run time, it starts from a default seed that is fixed before the program ever runs (effectively at compile time). So the same sequence of numbers is generated every time the program is run, which is not good for random numbers. One way to overcome this is to seed the random number generator from the system clock.
However, when running in parallel with MPI on a multi-core machine, the system-clock approach gave us the same kind of problem. While the sequences changed from run to run, all processes read the same system clock value and therefore got the same random seed and the same sequence.
So consider the following example code:
    PROGRAM clock_test
      USE mpi
      USE, INTRINSIC :: iso_fortran_env, ONLY: real64
      IMPLICIT NONE
      INTEGER :: ierr, rank, clock, i, n, method, u
      INTEGER, DIMENSION(:), ALLOCATABLE :: seed
      REAL(KIND=real64) :: random
      INTEGER, PARAMETER :: OLD_METHOD = 0, &
                            NEW_METHOD = 1

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

      ! Ask the runtime how many integers make up a seed.
      CALL RANDOM_SEED(SIZE=n)
      ALLOCATE(seed(n))

      DO method = OLD_METHOD, NEW_METHOD
        SELECT CASE (method)
        CASE (OLD_METHOD)
          ! Seed from the system clock: every rank reads (nearly) the
          ! same count, so every rank ends up with the same seed.
          CALL SYSTEM_CLOCK(COUNT=clock)
          seed = clock + 37 * (/ (i - 1, i = 1, n) /)
          CALL RANDOM_SEED(PUT=seed)
          CALL RANDOM_NUMBER(random)
          WRITE(*,*) "OLD Rank, dev = ", rank, random
        CASE (NEW_METHOD)
          ! Seed from the OS entropy pool: each rank reads different bytes.
          OPEN(NEWUNIT=u, FILE='/dev/urandom', ACCESS='stream', &
               FORM='UNFORMATTED', ACTION='read')
          READ(u) seed
          CLOSE(u)
          CALL RANDOM_SEED(PUT=seed)
          CALL RANDOM_NUMBER(random)
          WRITE(*,*) "NEW Rank, dev = ", rank, random
        END SELECT
        CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
      END DO

      DEALLOCATE(seed)
      CALL MPI_FINALIZE(ierr)
    END PROGRAM clock_test
When run on my 2-core workstation, this gives:
OLD Rank, dev = 0 0.330676306089146
OLD Rank, dev = 1 0.330676306089146
NEW Rank, dev = 0 0.531503215980609
NEW Rank, dev = 1 0.747413828750221
So, we overcame the clock issue by reading the seed from /dev/urandom instead. This way each core gets its own seed, and therefore its own random numbers.
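A slightly more defensive version of the NEW_METHOD branch might check IOSTAT, so that a rank which cannot open /dev/urandom (for example on a system without that device) can fall back to another scheme instead of aborting. This is a minimal sketch assuming a Fortran 2008 compiler (for NEWUNIT); the module name `urandom_seeding` and the subroutine `seed_from_urandom` are hypothetical.

```fortran
module urandom_seeding
  implicit none
  private
  public :: seed_from_urandom
contains
  ! Fill SEED with raw bytes read from /dev/urandom.
  ! OK is .false. if the device could not be opened or fully read,
  ! so the caller can fall back to another seeding scheme.
  subroutine seed_from_urandom(seed, ok)
    integer, intent(out) :: seed(:)
    logical, intent(out) :: ok
    integer :: u, ios

    ok = .false.
    seed = 0
    open(newunit=u, file='/dev/urandom', access='stream', &
         form='unformatted', action='read', status='old', iostat=ios)
    if (ios /= 0) return          ! no device: report failure, don't stop
    read(u, iostat=ios) seed      ! one read fills the whole array
    close(u)
    ok = (ios == 0)
  end subroutine seed_from_urandom
end module urandom_seeding
```

In clock_test, the OPEN/READ/CLOSE lines could then become `CALL seed_from_urandom(seed, ok)`, with a clock-based seed used whenever `ok` comes back `.FALSE.`.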
What other seed approaches are there that will work in a multi-core, MPI system and still be unique on each core, from run to run?
If you take a look at Random Numbers In Scientific Computing: An Introduction by Katzgrabber (an excellent, lucid discussion of the ins and outs of using PRNGs for technical computing), you'll see that for parallel runs they suggest using a hash function of the time and the PID to generate a seed. From their section 7.1:
Of course, in Fortran this would be something like
It's also sometimes handy to be able to pass in the time, rather than calling it from within seedgen, so that when you are testing you can give it fixed values that then generate a reproducible (== testable) sequence.
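The idea of hashing a time value together with a process ID, while taking the time as an argument so tests can pin it, might look like the following in Fortran. This is our own sketch, not the code from Katzgrabber's paper: the names `time_pid_seeding` and `time_pid_seed` are hypothetical, and the constants 181, 83 and 359 are illustrative. The modulus 104729 is prime, so for a time value that is not a multiple of it, PIDs that differ modulo 104729 always map to different seeds.

```fortran
module time_pid_seeding
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: time_pid_seed

  ! Prime modulus (104729 is the 10000th prime); results lie in [0, P).
  integer(int64), parameter :: P = 104729_int64
contains
  ! Hash a time value T and a process id PID into [0, P).
  ! T is an argument rather than read from the clock here, so a test
  ! can pass fixed values and get a reproducible seed.
  ! The constants 181, 83 and 359 are illustrative choices.
  pure function time_pid_seed(t, pid) result(s)
    integer(int64), intent(in) :: t, pid
    integer(int64) :: s
    ! Reduce each factor modulo P first so the products stay small.
    s = modulo(modulo(t, P) * 181_int64, P)
    s = modulo(s * modulo(modulo(pid - 83_int64, P) * 359_int64, P), P)
  end function time_pid_seed
end module time_pid_seeding
```

A caller would take `t` from SYSTEM_CLOCK and the PID from a compiler extension such as GNU Fortran's GETPID (not part of standard Fortran), then spread the scalar over the seed array as in the question's OLD_METHOD. Two caveats: the result is 0 whenever `t` is a multiple of 104729, and PIDs are only unique within one node, so on a multi-node MPI job the rank is a safer second input.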
System time is usually returned in (or at least easily converted into) an integer type: simply add the rank of the process to the value and use that to seed the random number generator.
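A sketch of that suggestion, assuming the COUNT from SYSTEM_CLOCK and the MPI rank are passed in as arguments so the function can be tested without MPI (the names `rank_seeding` and `clock_rank_seed` are hypothetical). It reuses the question's `37 * (i - 1)` spread and does the arithmetic in 64-bit integers, reducing modulo HUGE(0), so that a clock count close to HUGE(0) cannot overflow.

```fortran
module rank_seeding
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: clock_rank_seed
contains
  ! Build the N-element array that RANDOM_SEED(PUT=...) expects from a
  ! clock count and an MPI rank.
  pure function clock_rank_seed(clock, my_rank, n) result(seed)
    integer, intent(in) :: clock, my_rank, n
    integer :: seed(n)
    integer :: i
    integer(int64), parameter :: m = int(huge(0), int64)
    ! Shift every element by the rank, then spread the elements with the
    ! same 37*(i-1) step as OLD_METHOD. Working in int64 and reducing
    ! modulo HUGE(0) keeps each element a valid default INTEGER.
    seed = int(modulo(int(clock, int64) + int(my_rank, int64) &
                      + 37_int64 * [(int(i - 1, int64), i = 1, n)], m))
  end function clock_rank_seed
end module rank_seeding
```

On each rank, `seed = clock_rank_seed(clock, rank, n)` followed by `CALL RANDOM_SEED(PUT=seed)` would replace the OLD_METHOD seeding lines. One property of this design: two runs started one clock tick apart give rank 1 of the first run the same seed as rank 0 of the second, which matters only if results from separate runs are pooled.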