Is there an alternative to using the time to seed random number generation?

Posted on 2024-12-07 15:35:17 · 189 words · 2 views · 0 comments

I'm trying to run several instances of a piece of code (2000 instances or so) concurrently in a computing cluster. The way it works is that I submit the jobs and the cluster will run them as nodes open up every so often, with several jobs per node. This seems to produce the same values for a good number of the instances in their random number generation, which uses a time-seed.

Is there a simple alternative I can use instead? Reproducibility and security are not important, quick generation of unique seeds is. What would be the simplest approach to this, and if possible a cross platform approach would be good.

Comments (10)

楠木可依 2024-12-14 15:35:17

The value returned by the rdtsc instruction makes a pretty reliable (and effectively random) seed.

With MSVC on Windows, it's accessible via the __rdtsc() intrinsic (declared in <intrin.h>).

In GNU C, it's accessible via:

unsigned long long rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

The instruction measures the total pseudo-cycles since the processor was powered on. Given the high frequency of today's machines, it's extremely unlikely that two processors will return the same value even if they booted at the same time and are clocked at the same speed.

迎风吟唱 2024-12-14 15:35:17

I assume you have some process launching the other processes. Have it pass in the seed to use. Then you can have that master process just pass in a random number for each process to use as its seed. That way there's really only one arbitrary seed chosen... you can use time for that.

If you don't have a master process launching the others, but each process at least has a unique index, you can have one process generate a series of random numbers in memory (if memory is shared) or in a file (if the disk is shared), and then have each process pull out the index'th random number to use as its seed.

Nothing will give you a more even distribution of seeds than a series of random numbers from a single seed.
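
As a rough sketch of the "index'th random number from a single seed" idea, the C code below walks a SplitMix64 stream (a generator swapped in here for illustration, not something the answer prescribes) and returns the output at a given position. The names splitmix64_next and seed_for_index are hypothetical, and it assumes every job knows the same master seed and its own unique index.

```c
#include <stdint.h>

/* One SplitMix64 step: advances *state and returns the next output. */
static uint64_t splitmix64_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical helper: the index'th output (0-based) of the stream that
   starts at master_seed. Walking the stream keeps the sketch simple;
   for a few thousand jobs the loop cost is negligible. */
uint64_t seed_for_index(uint64_t master_seed, uint32_t index)
{
    uint64_t state = master_seed;
    uint64_t out = 0;
    for (uint32_t i = 0; i <= index; i++)
        out = splitmix64_next(&state);
    return out;
}
```

Each job would then call something like srand((unsigned)seed_for_index(master_seed, job_index)). The 64-bit outputs are distinct for distinct indices; truncating them to the width of unsigned gives up that guarantee, although collisions among a couple of thousand jobs remain unlikely.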

暮光沉寂 2024-12-14 15:35:17

A combination of the PID and the time should be enough to get a unique seed. It's not 100% cross-platform, but getpid(3) on *nix platforms and GetCurrentProcessId on Windows will get you 99.9% of the way there. Something like this should work:

srand(((unsigned)time(NULL) & 0xFFFF) | ((unsigned)getpid() << 16));

You could also read data from /dev/urandom on *nix systems. Windows has no such device, although APIs such as CryptGenRandom() fill the same role.
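
A slightly more defensive variant of the same idea, sketched in C under the assumption that hashing the two values is acceptable: instead of packing 16 bits of each into one word, it mixes the full timestamp and PID. The helper names mix_time_pid and time_pid_seed and the CURRENT_PID macro are made up for this example; the final mixing steps are borrowed from the MurmurHash3 finalizer.

```c
#include <stdint.h>
#include <time.h>

#ifdef _WIN32
  #include <process.h>
  #define CURRENT_PID() ((uint32_t)_getpid())
#else
  #include <unistd.h>
  #define CURRENT_PID() ((uint32_t)getpid())
#endif

/* Hypothetical helper: hash a timestamp and a PID into one 32-bit seed.
   For a fixed timestamp, different PIDs always give different seeds,
   because every step below is reversible. */
uint32_t mix_time_pid(uint32_t t, uint32_t pid)
{
    uint32_t h = t ^ (pid * 0x9E3779B9u);  /* spread the PID over the word */
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

/* Seed built from the current time and the current process ID. */
uint32_t time_pid_seed(void)
{
    return mix_time_pid((uint32_t)time(NULL), CURRENT_PID());
}
```

Usage would be srand(time_pid_seed()). Note that jobs on different nodes can still share both a PID and a start second, so on a cluster it may be worth mixing in a job index or host identifier as well.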

枯叶蝶 2024-12-14 15:35:17

unsigned seed;

read(open("/dev/urandom", O_RDONLY), &seed, sizeof seed);
srand(seed); // IRL, check for errors, close the fd, etc...

I would also recommend a better random number generator.
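
The snippet above leaves out the error handling its comment mentions. A minimal sketch of what that might look like follows, using stdio instead of raw file descriptors; the function name read_seed and its return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: read one unsigned seed from a random device.
   Returns 0 on success, -1 if the device cannot be opened or read. */
int read_seed(const char *path, unsigned *seed)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t got = fread(seed, sizeof *seed, 1, f);
    fclose(f);

    return got == 1 ? 0 : -1;
}
```

A caller could try read_seed("/dev/urandom", &seed) and fall back to a time-based seed when it returns -1.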

假扮的天使 2024-12-14 15:35:17

If C++11 can be used, consider std::random_device. I would suggest watching the linked video for a comprehensive guide.

The essential message from the video: you should never use srand and rand, but instead use std::random_device and std::mt19937. For most cases, the following is what you want:

#include <iostream>
#include <random>
int main() {
    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_int_distribution<int> dist(0,99);
    for (int i = 0; i < 16; i++) {
        std::cout << dist(mt) << " ";
    }
    std::cout << std::endl;
}

梦罢 2024-12-14 15:35:17

Instead of the time in seconds as returned by the C standard library's time() function, could you use the processor's counter? Most processors have a free-running tick count; for example, x86/x64 has the Time Stamp Counter:

The Time Stamp Counter is a 64-bit register present on all x86 processors since the Pentium. It counts the number of ticks since reset.

(That page also lists many ways to access this counter on different platforms: GCC, MS Visual C++, etc.)

Keep in mind that the timestamp counter is not without flaws: it may not be synced across processors (you probably don't care for your application), and power-saving features may clock the processor up or down (again, you probably don't care).

一念一轮回 2024-12-14 15:35:17

Just an idea... generate a GUID (which is 16 bytes) and sum its 4-byte or 8-byte chunks (depending on the expected width of the seed), allowing integer wrap-around. Use the result as a seed.

Version 1 GUIDs encapsulate characteristics of the computer that generated them (such as the MAC address), and version 4 GUIDs are drawn from a random source; either way, it should be rather improbable that two different machines end up generating the same random sequence.

This is obviously not portable, but finding appropriate APIs/libraries for your system should not be too hard (e.g. UuidCreate on Win32, uuid_generate on Linux).
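
The folding step could look roughly like the C sketch below. It assumes the 16 GUID bytes have already been obtained from whatever platform API is available, and fold_guid is a hypothetical name. The bytes are read as four little-endian 32-bit chunks so the result does not depend on the host's byte order.

```c
#include <stdint.h>

/* Hypothetical helper: sum the four 32-bit chunks of a 16-byte GUID. */
uint32_t fold_guid(const unsigned char guid[16])
{
    uint32_t seed = 0;
    for (int i = 0; i < 16; i += 4) {
        uint32_t chunk = (uint32_t)guid[i]
                       | ((uint32_t)guid[i + 1] << 8)
                       | ((uint32_t)guid[i + 2] << 16)
                       | ((uint32_t)guid[i + 3] << 24);
        seed += chunk;  /* unsigned addition wraps modulo 2^32 */
    }
    return seed;
}
```

The result can be passed straight to srand(). XOR would work equally well as the combining operation.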

迟月 2024-12-14 15:35:17

Windows

Windows provides CryptGenRandom() and RtlGenRandom(). They will give you an array of random bytes, which you can use as seeds.

You can find the docs on the MSDN pages.

Linux / Unixes

You can use OpenSSL's RAND_bytes() to get a given number of random bytes on Linux. By default it is seeded from the operating system's entropy source (such as /dev/urandom).

Putting it together:

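#include <stdint.h>

#ifdef _WIN32
  #include <windows.h>        /* must come before NTSecAPI.h */
  #include <NTSecAPI.h>       /* RtlGenRandom; link with Advapi32 */
#else
  #include <openssl/rand.h>   /* RAND_bytes; link with -lcrypto */
#endif

uint32_t get_seed(void)
{
  uint32_t seed = 0;

  /* Return values are ignored here; real code should check them. */
#ifdef _WIN32
  RtlGenRandom(&seed, sizeof seed);
#else
  RAND_bytes((unsigned char *)&seed, (int)sizeof seed);
#endif

  return seed;
}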

Note that OpenSSL provides a cryptographically secure PRNG by default, so you could use it directly. More info here.

仅冇旳回忆 2024-12-14 15:35:17

Assuming you're on a reasonably POSIX-ish system, you should have clock_gettime. This will give the current time in nanoseconds, which means for all practical purposes it's impossible to ever get the same value twice. (In theory bad implementations could have much lower resolution, e.g. just multiplying milliseconds by 1 million, but even half-decent systems like Linux give real nanosecond results.)
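
A minimal sketch of turning clock_gettime output into a seed, assuming a POSIX system and CLOCK_REALTIME; the helper names fold_timespec and clock_seed are invented for this example. Seconds and nanoseconds are combined into one 64-bit count and folded down to 32 bits so that the fast-changing nanosecond digits end up in the seed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: fold a timespec into a 32-bit seed. */
uint32_t fold_timespec(const struct timespec *ts)
{
    uint64_t ns = (uint64_t)ts->tv_sec * 1000000000ULL
                + (uint64_t)ts->tv_nsec;
    return (uint32_t)(ns ^ (ns >> 32));
}

/* Stores a seed derived from the current time in *seed.
   Returns 0 on success, -1 if clock_gettime fails. */
int clock_seed(uint32_t *seed)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    *seed = fold_timespec(&ts);
    return 0;
}
```

On older glibc versions, linking may additionally require -lrt.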

幸福不弃 2024-12-14 15:35:17

If uniqueness is important, you need to arrange for each node to know what IDs have been claimed by others. You could do this with a protocol asking "anyone claimed ID x?" or arranging in advance for each node to have a selection of IDs which have not been allocated to others.

(GUIDs use the machine's MAC, so would fall into the "arrange in advance" category.)

Without some form of agreement, you'll risk two nodes claiming the same ID.
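
One way to read "arranging in advance" is to give every node its own disjoint block of IDs. The C sketch below shows that under stated assumptions: each node knows its own index, all nodes agree on the block size, and reserved_id is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scheme: node n owns the IDs
   [n * ids_per_node, (n + 1) * ids_per_node).
   slot must be smaller than ids_per_node, and the product must fit
   in 32 bits, for the blocks to stay disjoint. */
uint32_t reserved_id(uint32_t node, uint32_t slot, uint32_t ids_per_node)
{
    return node * ids_per_node + slot;
}
```

With 16 IDs per node, for example, node 2 would use IDs 32 through 47 for its jobs, and no other node would ever produce those values.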
