C# random parallel-processing thread-local-storage

Creating ThreadLocal random generators with a known seed

Posted 2024-12-21 03:52:10 · 1611 words · 3 views · 0 comments

I'm struggling to find a way to have a single random number generator per thread, while at the same time making sure that when the program is re-run, the same numbers are produced.

What I do now is something like this:

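using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program {
    static void Main(string[] args) {

        var seed = 10;
        var data = new List<double>();
        var dataGenerator = new Random(seed);

        // Seeded generator: the input data is identical on every run.
        for (int i = 0; i < 10000; i++) {
            data.Add(dataGenerator.NextDouble());
        }

        var results = new ConcurrentBag<double>();

        Parallel.ForEach(data, (d) => {
            // A new generator per item, seeded from the item itself.
            var result = Calculate(d, new Random(d.GetHashCode()));
            results.Add(result);
        });

    }

    static double Calculate(double x, Random random) {
        return x * random.NextDouble();
    }
}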

Because the random generator that creates the 'data' list is given a seed, and the random generators used in the calculation are seeded from the hash code of the number being processed, the results are repeatable, regardless of the number of threads and the order in which they are instantiated.

I'm wondering if it's possible to instantiate just a single random generator for each thread. The following piece of code seems to accomplish that, but because the random generators are no longer given a (reproducible) seed, the results are not repeatable.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Program {
    static void Main(string[] args) {

        var seed = 10;
        var data = new List<double>();
        var dataGenerator = new Random(seed);

        for (int i = 0; i < 10000; i++) {
            data.Add(dataGenerator.NextDouble());
        }

        var results = new ConcurrentBag<double>();

        // One generator per thread, but unseeded, so the results differ between runs.
        var localRandom = new ThreadLocal<Random>(() => new Random());

        Parallel.ForEach(data, (d) => {
            var result = Calculate(d, localRandom.Value);
            results.Add(result);
        });

    }

    static double Calculate(double x, Random random) {
        return x * random.NextDouble();
    }
}

Can anyone think of a nice solution to this problem?

久伴你 2024-12-28 03:52:10

It's possible; indeed, you very nearly do it correctly in your question. The problem is that it isn't quite what you want.

If you seeded your thread-local Random with the same number each time, you would make the results deterministic within that thread, but dependent on how many operations that thread had already performed. What you want is a pseudo-random number that is deterministic relative to the input.
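To make that order dependence concrete, here is a minimal single-threaded sketch. The class name OrderDependenceDemo and the inputs 0.3 and 0.7 are made up for illustration; the two generators stand in for the same thread-local Random on two different runs, where the scheduler happens to hand the thread its items in a different order.

```csharp
using System;

public static class OrderDependenceDemo
{
    static double Calculate(double x, Random random) => x * random.NextDouble();

    // Returns the result for input 0.3 when it is the first item the
    // generator sees, and when another item was processed before it.
    public static (double First, double Second) Run()
    {
        var r1 = new Random(10);
        double processedFirst = Calculate(0.3, r1);

        var r2 = new Random(10);          // same seed as r1
        Calculate(0.7, r2);               // another item happens to run first
        double processedSecond = Calculate(0.3, r2);

        return (processedFirst, processedSecond);
    }
}
```

Both generators use the same seed, yet the result for 0.3 depends on how many draws came before it, which is exactly what a parallel scheduler does not guarantee.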

Well, you could just stick with Random(). It's not that heavy.

Alternatively, you can have your own pseudo-random algorithm. Here's a simple example based on a re-hashing algorithm (intended to distribute the bits of hashcodes even better):

private static double Calculate(double x)
{
  unchecked
  {
    uint h = (uint)x.GetHashCode();
    h += (h << 15) ^ 0xffffcd7d;
    h ^= (h >> 10);
    h += (h << 3);
    h ^= (h >> 6);
    h += (h << 2) + (h << 14);
    return (h ^ (h >> 16)) / (double)uint.MaxValue * x;
  }
}

This isn't a particularly good pseudo-random generator, but it's pretty fast. It also does no allocation and leads to no garbage collection.

Therein lies the trade-off of this entire approach: you can simplify the above and be even faster but less "random", or you can be more "random" for more effort. I'm sure there's code out there that is both faster and more "random" than the above, which is meant to demonstrate the approach more than anything else, but among the rival algorithms you're looking at a trade-off between the quality of the generated number and performance. new Random(d.GetHashCode()).NextDouble() is at a particular point in that trade-off; other approaches are at other points.

Edit: The re-hashing algorithm I used is a Wang/Jenkins hash. I couldn't remember the name when I wrote it.

Edit: Having a better idea of your requirements from the comments, I'd now say that...

You want to create a PRNG class. It could use the algorithm above, that of System.Random (taking reflected code as a starting point), the 128bitXorShift algorithm you mention, or whatever. The important difference is that it must have a Reseed method. For example, if you copied System.Random's approach, your Reseed would look like most of the constructor's body (indeed, you'd probably refactor so that, apart from maybe creating the array it uses, the constructor just calls into Reseed).

Then you'd create an instance per thread, and call .Reseed(d.GetHashCode()) at the point where you'd create a new Random in your existing code.
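A minimal sketch of that idea follows. Everything named here is hypothetical: ReseedablePrng and ReseedDemo are illustrative names, and xorshift32 is used only because it is short, not because it is the algorithm the answer had in mind or a high-quality generator. The sketch also writes results into an array by index rather than into a ConcurrentBag, purely so that the output order is fixed and easy to compare between runs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reseedable PRNG; xorshift32 is an illustrative choice.
public sealed class ReseedablePrng
{
    private uint _state;

    public ReseedablePrng(int seed) => Reseed(seed);

    // Resets the generator so later output depends only on the seed.
    public void Reseed(int seed)
    {
        // Scramble the seed; multiplying by an odd constant is reversible,
        // so only seed 0 maps to the forbidden all-zero state.
        _state = unchecked((uint)seed * 2654435761u);
        if (_state == 0) _state = 0x9E3779B9u;
    }

    public double NextDouble()
    {
        _state ^= _state << 13;
        _state ^= _state >> 17;
        _state ^= _state << 5;
        return _state / (uint.MaxValue + 1.0); // strictly between 0 and 1
    }
}

public static class ReseedDemo
{
    public static double[] Run(IReadOnlyList<double> data)
    {
        var results = new double[data.Count];
        // One generator per thread, reused for every item that thread handles.
        using (var localPrng = new ThreadLocal<ReseedablePrng>(() => new ReseedablePrng(0)))
        {
            Parallel.For(0, data.Count, i =>
            {
                var prng = localPrng.Value;
                prng.Reseed(data[i].GetHashCode()); // replaces new Random(...)
                results[i] = data[i] * prng.NextDouble();
            });
        }
        return results;
    }
}
```

Because every item reseeds before drawing, the result for an item no longer depends on which thread processed it or on what that thread did before, and no generator is allocated per item.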

Note also that this gives you another advantage. If you depend upon consistent results from your PRNG (which it seems you do), then it counts against System.Random that you are not promised a consistent algorithm between framework versions (perhaps even including patches and security fixes). Owning the algorithm adds that consistency.

However, you are also not promised a consistent algorithm for double.GetHashCode(). I doubt they'd change it (unlike string.GetHashCode(), which is often changed), but just in case, you could make your Reseed() take a double and do something like:

private static unsafe int GetSeedInteger(double d)
{
  if(d == 0.0)
    return 0;
  long num = *((long*)&d);
  return ((int)num) ^ (int)(num >> 32);
}

Which pretty much just copies the current double.GetHashCode(), but now you'll be consistent in the face of framework changes.
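If you would rather not compile with unsafe code, the same bit folding can be sketched with BitConverter.DoubleToInt64Bits. The class name SeedHelper is made up for illustration, and this is assumed to be equivalent to the pointer version above rather than taken from the framework source.

```csharp
using System;

public static class SeedHelper
{
    // Folds the 64-bit IEEE 754 pattern of d into 32 bits without unsafe code.
    public static int GetSeedInteger(double d)
    {
        if (d == 0.0)
            return 0; // +0.0 and -0.0 compare equal, so both get seed 0

        long bits = BitConverter.DoubleToInt64Bits(d);
        return unchecked((int)bits ^ (int)(bits >> 32));
    }
}
```

A Reseed(double) overload could then call SeedHelper.GetSeedInteger(d) instead of d.GetHashCode().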

It might be worth considering breaking the set of tasks into chunks yourself, creating threads for each chunk, and then just creating this object as a local in the per-chunk method.

Pros:

Accessing ThreadLocal<T> is more expensive than accessing a local T.

If the tasks are consistent in relative time to execute, you don't need a lot of Parallel.ForEach's cleverness.

Cons:

Parallel.ForEach is really good at balancing things out. What you're doing has to be very naturally balanced, or save a lot on a per-chunk basis, before eschewing its use gains you anything.
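For comparison, the manual chunking described above might look roughly like the sketch below. It rests on several assumptions: Task.Run stands in for "creating threads for each chunk", the nested Prng class is a hypothetical xorshift32 stand-in for whatever reseedable generator you end up writing, and chunkCount is a parameter you would have to tune yourself.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkedDemo
{
    // Minimal reseedable generator (xorshift32), for illustration only.
    private sealed class Prng
    {
        private uint _s;

        public void Reseed(int seed)
        {
            _s = unchecked((uint)seed * 2654435761u);
            if (_s == 0) _s = 0x9E3779B9u; // xorshift must not start at zero
        }

        public double NextDouble()
        {
            _s ^= _s << 13;
            _s ^= _s >> 17;
            _s ^= _s << 5;
            return _s / (uint.MaxValue + 1.0);
        }
    }

    // chunkCount must be at least 1.
    public static double[] Run(double[] data, int chunkCount)
    {
        var results = new double[data.Length];
        int chunkSize = (data.Length + chunkCount - 1) / chunkCount; // round up

        var tasks = Enumerable.Range(0, chunkCount).Select(c => Task.Run(() =>
        {
            var prng = new Prng(); // plain local: no ThreadLocal<T> lookup per item
            int start = c * chunkSize;
            int end = Math.Min(start + chunkSize, data.Length);
            for (int i = start; i < end; i++)
            {
                prng.Reseed(data[i].GetHashCode());
                results[i] = data[i] * prng.NextDouble();
            }
        })).ToArray();

        Task.WaitAll(tasks);
        return results;
    }
}
```

Since each item reseeds from its own value, the output does not depend on how the data is split, so you can change chunkCount freely when measuring whether this beats Parallel.ForEach for your workload.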
