多线程黑客
我正在编写一个数值程序,应该有点快,而且也是多线程的。我有一个代表数字的类,我想在其中使用随机数生成器。现在,我并不真的要求我的 RNG 是真正的 RNG,我只需要它生成在 0 和 NMAX 之间均匀分布的整数。
所以我在我的课堂上有这个:
// use just an int here and forget about multithreading.
static uint32 rand = NMAX/4;
// this will be called multithreadedly
static uint32 GetRand() { return rand = ( rand + 1 ) % NMAX; }
现在,在单线程世界中,这完全适合我的目的。
由于这是多线程的,我假设唯一可能发生的坏事是偶尔(例如 <1% 的时间)更新会被丢弃。这意味着两个线程读取 rand,将其更新到寄存器中,返回更新后的值,然后使用相同的值写入两次。这完全没问题。
我的问题是还会发生比这更糟糕的事情吗?我完全同意每个线程使用自己的 rand 变量,但这只是一个巨大的痛苦。我绝对不能做的是让类的每个实例都使用自己的 rand 变量,因为那样会使用太多内存。
更新:
那么,我为什么要这样做呢?完整的故事是它的浮点类使用 1 或 2 个字节。所以它必须要快,这似乎是最好的方法。事实上,我想我会将其从 ( rand + 1 ) % NMAX
更新为 ( rand + [some prime] ) % NMAX
因为它似乎工作得更好。这是其中一种情况的示例,其中更健壮的解决方案需要更多代码,使事物变得不那么通用并且更加依赖,使代码不太清晰并且更容易破坏,而所有这些都是为了“应该使用适当的同步”的想法。
主要是我担心编译器可能会做一些奇怪的优化,这样对 rand 的更新不仅会被删除,而且 rand 会变成完全垃圾。现在我想了一下,即使也没关系(这个数字的使用方式),因为下一次使用 GetRand 无论如何都会 %NMAX 它,该错误最多只会导致一次使用GetRand 超出给定范围 [0,NMAX)。感谢您的任何答复。
I am writing a numerical program that should be somewhat fast, and is also multithreaded. I have a class that represents a number, and I want to use in it a random number generator. Now, I don't really require my RNG to be a true RNG, I just need it to generate integers with even distribution between 0 and NMAX.
So I have this in my class:
// use just an int here and forget about multithreading.
static uint32 rand = NMAX/4;
// this will be called multithreadedly
static uint32 GetRand() { return rand = ( rand + 1 ) % NMAX; }
Now, in a single threaded world, this is totally fine for my purposes.
Since this is multithreaded, I'm assuming that the only possible bad thing that could happen is that occasionally (like <1% of the time) the update gets dropped. Meaning two threads read rand, update it in a register, return the updated value, and then write it, twice, with the same value. This is totally fine.
My question is could anything worse than that happen? I'm totally fine with each thread using its own rand
variable but that's just a huge pain to make happen. What I definitely can't do is make it so that each instance of the class uses its own rand variable, as that would use way too much memory.
UPDATE:
So, why do I want to do this? Full story is its a floating point class that uses 1 or 2 bytes. So it has to be fast and such, and this just seems the best way. In fact I think I'll update it from ( rand + 1 ) % NMAX
to something like ( rand + [some prime] ) % NMAX
since it seems to work better. This is an example of one of those cases where a more robust solution would require more code, make things less generic and more dependency ridden, make code less clear and easier to break, and all for the idea that "proper synchronization should be used".
Mostly I'm worried of some weird optimization that the compiler might do, so that an update to rand is not just dropped, but rand becomes total garbage. Now that I think about it though, even that would be okay (the way this number is used), since the next use of GetRand would %NMAX it anyway, the error would only cause at most one use of GetRand to be outside the given range of [0,NMAX). Thanks for any answers.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
这样做并不一定那么困难。某些编译器(例如 GCC)支持线程本地存储 ,它允许每个线程拥有自己的给定变量的副本。
话虽如此,除了你提到的问题之外,我只能想到你目前的做法存在一个问题。如果每个线程在不同的核心上运行,并且随机值存储在每个核心的非共享缓存中,则更新可能无法在无限期的时间内在核心之间传播。您可以通过使用内存屏障(可以通过使用锁创建)来避免这种情况,但这可能会损害性能。
It's not necessarily so difficult to do it that way. Some compilers (e.g. GCC) support thread-local storage, which allows each thread to have its own copy of a given variable.
Having said that, there's only one problem that I can think of — other than the one you mentioned — with your current way of doing it. If each thread is run on a different core, and the random value is stored in each core's non-shared cache, the updates may fail to be propagated among the cores for an indefinite amount of time. You can avoid this by using a memory barrier (which can be created via the use of locks), but this will probably be bad for performance.
为了讨论的目的,我们假设以下实现:
我的改进建议是每个实例一次获取 16 个数字。这 16 个数字不必存储(复制)在实例内部:您只需将全局索引增加 16(使它们对其他实例不可用),以便实例可以一一使用它们。
For the purpose of discussion let's assume the following implementation:
My suggestion for improvement is for each instance to fetch, say, 16 numbers at a time. The 16 numbers do not have to be stored (copied) inside the instance: you simply increment the global index by 16 (making them unavailable to other instances), so that the instance can consume them one by one.
:
:
:
:
如果两个线程同时调用 GetRand,则可能会发生经典的不同步错误。例如,rand = 10。在两个线程调用 GetRand 后,rand 预计为 12,但实际上可能是 11。如果这对您来说没问题,则可以不更改此代码。但我认为最好使用同步,因为没有同步,代码和结果看起来都有点奇怪。另一种编程可能会认为这是错误。
编辑。
最坏的情况:两个或多个线程从内存中读取相同的 rand 变量 rand。每个线程在本地进行计算 ( rand + 1 ) % NMAX。然后所有线程将相同的结果放回内存。这不会破坏 rand 变量值,也不会导致该变量超出范围,并且数字生成器将继续计算有效数字。
If two threads call GetRand at the same time, the classic unsynchronized error may happen. For example, rand = 10. After two threads call GetRand, rand is expected to be 12, but actually it may be 11. If this OK for you, you can leave this code without changes. But I think it is better to use synchronization, because without it both the code and its result look a bit strange. Another programming can think that is is bug.
Edit.
Worst case: two or more threads read the same rand variable rand from the memory. Each thread locally makes calculation ( rand + 1 ) % NMAX. Then all threads put the same result back to the memory. This doesn't corrupt the rand variable value, this doesn't cause this variable to be out of scope and number generator will continue to compute valid numbers.