Multi-part question about multithreading, locks and multi-core processors (multi^3)
I have a program with two methods. The first method takes two arrays as parameters, and performs an operation in which values from one array are conditionally written into the other, like so:
void Blend(int[] dest, int[] src, int offset)
{
    for (int i = 0; i < src.Length; i++)
    {
        int rdr = dest[i + offset];
        dest[i + offset] = src[i] > rdr ? src[i] : rdr;
    }
}
The second method creates two separate sets of int arrays and iterates through them such that each array of one set is Blended with each array from the other set, like so:
void CrossBlend()
{
    int[][] set1 = new int[150][75000]; // we'll pretend this actually compiles
    int[][] set2 = new int[25][10000]; // we'll pretend this actually compiles
    for (int i1 = 0; i1 < set1.Length; i1++)
    {
        for (int i2 = 0; i2 < set2.Length; i2++)
        {
            Blend(set1[i1], set2[i2], 0); // or any offset, doesn't matter
        }
    }
}
First question: Since this approach is an obvious candidate for parallelization, is it intrinsically thread-safe? It seems like no, since I can conceive of a scenario (unlikely, I think) where one thread's changes are lost because of a different thread's near-simultaneous operation.
If no, would this:
void Blend(int[] dest, int[] src, int offset)
{
    lock (dest)
    {
        for (int i = 0; i < src.Length; i++)
        {
            int rdr = dest[i + offset];
            dest[i + offset] = src[i] > rdr ? src[i] : rdr;
        }
    }
}
be an effective fix?
Second question: If so, what would be the likely performance cost of using locks like this? I assume that with something like this, if a thread attempts to lock a destination array that is currently locked by another thread, the first thread would block until the lock was released instead of continuing to process something.
Also, how much time does it actually take to acquire a lock? Nanosecond scale, or worse than that? Would this be a major issue in something like this?
Third question: How would I best approach this problem in a multi-threaded way that would take advantage of multi-core processors (and this is based on the potentially wrong assumption that a multi-threaded solution would not speed up this operation on a single core processor)? I'm guessing that I would want to have one thread running per core, but I don't know if that's true.
Comments (1)
The potential contention with CrossBlend is set1, the destination of the blend. Rather than using a lock, which is going to be expensive relative to the amount of work you are doing, arrange for each thread to work on its own destination. That is, a given destination (the array at some index in set1) is owned by a given task. This is possible because the outcome is independent of the order in which CrossBlend processes the arrays.
Each task should then run just the inner loop in CrossBlend, parameterized with the index (or range of indices) of the destination array in set1 that it owns.
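As a rough illustration of the "one task per destination" idea, here is a minimal sketch that uses Parallel.For from the Task Parallel Library (my choice of mechanism, not something the question specifies). The class name BlendDemo and the helper MakeSet are made up for the example, and the sets are passed in as parameters so the code compiles. Each iteration owns one array from set1, while set2 is only read.

```csharp
using System;
using System.Threading.Tasks;

static class BlendDemo
{
    // Same element-wise max as the Blend method in the question.
    public static void Blend(int[] dest, int[] src, int offset)
    {
        for (int i = 0; i < src.Length; i++)
        {
            int rdr = dest[i + offset];
            dest[i + offset] = src[i] > rdr ? src[i] : rdr;
        }
    }

    // Each iteration of the parallel loop owns exactly one destination
    // array, so no two tasks write to the same array and no lock is needed.
    public static void CrossBlendParallel(int[][] set1, int[][] set2)
    {
        Parallel.For(0, set1.Length, i1 =>
        {
            // The inner loop stays sequential: it is the unit of work per task.
            for (int i2 = 0; i2 < set2.Length; i2++)
            {
                Blend(set1[i1], set2[i2], 0);
            }
        });
    }

    // Hypothetical helper: builds a jagged array filled with random values,
    // since "new int[150][75000]" does not compile in C#.
    public static int[][] MakeSet(int count, int length, int seed)
    {
        var rng = new Random(seed);
        var set = new int[count][];
        for (int i = 0; i < count; i++)
        {
            set[i] = new int[length];
            for (int j = 0; j < length; j++)
            {
                set[i][j] = rng.Next();
            }
        }
        return set;
    }
}
```

Calling BlendDemo.CrossBlendParallel(BlendDemo.MakeSet(150, 75000, 1), BlendDemo.MakeSet(25, 10000, 2)) should give the same result as the sequential loops, because taking a maximum does not depend on the order of the blends.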
You can also parallelize the Blend method itself, since each index is computed independently of the others, so there is no contention there. But on today's machines, with fewer than 40 cores, you will get sufficient parallelism just by threading the CrossBlend method.
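If you did want to split a single Blend call across cores, one way to sketch it is with a range partitioner, so that each worker handles a contiguous chunk of indices. This assumes the Task Parallel Library is available; BlendParallel and ParallelBlendDemo are names invented for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ParallelBlendDemo
{
    public static void BlendParallel(int[] dest, int[] src, int offset)
    {
        // Partitioner.Create rejects an empty range, so handle it up front.
        if (src.Length == 0) return;

        // Split [0, src.Length) into contiguous chunks. Each chunk reads and
        // writes a disjoint slice of dest, so the chunks never conflict.
        var ranges = Partitioner.Create(0, src.Length);
        Parallel.ForEach(ranges, (range, loopState) =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                int rdr = dest[i + offset];
                dest[i + offset] = src[i] > rdr ? src[i] : rdr;
            }
        });
    }
}
```

This is only safe as long as no other thread is blending into the same dest at the same time. For arrays of this size the per-call scheduling overhead may well outweigh the gain, which is why parallelizing at the CrossBlend level is usually the better first step.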
To run effectively on multi-core you can either
The second approach is best suited to irregularly sized tasks, or to a system that is also being used for other work. In that case some cores may be time-slicing between other processes, so you cannot expect equal amounts of work to complete in roughly the same time on different cores.
The first approach is much simpler to code, and will give you a good level of parallelism.
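The descriptions of the two approaches are not shown above, so the following is only a guess at what the simpler one could look like: split set1 into one contiguous range of destinations per core and give each range to its own thread. The names RangeBlendDemo and CrossBlendByRanges, and the use of Environment.ProcessorCount to pick the worker count, are assumptions made for this sketch.

```csharp
using System;
using System.Threading;

static class RangeBlendDemo
{
    // Same element-wise max as the Blend method in the question.
    public static void Blend(int[] dest, int[] src, int offset)
    {
        for (int i = 0; i < src.Length; i++)
        {
            int rdr = dest[i + offset];
            dest[i + offset] = src[i] > rdr ? src[i] : rdr;
        }
    }

    public static void CrossBlendByRanges(int[][] set1, int[][] set2)
    {
        // Assumption: one worker per logical core, but never more than set1 has arrays.
        int workers = Math.Min(Environment.ProcessorCount, set1.Length);
        if (workers == 0) return;

        var threads = new Thread[workers];
        for (int w = 0; w < workers; w++)
        {
            // Worker w owns destinations [start, end); the ranges do not overlap.
            // Declared inside the loop so each closure captures its own copy.
            int start = (int)((long)set1.Length * w / workers);
            int end = (int)((long)set1.Length * (w + 1) / workers);

            threads[w] = new Thread(() =>
            {
                for (int i1 = start; i1 < end; i1++)
                {
                    for (int i2 = 0; i2 < set2.Length; i2++)
                    {
                        Blend(set1[i1], set2[i2], 0);
                    }
                }
            });
            threads[w].Start();
        }

        // Wait for every worker before the caller reads set1.
        foreach (var t in threads)
        {
            t.Join();
        }
    }
}
```

With a fixed split like this, a thread that finishes early simply sits idle, which is the trade-off against a scheme that hands out work dynamically.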