并行生成分区
我正在使用一种算法(用 C 实现)来生成集合的分区。 (代码在这里:http://www.martinbroadhurst.com/combinatorial-algorithms.html #分区)。
我想知道是否有办法修改这个算法以并行而不是线性运行?
我的 CPU 上有多个核心,并且希望将分区的生成分成多个运行线程。
I am using an algorithm (implemented in C) that generates partitions of a set. (The code is here: http://www.martinbroadhurst.com/combinatorial-algorithms.html#partitions).
I was wondering if there is a way to modify this algorithm to run in parallel instead of linearly?
I've got multiple cores on my CPU and would like split up the generation of partitions into multiple running threads.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
初始化一个包含前 k 个元素的每个分区的共享集合。每个线程,直到集合为空,重复从集合中删除一个分区,并使用您链接到的算法生成剩余 n - k 元素的所有可能性(当增加当前 n 元素分区时,获取另一个 k 元素前缀将改变前 k 个元素中的一个)。
Initialize a shared collection containing every partition of the first k elements. Each thread, until the collection is empty, repeatedly removes a partition from the collection and generates all possibilities for the remaining n - k elements using the algorithm you linked to (get another k-element prefix when incrementing the current n-element partition would change the one of the first k elements).
正如您所看到的,您提到的算法在基数
n
中创建计数器,并且每次将具有相同编号的项目放在一组中,并以这种方式对输入进行分区。每个计数器从
0 计数到 (0,1,2,...,n-1)
,这意味着A=
n-1+(n-2)*n+ ...+1*nn-1+0 个数字。因此,您可以在 k 个不同的线程上运行算法,在第一个线程中,您应该从 0 计数到 A/k,在第二个线程中,您应该从 (A/k)+1 计数到 2*A/k,依此类推。意味着您应该添加一个long
变量并用上限(在 for 循环条件中)检查它,同时计算的 A 值和基本
。n
格式的相关数字r*A/k 为 0 <= r <= kAs you can see your referred algorithms creates counter in base
n
and each time put items with same number in one group, and in such a way partitions input.Each counter counts from
0 to (0,1,2,...,n-1)
which meansA=
n-1+(n-2)*n+...+1*nn-1+0 numbers. So you can run your algorithm on k different thread, in first thread you should count from 0 to A/k, in second you should count from (A/k)+1 to 2*A/k and so on. means just you should add along
variable and check it with upper bound (in your for loop conditions) Also calculating A value and related number in basen
format forr*A/k for 0 <= r <= k
.首先,考虑串行算法的以下变体。获取元素
a
,并将其分配给子集#0(这始终有效,因为分区内子集的顺序并不重要)。下一个元素b
可能属于与a
相同的子集,也可能属于不同的子集,即属于子集#1。然后,元素c
属于 #0(与a
一起)或 #1(如果与分开,则与
),或者它自己的子集(如果#0={b
一起) aa
,b
},则为#1;如果#0={,则为#2a
} 和#1={b
})。等等。因此,您将新元素一一添加到部分构建的分区中,为每个输入生成一些可能的输出 - 直到您放入所有元素。并行化的关键是每个不完整分区可以独立地附加新元素,即与所有其他变体并行。该算法可以以不同的方式实现。我将使用一种递归方法,其中给一个函数一个部分填充的数组及其当前长度,根据下一个元素的可能值(比数组当前的最后一个值多一个)复制该数组多次),将下一个元素设置为每个可能的值,并为每个新数组递归地调用自身,并增加长度。这种方法似乎特别适合窃取工作的并行引擎,例如 cilk 或tbb。类似于 @swen 建议的实现也是可能的:您使用所有不完整分区的集合和线程池,每个线程从集合中获取一个分区,生成所有可能的扩展并将它们放回集合中;添加了所有元素的分区显然应该进入不同的集合。
First, consider the following variation of the serial algorithm. Take the element
a
, and assign it to the subset #0 (this is always valid, because the order of subsets inside a partition does not matter). The next elementb
might belong either to the same subset asa
or to a different one, i.e. to subset #1. Then, the elementc
belongs to either #0 (together witha
) or #1 (together withb
if it's separate froma
), or to its own subset (which will be #1 if #0={a
,b
}, or #2 if #0={a
} and #1={b
}). And so on. So you add new elements one by one to partially built partitions, producing a few possible outputs for each input - until you put all the elements. The key to parallelization is that each incomplete partition can be appended with new elements independently, i.e. in parallel with, all other variants.The algorithm can be implemented in different ways. I would use a recursive approach, in which a function is given a partially filled array and its current length, copies the array as many times as there are possible values for the next element (which is one more than the current last value of the array), sets the next element to every possible value and calls itself recursively for each new array, with increased length. This approach seems particularly good for work-stealing parallel engines, such as cilk or tbb. An implementation similar to suggested by @swen is also possible: you use a collection of all incomplete partitions and a pool of threads, and each thread takes one partition from the collection, produces all possible extensions and put those back to the collection; partitions with all elements added should obviously go into a different collection.
这是我使用 swen 的建议获得的 C++ 实现。线程的数量取决于 r 的值。对于 r=6,分区数是第六个铃数,等于 203。对于 r=0,我们只是得到一个正常的非并行程序。
Here is the c++ implementation I obtained using swen's suggestion. The number of threads depends on the value of r. For r=6 the number of partitions is the sixth bell number, which is equal to 203. For r=0 we just get a normal non-parallel program.