以分布式方式枚举组合
我遇到一个问题,我必须分析某物的 500C5 组合 (255244687600)。将其分布在 10 个节点的集群上,其中每个集群每秒处理大约 10^6 个组合,这意味着该作业将在大约 7 小时内完成。
我遇到的问题是将 255244687600 个组合分布在 10 个节点上。我想为每个节点提供 25524468760,但是我使用的算法只能按顺序生成组合,我希望能够传递元素集和一系列组合索引,例如,[0 -10^7)、[10^7,2.0 10^7) 等,并让节点自己计算出组合。
我目前使用的算法来自以下内容:
Stack Overflow 问题高效计算向量组合
我考虑过使用主节点,它枚举每个组合并将工作发送到每个节点。然而,从单个节点迭代组合和来回通信工作所产生的开销是巨大的,并且随后将导致主节点成为瓶颈。
是否有任何适合高效/最佳分布式枚举的良好组合迭代算法?
I have a problem where I must analyse 500C5 combinations (255244687600) of something. Distributing it over a 10-node cluster where each cluster processes roughly 10^6 combinations per second means the job will be complete in about seven hours.
The problem I have is distributing the 255244687600 combinations over the 10 nodes. I'd like to present each node with 25524468760, however the algorithms I'm using can only produce the combinations sequentially, I'd like to be able to pass the set of elements and a range of combination indicies, for example, [0-10^7), [10^7,2.0 10^7), etc. and have the nodes themselves figure out the combinations.
The algorithms I'm using at the moment are from the following:
Stack Overflow question Efficiently computing vector combinations
I've considered using a master node, that enumerates each of the combinations and sends work to each of the nodes. However, the overhead incurred in iterating the combinations from a single node and communicating back and forth work is enormous, and it will subsequently lead to the master node becoming the bottleneck.
Is there any good combination iterating algorithms geared up for efficient/optimal distributed enumeration?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您可能会在组合数字方面取得一些成功,它允许您检索N'th (n/10th) k - 与简单算法的组合;然后在十个节点中的每一个上运行
next_combination
算法 n/10 次以进行迭代。示例代码(C# 语言,但对于 C++ 程序员来说非常可读)可以在 MSDN。
You may have some success with combinatorial numbers, which allow you to retrieve the N'th (n/10th) k-combination with a simple algorithm; then run the
next_combination
algorithm n/10 times on each of the ten nodes to iterate.Sample code (in C#, but quite readable for a C++ programmer) can be found on MSDN.
从第 n 个开始,让编号为 n 的节点处理每十个组合。
Have node number n process every tenth combination, starting from the nth.
我知道这个问题很老了,但这里是如何有效地完成它的方法。
目前 Python 中的所有代码我确信都可以很容易地转换为 C++。
- 您可能希望从使用整数作为特征向量改为使用位数组,因为使用的整数需要 500位(在 Python 中不是问题)
- 任何人都可以随意更新到 C++。
起始
编号和长度
)、要从中选择组合的项目
集合,以及要选择的项目数k
。start
和items
找到其起始组合来初始化每个节点。要执行1,请按照您的建议查找n-choose-k并将其划分为范围 - 在您的情况下,500-Choose-5正如您所说,是255244687600,因此,对于node=0到9,您分布:
(start=node*25524468760, length=25524468760, items=items, k=5)
要执行2,您可以直接找到起始组合(无需迭代),同时使用组合数系统求出组合特征向量的整数表示(用于3中的迭代):
特征向量的整数表示有k位集合在组成组合的项目的位置中。
要执行 3,您可以使用 Gosper 的 hack 迭代到同一数字系统中组合的下一个特征向量(下一个组合将出现在相对于
的反向排序组合的排序列表中) items
),同时创建组合:重复此
length-1
次(因为您已经直接找到了第一个)。例如:
五个节点处理 7-choose-3 个字母:
I know this question is old, but here is how it may be done efficiently.
All code currently in Python which I'm sure will be easy enough to translate to C++.
- You will probably want to move from using an integer for the characteristic vector to using a bit array, since the integers used will need 500 bits (not a problem in Python)
- Feel free to update to C++ anyone.
start
number andlength
to process), the set ofitems
from which combinations are to be chosen, and the number,k
, of items to choose.start
anditems
.To perform 1 do as you suggest find n-choose-k and divide it into ranges - in your case 500-Choose-5 is, as you said, 255244687600 so, for node=0 to 9 you distribute:
(start=node*25524468760, length=25524468760, items=items, k=5)
To perform 2 you can find the starting combination directly (without iteration) using the combinatorial number system and find the integer representation of the combination's characteristic vector (to be used for the iteration in 3) at the same time:
The integer representation of the characteristic vector has k bits set in the positions of the items that make up the combination.
To perform 3 you can use Gosper's hack to iterate to the next characteristic vector for the combination in the same number system (the next combination that would appear in a sorted list of reverse sorted combinations relative to
items
) and, at the same time, create the combination:Repeat this
length-1
times (since you already found the first one directly).For example:
Five nodes processing 7-choose-3 letters: