以分布式方式枚举组合

发布于 2024-10-12 10:44:39 字数 718 浏览 2 评论 0原文

我遇到一个问题,我必须分析某物的 500C5 组合 (255244687600)。将其分布在 10 个节点的集群上,其中每个集群每秒处理大约 10^6 个组合,这意味着该作业将在大约 7 小时内完成。

我遇到的问题是将 255244687600 个组合分布在 10 个节点上。我想为每个节点提供 25524468760,但是我使用的算法只能按顺序生成组合,我希望能够传递元素集和一系列组合索引,例如,[0 -10^7)、[10^7,2.0 10^7) 等,并让节点自己计算出组合。

我目前使用的算法来自以下内容:

我考虑过使用主节点,它枚举每个组合并将工作发送到每个节点。然而,从单个节点迭代组合和来回通信工作所产生的开销是巨大的,并且随后将导致主节点成为瓶颈。

是否有任何适合高效/最佳分布式枚举的良好组合迭代算法?

I have a problem where I must analyse 500C5 combinations (255244687600) of something. Distributing it over a 10-node cluster where each cluster processes roughly 10^6 combinations per second means the job will be complete in about seven hours.

The problem I have is distributing the 255244687600 combinations over the 10 nodes. I'd like to present each node with 25524468760, however the algorithms I'm using can only produce the combinations sequentially, I'd like to be able to pass the set of elements and a range of combination indicies, for example, [0-10^7), [10^7,2.0 10^7), etc. and have the nodes themselves figure out the combinations.

The algorithms I'm using at the moment are from the following:

I've considered using a master node, that enumerates each of the combinations and sends work to each of the nodes. However, the overhead incurred in iterating the combinations from a single node and communicating back and forth work is enormous, and it will subsequently lead to the master node becoming the bottleneck.

Is there any good combination iterating algorithms geared up for efficient/optimal distributed enumeration?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

许久 2024-10-19 10:44:39

您可能会在组合数字方面取得一些成功,它允许您检索N'th (n/10th) k - 与简单算法的组合;然后在十个节点中的每一个上运行 next_combination 算法 n/10 次以进行迭代。

示例代码(C# 语言,但对于 C++ 程序员来说非常可读)可以在 MSDN

You may have some success with combinatorial numbers, which allow you to retrieve the N'th (n/10th) k-combination with a simple algorithm; then run the next_combination algorithm n/10 times on each of the ten nodes to iterate.

Sample code (in C#, but quite readable for a C++ programmer) can be found on MSDN.

青衫负雪 2024-10-19 10:44:39

从第 n 个开始,让编号为 n 的节点处理每十个组合。

Have node number n process every tenth combination, starting from the nth.

尬尬 2024-10-19 10:44:39

我知道这个问题很老了,但这里是如何有效地完成它的方法。

目前 Python 中的所有代码我确信都可以很容易地转换为 C++。
- 您可能希望从使用整数作为特征向量改为使用位数组,因为使用的整数需要 500位(在 Python 中不是问题)
- 任何人都可以随意更新到 C++。


  1. 向节点分发其组合范围(要处理的起始编号和长度)、要从中选择组合的项目集合,以及要选择的项目数 k
  2. 通过直接从 startitems 找到其起始组合来初始化每个节点。
  3. 通过让每个节点执行第一个组合的工作来运行每个节点,然后迭代其余的组合,并执行关联的工作。

要执行1,请按照您的建议查找n-choose-k并将其划分为范围 - 在您的情况下,500-Choose-5正如您所说,是255244687600,因此,对于node=0到9,您分布:
(start=node*25524468760, length=25524468760, items=items, k=5)

要执行2,您可以直接找到起始组合(无需迭代),同时使用组合数系统求出组合特征向量的整数表示(用于3中的迭代):

def getCombination(index, items, k):
    '''Returns (combination, characteristicVector)
    combination - The single combination, of k elements of items, that would be at
    the provided index if all possible combinations had each been sorted in
    descending order (as defined by the order of items) and then placed in a
    sorted list.
    characteristicVector - an integer with chosen item's bits set.
    '''
    combination = []
    characteristicVector = 0
    n = len(items)
    nCk = 1
    for nMinusI, iPlus1 in zip(range(n, n - k, -1), range(1, k + 1)):
        nCk *= nMinusI
        nCk //= iPlus1
    curIndex = nCk
    for k in range(k, 0, -1):
        nCk *= k
        nCk //= n
        while curIndex - nCk > index:
            curIndex -= nCk
            nCk *= (n - k)
            nCk -= nCk % k
            n -= 1
            nCk //= n
        n -= 1
        combination .append(items[n])
        characteristicVector += 1 << n
    return combination, characteristicVector 

特征向量的整数表示有k位集合在组成组合的项目的位置中。

要执行 3,您可以使用 Gosper 的 hack 迭代到同一数字系统中组合的下一个特征向量(下一个组合将出现在相对于 的反向排序组合的排序列表中) items),同时创建组合:

def nextCombination(items, characteristicVector):
    '''Returns the next (combination, characteristicVector).
    combination - The next combination of items that would appear after the
    combination defined by the provided characteristic vector if all possible
    combinations had each been sorted in descending order (as defined by the order
    of items) and then placed in a sorted list.
    characteristicVector - an integer with chosen item's bits set.
    '''
    u = characteristicVector & -characteristicVector
    v = u + characteristicVector
    if v <= 0:
        raise OverflowError("Ran out of integers") # <- ready for C++
    characteristicVector =  v + (((v ^ characteristicVector) // u) >> 2)
    combination = []
    copiedVector = characteristicVector
    index = len(items) - 1
    while copiedVector > 0:
        present, copiedVector = divmod(copiedVector, 1 << index)
        if present:
            combination.append(items[index])
        index -= 1
    return combination, characteristicVector

重复此 length-1 次(因为您已经直接找到了第一个)。


例如:

五个节点处理 7-choose-3 个字母:

>>> items = ('A','B','C','D','E','F','G')
>>> k = 3
>>> nodes = 5
>>> n =  len(items)
>>> for nmip1, i in zip(range(n - 1, n - k, -1), range(2, k + 1)):
...     n = n * nmip1 // i
...
>>> for node in range(nodes):
...     length = n // nodes
...     start = node * length
...     print("Node {0} initialised".format(node))
...     combination, cv = getCombination(start, items, k)
...     doWork(combination)
...     for i in range(length-1):
...             combination, cv = nextCombination(items, cv)
...             doWork(combination)
...
Node 0 initialised
Doing work with: C B A
Doing work with: D B A
Doing work with: D C A
Doing work with: D C B
Doing work with: E B A
Doing work with: E C A
Doing work with: E C B
Node 1 initialised
Doing work with: E D A
Doing work with: E D B
Doing work with: E D C
Doing work with: F B A
Doing work with: F C A
Doing work with: F C B
Doing work with: F D A
Node 2 initialised
Doing work with: F D B
Doing work with: F D C
Doing work with: F E A
Doing work with: F E B
Doing work with: F E C
Doing work with: F E D
Doing work with: G B A
Node 3 initialised
Doing work with: G C A
Doing work with: G C B
Doing work with: G D A
Doing work with: G D B
Doing work with: G D C
Doing work with: G E A
Doing work with: G E B
Node 4 initialised
Doing work with: G E C
Doing work with: G E D
Doing work with: G F A
Doing work with: G F B
Doing work with: G F C
Doing work with: G F D
Doing work with: G F E
>>>

I know this question is old, but here is how it may be done efficiently.

All code currently in Python which I'm sure will be easy enough to translate to C++.
- You will probably want to move from using an integer for the characteristic vector to using a bit array, since the integers used will need 500 bits (not a problem in Python)
- Feel free to update to C++ anyone.


  1. Distribute to the nodes their range of combinations (start number and length to process), the set of items from which combinations are to be chosen, and the number, k, of items to choose.
  2. Initialise each node by having it find its starting combination directly from start and items.
  3. Run each node by having it do the work for the first combination then iterate through the rest of its combinations, and do the associated work.

To perform 1 do as you suggest find n-choose-k and divide it into ranges - in your case 500-Choose-5 is, as you said, 255244687600 so, for node=0 to 9 you distribute:
(start=node*25524468760, length=25524468760, items=items, k=5)

To perform 2 you can find the starting combination directly (without iteration) using the combinatorial number system and find the integer representation of the combination's characteristic vector (to be used for the iteration in 3) at the same time:

def getCombination(index, items, k):
    '''Returns (combination, characteristicVector)
    combination - The single combination, of k elements of items, that would be at
    the provided index if all possible combinations had each been sorted in
    descending order (as defined by the order of items) and then placed in a
    sorted list.
    characteristicVector - an integer with chosen item's bits set.
    '''
    combination = []
    characteristicVector = 0
    n = len(items)
    nCk = 1
    for nMinusI, iPlus1 in zip(range(n, n - k, -1), range(1, k + 1)):
        nCk *= nMinusI
        nCk //= iPlus1
    curIndex = nCk
    for k in range(k, 0, -1):
        nCk *= k
        nCk //= n
        while curIndex - nCk > index:
            curIndex -= nCk
            nCk *= (n - k)
            nCk -= nCk % k
            n -= 1
            nCk //= n
        n -= 1
        combination .append(items[n])
        characteristicVector += 1 << n
    return combination, characteristicVector 

The integer representation of the characteristic vector has k bits set in the positions of the items that make up the combination.

To perform 3 you can use Gosper's hack to iterate to the next characteristic vector for the combination in the same number system (the next combination that would appear in a sorted list of reverse sorted combinations relative to items) and, at the same time, create the combination:

def nextCombination(items, characteristicVector):
    '''Returns the next (combination, characteristicVector).
    combination - The next combination of items that would appear after the
    combination defined by the provided characteristic vector if all possible
    combinations had each been sorted in descending order (as defined by the order
    of items) and then placed in a sorted list.
    characteristicVector - an integer with chosen item's bits set.
    '''
    u = characteristicVector & -characteristicVector
    v = u + characteristicVector
    if v <= 0:
        raise OverflowError("Ran out of integers") # <- ready for C++
    characteristicVector =  v + (((v ^ characteristicVector) // u) >> 2)
    combination = []
    copiedVector = characteristicVector
    index = len(items) - 1
    while copiedVector > 0:
        present, copiedVector = divmod(copiedVector, 1 << index)
        if present:
            combination.append(items[index])
        index -= 1
    return combination, characteristicVector

Repeat this length-1 times (since you already found the first one directly).


For example:

Five nodes processing 7-choose-3 letters:

>>> items = ('A','B','C','D','E','F','G')
>>> k = 3
>>> nodes = 5
>>> n =  len(items)
>>> for nmip1, i in zip(range(n - 1, n - k, -1), range(2, k + 1)):
...     n = n * nmip1 // i
...
>>> for node in range(nodes):
...     length = n // nodes
...     start = node * length
...     print("Node {0} initialised".format(node))
...     combination, cv = getCombination(start, items, k)
...     doWork(combination)
...     for i in range(length-1):
...             combination, cv = nextCombination(items, cv)
...             doWork(combination)
...
Node 0 initialised
Doing work with: C B A
Doing work with: D B A
Doing work with: D C A
Doing work with: D C B
Doing work with: E B A
Doing work with: E C A
Doing work with: E C B
Node 1 initialised
Doing work with: E D A
Doing work with: E D B
Doing work with: E D C
Doing work with: F B A
Doing work with: F C A
Doing work with: F C B
Doing work with: F D A
Node 2 initialised
Doing work with: F D B
Doing work with: F D C
Doing work with: F E A
Doing work with: F E B
Doing work with: F E C
Doing work with: F E D
Doing work with: G B A
Node 3 initialised
Doing work with: G C A
Doing work with: G C B
Doing work with: G D A
Doing work with: G D B
Doing work with: G D C
Doing work with: G E A
Doing work with: G E B
Node 4 initialised
Doing work with: G E C
Doing work with: G E D
Doing work with: G F A
Doing work with: G F B
Doing work with: G F C
Doing work with: G F D
Doing work with: G F E
>>>
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文