Get arrays of elements from a list that sum to a value, using Parallel.ForEach (multiple threads)

Published on 2025-01-23 04:24:01

I'm using the code below to get arrays of elements from a list that sum to a value. However, it is only using 1 CPU. My PC has 64 cores, so I want to use 100% of the CPU to speed up the process. Could you please help me update the code to use Parallel.ForEach (multiple threads)?

using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<List<int>> subset_sum(IEnumerable<int> numbers, int target,
    IEnumerable<int> partial = null, int partial_sum = 0)
{
    partial ??= Enumerable.Empty<int>();
    if (partial_sum == target) yield return partial.ToList();
    if (partial_sum >= target) yield break;
    foreach (var (n, i) in numbers.Select((n, i) => (n, i)))
    {
        var remaining = numbers.Skip(i + 1);
        foreach (var subset in subset_sum(
            remaining, target, partial.Append(n), partial_sum + n))
                yield return subset;
    }
}

var numbers = new List<int> { 3, 9, 8, 4, 5, 7, 10 };
var target = 15;
foreach (var subset in subset_sum(numbers, target))
    Console.WriteLine($"[{string.Join(", ", subset)}] = {subset.Sum()}");
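
For reference, here is one minimal sketch of what the question asks for: Parallel.ForEach over the choice of the first element, with each branch reusing the subset_sum iterator above for the remaining elements. The SubsetSumParallel name is illustrative; results are collected into a ConcurrentBag, so their order differs from the sequential output, and System.Collections.Concurrent plus System.Threading.Tasks must be added to the using directives at the top of the file.

// Sketch only: parallelizes the outermost level of the recursion and reuses
// the sequential subset_sum iterator (defined above) below that level.
List<List<int>> SubsetSumParallel(IReadOnlyList<int> items, int goal)
{
    var results = new ConcurrentBag<List<int>>();
    Parallel.ForEach(Enumerable.Range(0, items.Count), i =>
    {
        int first = items[i];
        if (first > goal) return; // assumes positive numbers, like the original pruning
        foreach (var found in subset_sum(items.Skip(i + 1), goal - first))
        {
            found.Insert(0, first); // re-attach the chosen first element
            results.Add(found);
        }
    });
    return results.ToList();
}

// Example usage with the sample data above (output order is nondeterministic):
foreach (var subset in SubsetSumParallel(numbers, target))
    Console.WriteLine($"[{string.Join(", ", subset)}] = {subset.Sum()}");

Only the first level of the recursion is spread across threads, so with a small input like the seven-number sample the speedup is negligible; the answers below discuss better-performing approaches.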

Comments (2)

英雄似剑 2025-01-30 04:24:01

This post is not an answer about how to parallelize the subset-finding algorithm. It just shows that a faster single-threaded implementation is possible. The implementation below is not recursive. It uses a Stack<int> as the mechanism for finding all the combinations of the source sequence (the pending stack). It is about 7-8 times faster than the recursive implementation on my PC.

IEnumerable<IList<int>> FindSubsets(IEnumerable<int> source, int target)
{
    int[] sourceArray = source.ToArray();
    if (target == 0) yield return Array.Empty<int>();

    List<int> subset = new(sourceArray.Length); // Indices
    int subsetSum = 0;
    Stack<int> pending = new(sourceArray.Length); // Indices
    pending.Push(0);

    while (pending.TryPop(out var index))
    {
        // Backtrack: shrink the current subset to the prefix that the popped index
        // continues from (its length equals the number of branches still pending).
        while (subset.Count > pending.Count)
        {
            subsetSum -= sourceArray[subset[subset.Count - 1]];
            subset.RemoveAt(subset.Count - 1);
        }
        for (; index < sourceArray.Length; index++)
        {
            subset.Add(index);
            subsetSum += sourceArray[index];
            if (subsetSum == target)
                yield return subset.Select(i => sourceArray[i]).ToArray();
            if (index + 1 < sourceArray.Length) pending.Push(index + 1);
        }
    }
}
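
For reference, a minimal usage example (not part of the original answer), reusing the sample data and using directives from the question:

var data = new List<int> { 3, 9, 8, 4, 5, 7, 10 };
foreach (var subset in FindSubsets(data, 15))
    Console.WriteLine($"[{string.Join(", ", subset)}] = {subset.Sum()}");
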
蛮可爱 2025-01-30 04:24:01

Parallelizing recursive algorithms is not easy. It's not obvious how to partition the workload when the function that does the work can recursively create more work. In this particular case it is possible to partition the work by splitting the numbers into two parts and using the combinations of one part as the partitions. Then, for each partition, search the other part for combinations whose sum equals the target minus the sum of the partition. Below is an implementation of this idea. The PLINQ library is used for the parallelization:

static IEnumerable<IList<int>> FindSubsets(IEnumerable<int> source, int target)
{
    const int leftSize = 8; // Results in 256 partitions
    int[] sourceArray = source.ToArray();
    int[] left = sourceArray.Take(leftSize).ToArray();
    int[] right = sourceArray.TakeLast(sourceArray.Length - left.Length).ToArray();
    return Partitioner
        .Create(Combinations(left), EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .SelectMany(leftPart =>
        {
            int leftPartSum = leftPart.Sum();
            return FindSubsets_Internal(right, target - leftPartSum).
                Select(rightPart => leftPart.Concat(rightPart).ToArray());
        });

    static IEnumerable<int[]> Combinations(IEnumerable<int> remaining,
        IEnumerable<int> partial = null)
    {
        partial ??= Enumerable.Empty<int>();
        yield return partial.ToArray();
        foreach (var (n, i) in remaining.Select((n, i) => (n, i)))
        {
            var innerRemaining = remaining.Skip(i + 1);
            var inner = Combinations(innerRemaining, partial.Append(n));
            foreach (var subset in inner) yield return subset;
        }
    }

    static IEnumerable<int[]> FindSubsets_Internal(IEnumerable<int> remaining,
        int target, IEnumerable<int> partial = null, int partialSum = 0)
    {
        partial ??= Enumerable.Empty<int>();
        if (partialSum == target) yield return partial.ToArray();
        if (partialSum >= target) yield break;
        foreach (var (n, i) in remaining.Select((n, i) => (n, i)))
        {
            var innerRemaining = remaining.Skip(i + 1);
            var inner = FindSubsets_Internal(
                innerRemaining, target, partial.Append(n), partialSum + n);
            foreach (var subset in inner) yield return subset;
        }
    }
}

This implementation is not much faster than the original single-threaded method, because the recursive algorithm is quite memory-hungry and inefficient. Comparing the value of GC.GetTotalAllocatedBytes(true) before and after the execution reveals that the memory allocation is massive: around 700 MB per second on my PC. So the bottleneck is not the CPU but the bandwidth of the memory bus: the CPU cores are mostly waiting for data to be stored in or fetched from main memory instead of calculating. At least this is my theory. For comparison, parallelizing the non-recursive algorithm that I've posted in my other answer yields a reasonable performance improvement (around 2.4x on my quad-core PC).
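
The parallelized non-recursive version mentioned in the last sentence is not shown in this answer. A rough sketch of how the two answers could be combined (illustrative only, not the code behind the 2.4x measurement) might look like the following; it assumes the non-recursive FindSubsets iterator from the other answer is accessible in the same scope, and that System.Collections.Concurrent and System.Linq are imported:

IEnumerable<IList<int>> FindSubsetsParallel(IEnumerable<int> source, int target)
{
    const int leftSize = 8; // 2^8 = 256 left-part combinations = 256 partitions
    int[] sourceArray = source.ToArray();
    int[] left = sourceArray.Take(leftSize).ToArray();
    int[] right = sourceArray.Skip(left.Length).ToArray();
    return Partitioner
        .Create(Combinations(left), EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .SelectMany(leftPart =>
        {
            int remainingTarget = target - leftPart.Sum();
            // Skip hopeless partitions (assumes non-negative numbers, like the
            // pruning in the recursive version).
            if (remainingTarget < 0) return Enumerable.Empty<IList<int>>();
            // FindSubsets: the non-recursive, stack-based iterator from the other answer.
            return FindSubsets(right, remainingTarget)
                .Select(rightPart => (IList<int>)leftPart.Concat(rightPart).ToArray());
        });

    // Same helper as in the code above: enumerates every combination of the left part.
    static IEnumerable<int[]> Combinations(IEnumerable<int> remaining,
        IEnumerable<int> partial = null)
    {
        partial ??= Enumerable.Empty<int>();
        yield return partial.ToArray();
        foreach (var (n, i) in remaining.Select((n, i) => (n, i)))
            foreach (var subset in Combinations(remaining.Skip(i + 1), partial.Append(n)))
                yield return subset;
    }
}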
