Using multithreading to speed up a loop in C# (question)

Posted on 2024-07-06 17:15:39

Imagine I have a function that goes through a million (or a billion) strings and checks something in each of them.

For example:

foreach (string item in ListOfStrings)
{
    result.Add(CalculateSmth(item));
}

This takes a lot of time, because CalculateSmth is a very time-consuming function.

My question: how can I integrate multithreading into this kind of process?

For example, I want to fire up 5 threads, each of which returns some results, and keep going for as long as the list still has items.

Maybe someone can point me to some examples or articles.

I forgot to mention that I need this in .NET 2.0.

Comments (6)

╰つ倒转 2024-07-13 17:15:39

You could try the Parallel Extensions (part of .NET 4.0).

These allow you to write something like:

Parallel.ForEach(ListOfStrings, item =>
{
    result.Add(CalculateSmth(item));
});

Of course, result.Add would need to be thread safe.
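
One way to meet the thread-safety requirement is to run the expensive call outside a lock and take the lock only for the Add. The sketch below assumes .NET 4.0 or later; CalculateSmth is a hypothetical stand-in that returns the string length, and ParallelSketch, Run, and gate are illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelSketch
{
    // Hypothetical stand-in for the expensive function from the question.
    static int CalculateSmth(string item)
    {
        return item.Length;
    }

    public static List<int> Run(List<string> listOfStrings)
    {
        List<int> result = new List<int>();
        object gate = new object();

        Parallel.ForEach(listOfStrings, item =>
        {
            int value = CalculateSmth(item); // expensive part runs in parallel, outside the lock
            lock (gate)                      // List<T>.Add is not thread safe
            {
                result.Add(value);
            }
        });

        return result; // the order of results is not guaranteed
    }
}
```

Keeping the lock this narrow means the threads only contend for the brief Add, not for the calculation itself.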

白馒头 2024-07-13 17:15:39

The Parallel Extensions are cool, but this can also be done with just the thread pool, like this:

using System.Collections.Generic;
using System.Threading;

namespace noocyte.Threading
{
    // Carries one input string and the event that signals its completion.
    class CalcState
    {
        public CalcState(ManualResetEvent reset, string input) {
            Reset = reset;
            Input = input;
        }
        public ManualResetEvent Reset { get; private set; }
        public string Input { get; set; }
    }

    class CalculateMT
    {
        List<string> result = new List<string>();
        List<ManualResetEvent> events = new List<ManualResetEvent>();

        private void Calc() {
            List<string> aList = new List<string>();
            aList.Add("test");

            // Queue one work item per input string.
            foreach (var item in aList)
            {
                CalcState cs = new CalcState(new ManualResetEvent(false), item);
                events.Add(cs.Reset);
                ThreadPool.QueueUserWorkItem(new WaitCallback(Calculate), cs);
            }
            // Block until every work item has signalled.
            // Note: WaitHandle.WaitAll accepts at most 64 handles, so larger inputs must be batched.
            WaitHandle.WaitAll(events.ToArray());
        }

        private void Calculate(object s)
        {
            CalcState cs = s as CalcState;
            // The real calculation on cs.Input would go here.
            lock (result)   // List<T>.Add is not thread safe
            {
                result.Add(cs.Input);
            }
            cs.Reset.Set(); // signal only after the result has been stored
        }
    }
}
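
With a very large list, one wait handle per item runs into the 64-handle limit of WaitHandle.WaitAll. A possible variation, sketched below, uses a single ManualResetEvent together with a counter decremented through Interlocked. It uses only C# 2.0 syntax; CalculateSmth is again a hypothetical stand-in returning the string length, and CountdownSketch and its members are illustrative names.

```csharp
using System;
using System.Threading;

class CountdownSketch
{
    private readonly string[] inputs;
    private readonly int[] results;
    private int pending; // number of work items that have not finished yet
    private readonly ManualResetEvent done = new ManualResetEvent(false);

    public CountdownSketch(string[] inputs)
    {
        this.inputs = inputs;
        this.results = new int[inputs.Length];
    }

    // Hypothetical stand-in for the expensive function from the question.
    private static int CalculateSmth(string item)
    {
        return item.Length;
    }

    public int[] Run()
    {
        if (inputs.Length == 0) return results;

        pending = inputs.Length; // set before queuing so no worker can reach zero early
        for (int i = 0; i < inputs.Length; i++)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(Work), i);
        }
        done.WaitOne(); // a single handle, regardless of the number of items
        return results;
    }

    private void Work(object state)
    {
        int index = (int)state;
        try
        {
            // Each work item writes only its own slot, so no lock is needed
            // and the output keeps the input order.
            results[index] = CalculateSmth(inputs[index]);
        }
        finally
        {
            // The last work item to finish releases the waiting thread.
            if (Interlocked.Decrement(ref pending) == 0) done.Set();
        }
    }
}
```

Usage would look like `int[] lengths = new CountdownSketch(myStrings).Run();`, where myStrings is any string array.
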
赏烟花じ飞满天 2024-07-13 17:15:39

Note that concurrency doesn't magically give you more resources. You need to establish what is slowing CalculateSmth down.

For example, if it's CPU-bound (and you're on a single core), then the same number of CPU ticks will go to the code whether you execute it sequentially or in parallel. On top of that, you pay some overhead for managing the threads. The same argument applies to other constraints (e.g. I/O).

You'll only get a performance gain if CalculateSmth leaves some resource free during its execution that another instance could use. That's not uncommon. For example, if the task involves I/O followed by some CPU work, then process 1 could be doing the CPU work while process 2 is doing the I/O. As mats points out, a chain of producer-consumer units can achieve this, if you have the infrastructure.

我为君王 2024-07-13 17:15:39

You need to split up the work you want to do in parallel. Here is an example of how you can split the work in two:

List<string> work = /* some list with lots of strings */;

// Split the work in two
List<string> odd = new List<string>();
List<string> even = new List<string>();
for (int i = 0; i < work.Count; i++)
{
    if (i % 2 == 0)
    {
        even.Add(work[i]);
    }
    else
    {
        odd.Add(work[i]);
    }
}

// Set up two worker delegates.
// ThreadStart (System.Threading) is used because the parameterless Action delegate only arrived in .NET 3.5.
List<Foo> oddResult = new List<Foo>();
ThreadStart oddWork = delegate { foreach (string item in odd) oddResult.Add(CalculateSmth(item)); };

List<Foo> evenResult = new List<Foo>();
ThreadStart evenWork = delegate { foreach (string item in even) evenResult.Add(CalculateSmth(item)); };

// Run the two delegates asynchronously
IAsyncResult evenHandle = evenWork.BeginInvoke(null, null);
IAsyncResult oddHandle = oddWork.BeginInvoke(null, null);

// Wait for both to finish
evenWork.EndInvoke(evenHandle);
oddWork.EndInvoke(oddHandle);

// Merge the results from the two jobs (note: this does not restore the original order)
List<Foo> allResults = new List<Foo>();
allResults.AddRange(oddResult);
allResults.AddRange(evenResult);

return allResults;
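
The same idea can be generalized from two halves to any number of workers. The sketch below is one possible shape, using plain threads and C# 2.0 syntax: worker w handles items w, w + workerCount, w + 2 * workerCount, and so on, and writes each result into the slot matching the input index, so the original order is kept without a merge step. CalculateSmth is a hypothetical stand-in returning the string length, and ChunkSketch is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ChunkSketch
{
    // Hypothetical stand-in for the expensive function from the question.
    static int CalculateSmth(string item)
    {
        return item.Length;
    }

    public static int[] Run(IList<string> work, int workerCount)
    {
        if (workerCount < 1) throw new ArgumentOutOfRangeException("workerCount");

        int[] results = new int[work.Count];
        Thread[] threads = new Thread[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            int offset = w; // per-iteration copy, so each closure sees its own value
            threads[w] = new Thread(delegate()
            {
                // Strided loop: this worker only touches its own indices,
                // so no two threads ever write to the same slot.
                for (int i = offset; i < work.Count; i += workerCount)
                {
                    results[i] = CalculateSmth(work[i]);
                }
            });
            threads[w].Start();
        }

        // Wait for every worker before handing the results back.
        foreach (Thread t in threads) t.Join();
        return results;
    }
}
```

The list is only read while the workers run, which is safe for List<string> as long as nothing modifies it at the same time.
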
半边脸i 2024-07-13 17:15:39

The first question you must answer is whether you should be using threading at all.

If your function CalculateSmth() is basically CPU-bound, i.e. heavy on CPU usage with basically no I/O, then I have a hard time seeing the point of using threads, since the threads will be competing for the same resource, in this case the CPU.

If your CalculateSmth() uses both CPU and I/O, then there may be a point in using threading.

I totally agree with the comment on my answer. I made the erroneous assumption that we were talking about a single CPU with one core, but these days we have multi-core CPUs. My bad.
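
On a multi-core machine, a common starting point is to size the number of worker threads from the core count. The sketch below is only a rule of thumb, not a measured recommendation: WorkerCountSketch, Choose, and the ioBoundWorkers parameter are illustrative names, while Environment.ProcessorCount is the standard .NET property reporting the number of logical processors.

```csharp
using System;

static class WorkerCountSketch
{
    // Returns a starting guess for the number of worker threads.
    public static int Choose(bool cpuBound, int ioBoundWorkers)
    {
        int cores = Environment.ProcessorCount;

        // CPU-bound work: more threads than cores just compete for the same CPUs.
        if (cpuBound) return cores;

        // Work that also waits on I/O can keep more threads than cores busy;
        // the right number has to be found by measuring.
        return Math.Max(cores, ioBoundWorkers);
    }
}
```

For the question's case, `WorkerCountSketch.Choose(true, 5)` would return the core count if CalculateSmth turns out to be purely CPU-bound.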

你的他你的她 2024-07-13 17:15:39

I don't have any good articles at hand right now, but what you want is something along the lines of the producer-consumer pattern with a thread pool.

The producer loops through the input and creates tasks (which in this case could simply mean queuing up the items in a List or Stack). The consumers are, say, five threads that each read one item off the stack, consume it by running the calculation, and then store the result somewhere else.

This way the multithreading is limited to just those five threads, and they will all have work to do until the stack is empty.

Things to think about:

  • Put protection, such as a mutex, on the input and output lists.
  • If the order is important, make sure that the output order is maintained. One option is to store the results in a SortedList or something similar.
  • Make sure that CalculateSmth is thread safe, i.e. that it doesn't use any global state.
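
The points above can be sketched as follows, using only C# 2.0 syntax. This is a simplified illustration under stated assumptions: the producer fills the queue completely before the consumers start, a lock stands in for the mutex, and results are written into an array slot matching the input index instead of a SortedList. CalculateSmth is a hypothetical stand-in returning the string length, and ProducerConsumerSketch is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class ProducerConsumerSketch
{
    // Each job pairs the item's original position with the item itself.
    private readonly Queue<KeyValuePair<int, string>> input =
        new Queue<KeyValuePair<int, string>>();
    private readonly object inputLock = new object();
    private readonly int[] output;

    // Hypothetical stand-in for the expensive function from the question.
    private static int CalculateSmth(string item)
    {
        return item.Length;
    }

    public ProducerConsumerSketch(IList<string> items)
    {
        output = new int[items.Count];
        // Producer: queue every item before any consumer starts.
        for (int i = 0; i < items.Count; i++)
        {
            input.Enqueue(new KeyValuePair<int, string>(i, items[i]));
        }
    }

    public int[] Run(int consumerCount)
    {
        Thread[] consumers = new Thread[consumerCount];
        for (int c = 0; c < consumerCount; c++)
        {
            consumers[c] = new Thread(new ThreadStart(Consume));
            consumers[c].Start();
        }
        foreach (Thread t in consumers) t.Join(); // wait until the queue is drained
        return output;
    }

    private void Consume()
    {
        while (true)
        {
            KeyValuePair<int, string> job;
            lock (inputLock) // protect the shared input queue
            {
                if (input.Count == 0) return; // nothing left, this consumer is done
                job = input.Dequeue();
            }
            // Calculate outside the lock; each job writes only its own slot,
            // which also preserves the input order.
            output[job.Key] = CalculateSmth(job.Value);
        }
    }
}
```

With five consumers, as in the question, the call would be `new ProducerConsumerSketch(listOfStrings).Run(5)`, where listOfStrings is any IList<string>.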