Multithreading efficiency

Posted on 2024-10-07 09:46:08

Suppose I have code like this:

for(i = 0; i < i_max; i++)
  for(j = 0; j < j_max; j++)
     // do something

and I want to run it on several threads (assume the //do something tasks are independent of each other; think of Monte Carlo simulations, for instance). My question is this: is it necessarily better to create a thread for each value of i than to create a thread for each value of j? Something like this:

for(i = 0; i < i_max; i++)
  create_thread(j_max);

Additionally, what would be a suitable number of threads? Should I just create i_max threads or, perhaps, use a semaphore so that only k < i_max threads run concurrently at any given time?

Thank you,

寂寞陪衬 2024-10-14 09:46:08

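The best way to divide up the work depends on the workload.

Broadly speaking: for parallelizable workloads, use OpenMP; for heterogeneous workloads, use a thread pool. Avoid managing your own threads if you can.

A Monte Carlo simulation should be a good candidate for truly parallel code rather than a thread pool.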
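As a rough illustration of the OpenMP route for the nested loop in the question, a minimal sketch might look like the following. The names `sample` and `grid_mean` are hypothetical, and `sample` is only a deterministic stand-in for the real "do something" body, not an actual Monte Carlo kernel. The pragma takes effect only when the compiler's OpenMP support is enabled (for example `-fopenmp` with GCC or Clang); without it the loop simply runs serially.

```cpp
#include <cstdint>

// Hypothetical stand-in for the "do something" body: a cheap, deterministic
// value in [0, 1) derived from the two indices (not a real random generator).
double sample(std::int64_t i, std::int64_t j) {
    std::uint64_t x = static_cast<std::uint64_t>(i) * 0x9E3779B97F4A7C15ULL
                    + static_cast<std::uint64_t>(j) * 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 31;
    x *= 0x94D049BB133111EBULL;
    x ^= x >> 29;
    // Keep the top 53 bits and scale by 2^53 to land in [0, 1).
    return static_cast<double>(x >> 11) / 9007199254740992.0;
}

// Averages sample(i, j) over the whole i_max x j_max grid.
// OpenMP splits the outer loop across its own team of threads; the reduction
// clause gives each thread a private partial sum and adds them up at the end.
double grid_mean(std::int64_t i_max, std::int64_t j_max) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (std::int64_t i = 0; i < i_max; ++i) {
        for (std::int64_t j = 0; j < j_max; ++j) {
            sum += sample(i, j);
        }
    }
    return sum / (static_cast<double>(i_max) * static_cast<double>(j_max));
}
```

Parallelizing only the outer loop keeps the number of work items handed to the runtime small while each item still covers a full inner loop. The runtime, not the program, decides how many threads to use.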

By the way, if you are using Visual C++, Visual C++ v10 has an interesting new ...

落叶缤纷 2024-10-14 09:46:08

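Avoid creating threads unless you can keep them busy!

If your scenario is compute-bound, you should limit the number of threads you spawn to the number of cores you expect your code to run on. If you create more threads than there are cores, the OS has to spend time and resources scheduling those threads onto the available cores.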
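One way to apply the "no more threads than cores" advice in standard C++ is sketched below. The helper names `pick_worker_count` and `run_on_cores` are hypothetical, and handing out iterations through a shared atomic counter is a technique chosen here for illustration; it is not something the answer itself prescribes.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// One worker per hardware thread, but never more workers than tasks.
unsigned pick_worker_count(std::size_t task_count) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;  // the standard allows 0 when the count is unknown
    const std::size_t tasks = std::max<std::size_t>(task_count, 1);
    return static_cast<unsigned>(std::min<std::size_t>(cores, tasks));
}

// Runs body(i) exactly once for every i in [0, i_max).
// Each worker repeatedly claims the next unclaimed index from a shared
// counter, so no worker sits idle while iterations remain.
void run_on_cores(std::size_t i_max,
                  const std::function<void(std::size_t)>& body) {
    const unsigned workers = pick_worker_count(i_max);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (std::size_t i = next.fetch_add(1); i < i_max;
                 i = next.fetch_add(1)) {
                body(i);
            }
        });
    }
    for (std::thread& t : pool) t.join();  // wait for every worker to finish
}
```

For the loop in the question, `body(i)` would run the whole inner j loop for one value of i, so the thread count stays tied to the hardware rather than to i_max or j_max.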

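If your scenario is IO-bound, you should consider using queued asynchronous IO operations and checking the response codes once the asynchronous results come back. Again, spawning a thread per IO operation is very wasteful in this case, because you force the OS to spend time scheduling threads that are stalled.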

西瑶 2024-10-14 09:46:08

Everyone here is basically right, but here's a quick-and-dirty way to split up the work and keep all of the processors busy. This works best when 1) creating threads is expensive compared to the work done in an iteration, and 2) most iterations take about the same amount of time to complete.

First, create 1 thread per processor/core. These are your worker threads. They sit idle until they're told to do something.

Now, split up your work so that data needed at the same time is close together. What I mean by that is that if you were processing a ten-element array on a two-processor machine, you'd split it up so that one group is elements 1,2,3,4,5 and the other is 6,7,8,9,10. You may be tempted to split it up as 1,3,5,7,9 and 2,4,6,8,10, but then you're going to cause more false sharing (http://en.wikipedia.org/wiki/False_sharing) in your cache.
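The two ways of splitting the ten-element array can be written out as index lists. This is only a sketch: the function names are hypothetical, indices are 0-based (so the answer's elements 1..10 become 0..9), and n is assumed to be a multiple of the worker count, as in the example.

```cpp
#include <cstddef>
#include <vector>

// Blocked (contiguous) split: worker w gets one unbroken run of indices,
// so the memory it touches sits together.
// Assumes n is a multiple of workers.
std::vector<std::size_t> blocked_indices(std::size_t n, std::size_t workers,
                                         std::size_t worker) {
    const std::size_t chunk = n / workers;
    std::vector<std::size_t> out;
    for (std::size_t k = worker * chunk; k < (worker + 1) * chunk; ++k) {
        out.push_back(k);
    }
    return out;
}

// Interleaved (strided) split, shown only for comparison: neighbouring
// elements belong to different workers, so two workers are far more likely
// to write to the same cache line.
std::vector<std::size_t> interleaved_indices(std::size_t n, std::size_t workers,
                                             std::size_t worker) {
    std::vector<std::size_t> out;
    for (std::size_t k = worker; k < n; k += workers) {
        out.push_back(k);
    }
    return out;
}
```

With n = 10 and two workers, `blocked_indices` gives {0,1,2,3,4} and {5,6,7,8,9}, while `interleaved_indices` gives {0,2,4,6,8} and {1,3,5,7,9}.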

So now that you have a thread per processor and a group of data for each thread, you just tell each thread to work on an independent group of that data.

So in your case I'd do something like this.

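// Create one worker per processor and give it a contiguous range of i values.
for (int t = 0; t < n_processors; ++t)
{
  thread[t] = create_thread();
  datamin[t] = t * (i_max / n_processors);        // first i for worker t (inclusive)
  datamax[t] = (t + 1) * (i_max / n_processors);  // one past the last i for worker t
}

// Hand each worker its range; every worker runs the full inner j loop.
for (int t = 0; t < n_processors; ++t)
  do_work(thread[t], datamin[t], datamax[t], j_max);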

//wait for all threads to be done

//continue with rest of the program.

Of course I left out things like dealing with your data not being an integer multiple of the number of processors, but those are easily fixed.
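For reference, here is one possible way to fill in those gaps with std::thread. It is a sketch under assumptions rather than the answer's own code: `chunk_bounds` and `parallel_grid` are hypothetical names, the leftover iterations are given one each to the first few workers, n_workers is assumed to be at least 1, and threads are created per call instead of being kept idle in a pool.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Half-open range [first, second) of i values for worker t.
// The first (i_max % n_workers) workers each take one extra iteration,
// so i_max does not have to be a multiple of n_workers.
std::pair<std::size_t, std::size_t> chunk_bounds(std::size_t i_max,
                                                 std::size_t n_workers,
                                                 std::size_t t) {
    const std::size_t base = i_max / n_workers;
    const std::size_t extra = i_max % n_workers;
    const std::size_t begin = t * base + (t < extra ? t : extra);
    const std::size_t end = begin + base + (t < extra ? 1 : 0);
    return {begin, end};
}

// Calls body(i, j) once for every cell of the i_max x j_max grid, with the
// i range divided into contiguous chunks, one chunk per worker thread.
void parallel_grid(std::size_t i_max, std::size_t j_max, std::size_t n_workers,
                   const std::function<void(std::size_t, std::size_t)>& body) {
    std::vector<std::thread> threads;
    threads.reserve(n_workers);
    for (std::size_t t = 0; t < n_workers; ++t) {
        const std::pair<std::size_t, std::size_t> bounds =
            chunk_bounds(i_max, n_workers, t);
        const std::size_t begin = bounds.first;
        const std::size_t end = bounds.second;
        threads.emplace_back([=, &body] {
            for (std::size_t i = begin; i < end; ++i) {
                for (std::size_t j = 0; j < j_max; ++j) {
                    body(i, j);
                }
            }
        });
    }
    for (std::thread& th : threads) th.join();  // wait for all threads to be done
}
```

With i_max = 10 and three workers, the ranges come out as [0, 4), [4, 7) and [7, 10), so no chunk is more than one iteration larger than another.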

Also, if you're not averse to third-party libraries, Intel's TBB (Threading Building Blocks) does a great job of abstracting this away from you and letting you get to the real work you want to do.
