Multicore Processor Programming

Posted on 2024-08-26 04:08:16

As far as I know, the multi-core architecture in a processor does not affect the program. The actual instruction execution is handled in a lower layer.

My question is:

Given that you have a multicore environment, can I use any programming practices to utilize the available resources more effectively? How should I change my code to gain more performance in multicore environments?

Comments (4)

鹿港巷口少年归 2024-09-02 04:08:16

That is correct. Your program will not run any faster (except for the fact that the core is handling fewer other processes, because some of the processes are being run on the other core) unless you employ concurrency. If you do use concurrency, though, more cores improve the actual parallelism (with fewer cores, the concurrency is interleaved, whereas with more cores, you can get true parallelism between threads).

Making programs efficiently concurrent is no simple task. If done poorly, making your program concurrent can actually make it slower! For example, if you spend lots of time spawning threads (thread construction is really slow), and do work on a very small chunk size (so that the overhead of thread construction dominates the actual work), or if you frequently synchronize your data (which not only forces operations to run serially, but also has a very high overhead on top of it), or if you frequently write to data in the same cache line between multiple threads (which can lead to the entire cache line being invalidated on one of the cores), then you can seriously harm the performance with concurrent programming.
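
Here is a minimal sketch of the cache-line pitfall mentioned above, assuming a 64-byte cache line and an arbitrary iteration count: two threads increment counters that share a cache line, then counters padded onto separate lines. On most machines the padded version finishes noticeably faster because the cores stop invalidating each other's cache line.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

struct Unpadded { std::atomic<long> a{0}, b{0}; };        // both counters share a cache line
struct Padded   { alignas(64) std::atomic<long> a{0};     // each counter gets its own
                  alignas(64) std::atomic<long> b{0}; };  // (assumed) 64-byte line

// Run two threads, each hammering one of the two counters, and return elapsed ms.
template <typename Counters>
long long run(Counters& c, long iters) {
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { for (long i = 0; i < iters; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (long i = 0; i < iters; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join(); t2.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main() {
    const long iters = 50'000'000;   // arbitrary demo size
    Unpadded u;
    Padded p;
    std::printf("unpadded (false sharing): %lld ms\n", run(u, iters));
    std::printf("padded   (no sharing):    %lld ms\n", run(p, iters));
}
```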

It is also important to note that if you have N cores, that DOES NOT mean that you will get a speedup of N. That is the theoretical limit to the speedup. In fact, maybe with two cores it is twice as fast, but with four cores it might be about three times as fast, and then with eight cores it is about three and a half times as fast, etc. How well your program is actually able to take advantage of these cores is called its parallel scalability. Often communication and synchronization overhead prevent a linear speedup, although, in the ideal case, if you can avoid communication and synchronization as much as possible, you can hopefully get close to linear.
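
One standard way to put a number on this is Amdahl's law: if a fraction p of the running time can be parallelized and the rest is serial, the best possible speedup on n cores is

    S(n) = 1 / ((1 - p) + p / n)

For example, with p = 0.9 and n = 8 this gives roughly 4.7x rather than 8x, which matches the diminishing returns described above.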

It would not be possible to give a complete answer on how to write efficient parallel programs on StackOverflow. This is really the subject of at least one (probably several) computer science courses. I suggest that you sign up for such a course or buy a book. I'd recommend a book to you if I knew of a good one, but the parallel algorithms course I took did not have a textbook. You might also be interested in writing a handful of programs using a serial implementation, a parallel implementation with multithreading (regular threads, thread pools, etc.), and a parallel implementation with message passing (such as with Hadoop, Apache Spark, Cloud Dataflows, asynchronous RPCs, etc.), and then measuring their performance, varying the number of cores in the case of the parallel implementations. This was the bulk of the course work for my parallel algorithms course and can be quite insightful. Some computations you might try parallelizing include computing Pi using the Monte Carlo method (this is trivially parallelizable, assuming you can create a random number generator where the random numbers generated in different threads are independent), performing matrix multiplication, computing the row echelon form of a matrix, summing the squares of the numbers 1...N for some very large N, and I'm sure you can think of others.
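
Here is a minimal sketch of the Monte Carlo Pi computation mentioned above, assuming arbitrary seeds and sample counts: each thread gets its own separately seeded generator and its own result slot, so no synchronization is needed until the per-thread counts are summed.

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

int main() {
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    const long samples_per_thread = 10'000'000;   // arbitrary demo size
    std::vector<long> hits(nthreads, 0);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([t, samples_per_thread, &hits] {
            // Distinct seed per thread; good enough for a demo, though a real
            // program might use a proper seed sequence or jump-ahead streams.
            std::mt19937_64 rng(0xC0FFEE + t);
            std::uniform_real_distribution<double> dist(0.0, 1.0);
            long local = 0;
            for (long i = 0; i < samples_per_thread; ++i) {
                double x = dist(rng), y = dist(rng);
                if (x * x + y * y <= 1.0) ++local;   // point falls inside the quarter circle
            }
            hits[t] = local;                         // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (long h : hits) total += h;
    double pi = 4.0 * total / (double(nthreads) * samples_per_thread);
    std::printf("pi ~= %f using %u threads\n", pi, nthreads);
}
```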

万劫不复 2024-09-02 04:08:16

I don't know if it's the best possible place to start, but I subscribed to the article feed from Intel Software Network some time ago and have found a lot of interesting things there, presented in a pretty simple way. You can find some very basic articles on fundamental concepts of parallel computing, like this one. There you can get a quick dive into OpenMP, which is one possible approach to start parallelizing the slowest parts of your application without changing the rest. (If those parts present parallelism, of course.) Also check the Intel Guide for Developing Multithreaded Applications. Or just go and browse the article section; there aren't too many articles, so you can quickly figure out what suits you best. They also have a forum and a weekly webcast called Parallel Programming Talk.
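
To give a concrete picture of the OpenMP approach described above, here is a minimal sketch using only the standard `parallel for` and `reduction` constructs (the loop body and sizes are arbitrary): a single pragma parallelizes one hot loop while the rest of the program stays unchanged.

```cpp
// Compile with: g++ -fopenmp example.cpp   (or clang++ -fopenmp)
// Without the flag the pragma is ignored and the loop simply runs serially.
#include <cstdio>
#include <vector>

int main() {
    const int n = 10'000'000;
    std::vector<double> data(n, 1.5);
    double sum = 0.0;

    // The reduction clause gives each thread a private partial sum and
    // combines them at the end, avoiding a race on the shared variable.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += data[i] * data[i];
    }

    std::printf("sum of squares = %f\n", sum);
}
```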

可是我不能没有你 2024-09-02 04:08:16

Yes, simply adding more cores to a system without altering the software would yield you no results (with the exception that the operating system would be able to schedule multiple concurrent processes on separate cores).

To have your operating system utilise your multiple cores, you need to do one of two things: increase the thread count per process, or increase the number of processes running at the same time (or both!).

Utilising the cores effectively, however, is a beast of a different colour. If you spend too much time synchronising shared data access between threads/processes, your level of concurrency will take a hit as threads wait on each other. This also assumes that you have a problem/computation that can relatively easily be parallelised, since the parallel version of an algorithm is often much more complex than the sequential version thereof.
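
To make the synchronization cost concrete, here is a small sketch with arbitrary thread and iteration counts: the first version takes a shared mutex for every addition, which effectively serializes the threads, while the second lets each thread accumulate locally and merges the results once after joining.

```cpp
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Runs the given workload and returns elapsed wall-clock time in milliseconds.
template <typename Fn>
long long timed(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main() {
    const unsigned nthreads = 4;
    const int per_thread = 2'000'000;

    // Version 1: one shared total protected by a mutex taken for every addition.
    // All threads queue up on the same lock, so the work is largely serialized.
    long long shared_total = 0;
    std::mutex m;
    long long locked_ms = timed([&] {
        std::vector<std::thread> ts;
        for (unsigned t = 0; t < nthreads; ++t)
            ts.emplace_back([&] {
                for (int i = 0; i < per_thread; ++i) {
                    std::lock_guard<std::mutex> lock(m);
                    shared_total += i;
                }
            });
        for (auto& th : ts) th.join();
    });

    // Version 2: each thread accumulates into its own slot; the slots are
    // merged once after join(), so the threads never wait on each other.
    std::vector<long long> partial(nthreads, 0);
    long long local_ms = timed([&] {
        std::vector<std::thread> ts;
        for (unsigned t = 0; t < nthreads; ++t)
            ts.emplace_back([&partial, per_thread, t] {
                long long local = 0;
                for (int i = 0; i < per_thread; ++i) local += i;
                partial[t] = local;
            });
        for (auto& th : ts) th.join();
    });
    long long merged = 0;
    for (long long p : partial) merged += p;

    std::printf("locked: %lld ms (total %lld), per-thread: %lld ms (total %lld)\n",
                locked_ms, shared_total, local_ms, merged);
}
```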

That said, especially for CPU-bound computations with work units that are independent of each other, you'll most likely see a linear speed-up as you throw more threads at the problem. As you add serial segments and synchronisation blocks, this speed-up will tend to decrease.
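
If you want to see that scalability behaviour on your own machine, here is a small measurement harness, assuming an arbitrary CPU-bound workload: it splits the same fixed amount of independent work over 1, 2, 4, and 8 threads and reports the speedup relative to a single thread.

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

// One independent work unit: pure CPU number crunching with no shared state.
double crunch(long begin, long end) {
    double acc = 0.0;
    for (long i = begin; i < end; ++i) acc += std::sqrt(double(i));
    return acc;
}

// Split `total_work` iterations across `nthreads` threads and return seconds elapsed.
double run_with(unsigned nthreads, long total_work) {
    std::vector<double> results(nthreads, 0.0);
    std::vector<std::thread> ts;
    long chunk = total_work / nthreads;
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < nthreads; ++t)
        ts.emplace_back([&results, t, chunk] {
            results[t] = crunch(t * chunk, (t + 1) * chunk);
        });
    for (auto& th : ts) th.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

int main() {
    const long total_work = 200'000'000;            // arbitrary demo size
    double baseline = run_with(1, total_work);      // single-threaded reference
    for (unsigned n : {1u, 2u, 4u, 8u}) {
        double secs = run_with(n, total_work);
        std::printf("%u threads: %.2fs  (speed-up %.2fx)\n", n, secs, baseline / secs);
    }
}
```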

I/O-heavy computations typically gain the least from multithreading, since access to the physical storage (especially if it's on the same controller, or the same media) is itself serial. In that case, threading becomes useful mainly in the sense that a blocked thread frees up your other threads to continue with user interaction or CPU-based operations.

尴尬癌患者 2024-09-02 04:08:16

You might consider using programming languages designed for concurrent programming. Erlang and Go come to mind.
