Multi-core processor programming
As far as I know, the multi-core architecture in a processor does not affect the program. The actual instruction execution is handled in a lower layer.
My question is:
Given that you have a multi-core environment, can I use any programming practices to utilize the available resources more effectively? How should I change my code to gain more performance in multi-core environments?
4 Answers
That is correct. Your program will not run any faster (except for the fact that the core is handling fewer other processes, because some of the processes are being run on the other core) unless you employ concurrency. If you do use concurrency, though, more cores improve the actual parallelism (with fewer cores, the concurrency is interleaved, whereas with more cores, you can get true parallelism between threads).
Making programs efficiently concurrent is no simple task. If done poorly, making your program concurrent can actually make it slower! For example, if you spend lots of time spawning threads (thread construction is really slow), and do work on a very small chunk size (so that the overhead of thread construction dominates the actual work), or if you frequently synchronize your data (which not only forces operations to run serially, but also has a very high overhead on top of it), or if you frequently write to data in the same cache line between multiple threads (which can lead to the entire cache line being invalidated on one of the cores), then you can seriously harm the performance with concurrent programming.
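To make that last point concrete, here is a minimal C++ sketch (the function names and iteration counts are illustrative, not taken from anywhere in particular): both functions compute the same total, but the first funnels every increment through a single shared atomic variable whose cache line ping-pongs between cores, while the second gives each thread its own cache-line-aligned slot and merges the partial results once at the end.

```cpp
// Hypothetical sketch: heavy sharing vs. per-thread private data.
// Compile as C++17 or later (so the vector honours the 64-byte alignment)
// and link with -pthread.
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Slow variant: every increment contends on one shared atomic variable,
// so the cache line holding it bounces between cores.
std::uint64_t count_shared(std::uint64_t iterations, unsigned threads) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&total, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i)
                total.fetch_add(1, std::memory_order_relaxed);  // shared write
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Faster variant: each thread works on its own cache-line-aligned slot,
// and the partial results are merged exactly once at the end.
std::uint64_t count_private(std::uint64_t iterations, unsigned threads) {
    struct alignas(64) Slot { std::uint64_t value = 0; };  // 64 = common cache-line size
    std::vector<Slot> partial(threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&partial, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i)
                ++partial[t].value;  // private write, no contention
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (const auto& s : partial) total += s.value;
    return total;
}

int main() {
    const unsigned threads = 4;
    const std::uint64_t n = 1'000'000;
    // Both return threads * n; the second is typically much faster.
    return count_shared(n, threads) == count_private(n, threads) ? 0 : 1;
}
```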
It is also important to note that if you have N cores, that DOES NOT mean that you will get a speedup of N. That is the theoretical limit to the speedup. In fact, maybe with two cores it is twice as fast, but with four cores it might be about three times as fast, and then with eight cores it is about three and a half times as fast, etc. How well your program is actually able to take advantage of these cores is called its parallel scalability. Often, communication and synchronization overhead prevent a linear speedup, although, in the ideal case, if you can avoid communication and synchronization as much as possible, you can hopefully get close to linear speedup.
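A common way to reason about this limit is Amdahl's law: if a fraction $p$ of the work can be parallelized and the rest must run serially, then the speedup on $N$ cores is bounded by

$$S(N) = \frac{1}{(1 - p) + p/N}.$$

For example, $p = 0.9$ and $N = 8$ gives $S \approx 4.7$ rather than 8, which is exactly the kind of diminishing return described above.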
It would not be possible to give a complete answer on how to write efficient parallel programs on StackOverflow. This is really the subject of at least one (probably several) computer science courses. I suggest that you sign up for such a course or buy a book. I'd recommend a book to you if I knew of a good one, but the parallel algorithms course I took did not have a textbook for the course. You might also be interested in writing a handful of programs using a serial implementation, a parallel implementation with multithreading (regular threads, thread pools, etc.), and a parallel implementation with message passing (such as with Hadoop, Apache Spark, Cloud Dataflows, asynchronous RPCs, etc.), and then measuring their performance, varying the number of cores in the case of the parallel implementations. This was the bulk of the course work for my parallel algorithms course and can be quite insightful. Some computations you might try parallelizing include computing Pi using the Monte Carlo method (this is trivially parallelizable, assuming you can create a random number generator where the random numbers generated in different threads are independent), performing matrix multiplication, computing the row echelon form of a matrix, summing the squares of the numbers 1...N for some very large N, and I'm sure you can think of others.
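As a taste of the Monte Carlo Pi exercise mentioned above, here is a hedged C++ sketch; the seeding scheme and sample counts are arbitrary choices, and each thread keeps its own generator and its own counter so the threads never need to synchronize while sampling.

```cpp
// Hypothetical sketch: Monte Carlo estimate of Pi, one RNG per thread.
// Compile with -pthread.
#include <cstdint>
#include <iostream>
#include <random>
#include <thread>
#include <vector>

double estimate_pi(std::uint64_t samples_per_thread, unsigned threads) {
    std::vector<std::uint64_t> hits(threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&hits, t, samples_per_thread] {
            // Each thread owns its generator, so no synchronization is needed
            // while sampling. Seeding with random_device + t is a simplification;
            // it does not strictly guarantee independent streams.
            std::mt19937_64 rng(std::random_device{}() + t);
            std::uniform_real_distribution<double> dist(0.0, 1.0);
            std::uint64_t inside = 0;
            for (std::uint64_t i = 0; i < samples_per_thread; ++i) {
                const double x = dist(rng), y = dist(rng);
                if (x * x + y * y <= 1.0) ++inside;  // point lands in the quarter circle
            }
            hits[t] = inside;  // each thread writes only its own slot, once
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (const auto h : hits) total += h;
    return 4.0 * static_cast<double>(total) /
           (static_cast<double>(samples_per_thread) * threads);
}

int main() {
    unsigned threads = std::thread::hardware_concurrency();
    if (threads == 0) threads = 4;  // fallback when the count is unknown
    std::cout << "pi ~= " << estimate_pi(2'000'000, threads) << "\n";
}
```

Timing this with different thread counts, as suggested above, is a quick way to see the parallel scalability on your own machine.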
I don't know if it's the best possible place to start, but I subscribed to the article feed from the Intel Software Network some time ago and have found a lot of interesting things there, presented in a pretty simple way. You can find some very basic articles on fundamental concepts of parallel computing, like this one. Here you have a quick dive into OpenMP, which is one possible approach to start parallelizing the slowest parts of your application without changing the rest. (If those parts exhibit parallelism, of course.) Also check the Intel Guide for Developing Multithreaded Applications. Or just go and browse the article section; there are not too many articles, so you can quickly figure out what suits you best. They also have a forum and a weekly webcast called Parallel Programming Talk.
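To give a flavour of the OpenMP approach, here is a minimal, hypothetical C++ sketch: a single pragma parallelizes one hot loop (a dot product in this example) while the rest of the program stays exactly as it was.

```cpp
// Hypothetical sketch: parallelizing one hot loop with OpenMP.
// Compile with e.g. `g++ -fopenmp -O2 dot.cpp`.
#include <cstddef>
#include <iostream>
#include <vector>

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    double sum = 0.0;
    // The pragma splits the iterations across the available cores and
    // combines the per-thread partial sums into `sum` when the loop ends.
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

int main() {
    std::vector<double> a(1'000'000, 1.5), b(1'000'000, 2.0);
    std::cout << dot(a, b) << "\n";  // 3e+06 with these values
}
```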
Yes, simply adding more cores to a system without altering the software would yield no results (with the exception that the operating system would be able to schedule multiple concurrent processes on separate cores).
To have your operating system utilise your multiple cores, you need to do one of two things: increase the thread count per process, or increase the number of processes running at the same time (or both!).
Utilising the cores effectively, however, is a beast of a different colour. If you spend too much time synchronising shared data access between threads/processes, your level of concurrency will take a hit as threads wait on each other. This also assumes that you have a problem/computation that can relatively easily be parallelised, since the parallel version of an algorithm is often much more complex than the sequential version thereof.
That said, especially for CPU-bound computations with work units that are independent of each other, you'll most likely see a linear speed-up as you throw more threads at the problem. As you add serial segments and synchronisation blocks, this speed-up will tend to decrease.
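As an illustration of that independent-work-unit case, here is a minimal C++ sketch; `process_item` is just a stand-in for whatever CPU-bound computation you actually have, and each thread is handed a disjoint slice of the data so no locking is required.

```cpp
// Hypothetical sketch: independent, CPU-bound work units split across threads.
// Compile with -pthread.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

double process_item(double x) { return std::sqrt(x) * std::sin(x); }  // placeholder work

void process_all(std::vector<double>& items, unsigned threads) {
    const std::size_t chunk = (items.size() + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(items.size(), begin + chunk);
        if (begin >= end) break;
        // Each thread owns a disjoint range, so no locking is required.
        workers.emplace_back([&items, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                items[i] = process_item(items[i]);
        });
    }
    for (auto& w : workers) w.join();
}

int main() {
    std::vector<double> data(1'000'000, 2.0);
    unsigned threads = std::thread::hardware_concurrency();
    process_all(data, threads ? threads : 4);
}
```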
I/O-heavy computations would typically fare the worst in a multi-threaded environment, since access to the physical storage (especially if it's on the same controller, or the same media) is also serial; in that case, threading becomes useful more in the sense that it frees up your other threads to continue with user interaction or CPU-based operations.
You might consider using programming languages designed for concurrent programming. Erlang and Go come to mind.