How are you taking advantage of multicore?

Posted 2024-07-10 09:40:19 | 1338 words | 8 views | 0 comments

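As someone who came to HPC from enterprise web development, I'm always curious how developers in the "real world" are taking advantage of parallel computing. This matters much more now that all chips are going multicore, and it will matter even more when a chip has thousands of cores instead of just a few.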

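My questions are:

1. How does this affect your software roadmap? I'm particularly interested in real stories about how multicore is affecting different software domains, so please specify in your answer what kind of development you do (e.g. server side, client-side apps, scientific computing, etc.).
2. What are you doing with your existing code to take advantage of multicore machines, and what challenges have you faced? Are you using OpenMP, Erlang, Haskell, CUDA, TBB, UPC, or something else?
3. What do you plan to do as concurrency levels continue to increase, and how will you deal with hundreds or thousands of cores?
4. If your domain doesn't easily benefit from parallel computation, then explaining why is interesting, too.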

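Finally, I've framed this as a multicore question, but feel free to talk about other types of parallel computing. If you're porting part of your app to use MapReduce, or if MPI is the paradigm for you, then definitely mention that, too.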

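Update: If you do answer #5, mention whether you think things will change if there get to be more cores (100, 1000, etc.) than the available memory bandwidth can feed (seeing as bandwidth per core keeps getting smaller). Can you still use the remaining cores for your application?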

酒几许 2024-07-17 09:40:19

My research work includes work on compilers and on spam filtering. I also do a lot of 'personal productivity' Unix stuff. Plus I write and use software to administer classes that I teach, which includes grading, testing student code, tracking grades, and myriad other trivia.

1. Multicore affects me not at all except as a research problem for compilers to support other applications. But those problems lie primarily in the run-time system, not the compiler.
2. At great trouble and expense, Dave Wortman showed around 1990 that you could parallelize a compiler to keep four processors busy. Nobody I know has ever repeated the experiment. Most compilers are fast enough to run single-threaded. And it's much easier to run your sequential compiler on several different source files in parallel than it is to make your compiler itself parallel. For spam filtering, learning is an inherently sequential process. And even an older machine can learn hundreds of messages a second, so even a large corpus can be learned in under a minute. Again, training is fast enough.
3. The only significant way I have of exploiting parallel machines is using parallel make. It is a great boon, and big builds are easy to parallelize. Make does almost all the work automatically. The only other thing I can remember is using parallelism to time long-running student code by farming it out to a bunch of lab machines, which I could do in good conscience because I was only clobbering a single core per machine, so using only 1/4 of CPU resources. Oh, and I wrote a Lua script that will use all 4 cores when ripping MP3 files with lame. That script was a lot of work to get right.
4. I will ignore tens, hundreds, and thousands of cores. The first time I was told "parallel machines are coming; you must get ready" was 1984. It was true then and is true today that parallel programming is a domain for highly skilled specialists. The only thing that has changed is that today manufacturers are forcing us to pay for parallel hardware whether we want it or not. But just because the hardware is paid for doesn't mean it's free to use. The programming models are awful, and making the thread/mutex model work, let alone perform well, is an expensive job even if the hardware is free. I expect most programmers to ignore parallelism and quietly get on about their business. When a skilled specialist comes along with a parallel make or a great computer game, I will quietly applaud and make use of their efforts. If I want performance for my own apps I will concentrate on reducing memory allocations and ignore parallelism.
5. Parallelism is really hard. Most domains are hard to parallelize. A widely reusable exception like parallel make is cause for much rejoicing.
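The pattern described above, running an ordinary sequential tool on several independent inputs at once rather than parallelizing the tool itself, can be sketched in a few lines. This is only an illustration in Python, not the Lua script mentioned in the answer; the function name `run_all` and the stand-in jobs are hypothetical. In a real setting each command would be a compiler or `lame` invocation on one file.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, workers=4):
    """Run each command as its own process, at most `workers` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd):
        # Each job is an ordinary sequential program; the parallelism
        # comes only from running several of them side by side.
        return subprocess.run(cmd, check=False).returncode

    # Threads are enough here: they only wait on child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Hypothetical stand-in jobs: each one just exits with a given status.
    jobs = [[sys.executable, "-c", f"import sys; sys.exit({n})"] for n in (0, 0, 3)]
    print(run_all(jobs))
```

The design works only because the jobs share nothing: no locks or shared data structures are needed, which is the same property that makes big builds easy for parallel make.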

Summary (which I heard from a keynote speaker who works for a leading CPU manufacturer): the industry backed into multicore because they couldn't keep making machines run faster and hotter and they didn't know what to do with the extra transistors. Now they're desperate to find a way to make multicore profitable because if they don't have profits, they can't build the next generation of fab lines. The gravy train is over, and we might actually have to start paying attention to software costs.

Many people who are serious about parallelism are ignoring these toy 4-core or even 32-core machines in favor of GPUs with 128 processors or more. My guess is that the real action is going to be there.
