Can a parallel algorithm with N threads achieve a performance gain greater than N?

Published 2024-12-11 14:00:11

A theoretical question, maybe it is obvious:

Is it possible that an algorithm, after being implemented in a parallel way with N threads, will be executed more than N times faster than the original, single-threaded algorithm? In other words, can the gain be better than linear with the number of threads?

Comments (3)

习ぎ惯性依靠 2024-12-18 14:00:11

It's not common, but it most assuredly is possible.

Consider, for example, building a software pipeline where each step in the pipeline does a fairly small amount of calculation, but requires enough static data to approximately fill the entire data cache -- but each step uses different static data.

In a case like this, serial calculation on a single processor will normally be limited primarily by the bandwidth to main memory. Assuming you have (at least) as many processors/cores (each with its own data cache) as pipeline steps, you can load each data cache once, and process one packet of data after another, retaining the same static data for all of them. Now your calculation can proceed at the processor's speed instead of being limited by the bandwidth to main memory, so the speed improvement could easily be 10 times greater than the number of threads.

Theoretically, you could accomplish the same with a single processor that just had a really huge cache. From a practical viewpoint, however, the selection of processors and cache sizes is fairly limited, so if you want to use more cache you need to use more processors -- and the way most systems provide to accomplish this is with multiple threads.
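
For concreteness, here is a minimal sketch of the kind of pipeline described above, assuming a toy two-stage design: each stage owns its own large lookup table (the "static data") and the stages hand packets off through a simple locked queue. The stage functions, table size, and packet count are illustrative assumptions, not the answerer's actual workload.

```cpp
// Toy two-stage software pipeline (a sketch, not a tuned implementation).
// Each stage repeatedly reads its own large, stage-specific table; with one
// stage per core, each table can stay resident in that core's cache.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

constexpr std::size_t kTableWords = 1 << 20;   // ~8 MB per stage (assumed size)
constexpr int kPackets = 100000;

std::queue<std::uint64_t> q;        // hand-off between stage 1 and stage 2
std::mutex q_mtx;
std::atomic<bool> stage1_done{false};

void stage1(const std::vector<std::uint64_t>& table) {
    for (int i = 0; i < kPackets; ++i) {
        std::uint64_t v = table[i % kTableWords] ^ static_cast<std::uint64_t>(i);
        std::lock_guard<std::mutex> lk(q_mtx);
        q.push(v);                  // small calculation, touches only stage 1's table
    }
    stage1_done = true;
}

void stage2(const std::vector<std::uint64_t>& table, std::uint64_t& out) {
    std::uint64_t acc = 0;
    for (;;) {
        bool got = false;
        std::uint64_t v = 0;
        {
            std::lock_guard<std::mutex> lk(q_mtx);
            if (!q.empty()) { v = q.front(); q.pop(); got = true; }
            else if (stage1_done) break;
        }
        if (!got) { std::this_thread::yield(); continue; }
        acc += table[v % kTableWords];  // touches only stage 2's table
    }
    out = acc;
}

int main() {
    // Two stage-specific tables: together they overflow a single cache,
    // but each fits (roughly) in one core's cache on its own.
    std::vector<std::uint64_t> t1(kTableWords, 3), t2(kTableWords, 5);
    std::uint64_t result = 0;
    std::thread a(stage1, std::cref(t1));
    std::thread b(stage2, std::cref(t2), std::ref(result));
    a.join();
    b.join();
    std::printf("checksum: %llu\n", static_cast<unsigned long long>(result));
}
```

Run serially, one core keeps evicting one table to load the other; run as a pipeline, each table stays resident in its own core's cache, which is where the better-than-N gain in this answer comes from.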

挽袖吟 2024-12-18 14:00:11

Yes.

I saw an algorithm for moving a robot arm through complicated maneuvers that was basically to divide into N threads, and have each thread move more or less randomly through the solution space. (It wasn't a practical algorithm.) The statistics clearly showed a superlinear speedup over one thread. Apparently the probability of hitting a solution over time rose fairly fast and then leveled out some, so the advantage was in having a lot of initial attempts.
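
A minimal sketch of that pattern (my illustration, not the robot-arm code): each thread samples the solution space independently, and everyone stops as soon as one thread hits a goal state. The `is_goal` predicate and the seeds are stand-ins, not anything from the original algorithm.

```cpp
// Randomized parallel search with early exit (illustrative stand-in only).
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

std::atomic<bool> found{false};
std::atomic<long> winner_steps{-1};

// Toy stand-in for "reached a valid arm configuration".
bool is_goal(std::uint64_t state) { return state % 1000003 == 42; }

void search(unsigned seed) {
    std::mt19937_64 rng(seed);
    for (long steps = 0; !found.load(std::memory_order_relaxed); ++steps) {
        if (is_goal(rng())) {                    // "move more or less randomly"
            if (!found.exchange(true)) winner_steps = steps;
            return;
        }
    }
}

int main() {
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i) pool.emplace_back(search, 1234u + i);
    for (auto& t : pool) t.join();
    std::printf("%u threads, winner hit a goal after %ld of its own samples\n",
                n, winner_steps.load());
}
```

If the chance of stumbling onto a solution is front-loaded in each attempt, N independent attempts can finish more than N times sooner than one long attempt, which is the effect the answer describes.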

节枝 2024-12-18 14:00:11

Amdahl's law (parallelization) tells us this is not possible for the general case. At best we can perfectly divide the work by N. The reason is that Amdahl's formula for speedup is Speedup = 1/((1 - P) + P/N), where P is the parallelizable fraction of the work and N is the number of processors; given no serial portion (P = 1), this becomes:

Speedup = 1/(1/N)

which of course reduces to just N.
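
A quick way to sanity-check that statement (my sketch, not part of the original answer) is to plug a few parallel fractions into the formula:

```cpp
// Amdahl speedup S = 1 / ((1 - P) + P/N), where P is the parallel fraction.
#include <cstdio>

double amdahl(double parallel_fraction, int n) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n);
}

int main() {
    // With P = 1.0 (no serial part) the speedup on 8 cores is exactly 8;
    // any serial fraction pulls it below N, never above.
    for (double p : {1.0, 0.95, 0.50}) {
        std::printf("P = %.2f: speedup on 8 cores = %.2f\n", p, amdahl(p, 8));
    }
}
```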
