Why does make -j perform better when it is passed a number larger than the number of cores available?

Posted on 2024-10-13 23:52:12

I have a quad-core processor with hyper-threading. When I use make -j8 it is faster than make -j4 (I read the number of cores in Java and then called make -j<number of cores>).

I don't understand why make -j32 is faster than make -j8 when I have (read in Java) just 8 cores (hyper-threading doubles the number of physical cores). How is that possible?

Comments (2)

迷鸟归林 2024-10-20 23:52:12

There's more to compiling than CPU speed and number of cores available: disk bandwidth and memory bandwidth matter a lot too.

In your case, I imagine that each CPU HT sibling is getting roughly 4 processes to execute. As it starts one, that process blocks on disk IO and the sibling moves on to the next process. The second one tries to open a second file, blocks on disk IO, and the sibling moves on to the next process. Starting four compilers before the first disk IO is ready wouldn't surprise me.

So when the first one has finally read in the program source, the compiler must start hunting through directories to find the #included files. Each one requires some open() calls followed by read() calls, all of which can block, and all of which will relinquish the sibling for other processes to run.

Now multiply that by eight siblings -- each HT core will run until it blocks on memory access, at which point it'll swap over to the other sibling, and run for a while. Once the memory of the first sibling has been fetched into the cache, it is probably time for the second sibling to stall while waiting for memory.
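
The blocking behaviour described above can be sketched with a toy model. This is only an illustration under loud assumptions, not a measurement: time advances in abstract ticks, every file needs the same number of I/O ticks followed by the same number of CPU ticks, the disk can serve any number of requests at once, and context switches are free. The function name simulate_build and all of its parameters are made up for this sketch.

```python
def simulate_build(num_files, jobs, cores, io_ticks, cpu_ticks):
    """Return how many ticks a toy build of num_files files takes.

    jobs  -- maximum number of compilers running at once (the -j value)
    cores -- number of logical CPUs that can do CPU work in one tick
    Each file first waits io_ticks on I/O, then needs cpu_ticks of CPU.
    """
    pending = num_files   # files whose compiler has not been started yet
    active = []           # one [io_remaining, cpu_remaining] pair per running job
    ticks = 0
    while pending or active:
        # Like make, start new compilers until `jobs` of them are running.
        while pending and len(active) < jobs:
            active.append([io_ticks, cpu_ticks])
            pending -= 1
        cores_free = cores
        for job in active:
            if job[0] > 0:
                job[0] -= 1          # blocked on I/O: makes progress without a core
            elif cores_free > 0:
                job[1] -= 1          # runnable and a core is free this tick
                cores_free -= 1
            # otherwise the job is runnable but has to wait for a core
        # Drop jobs that have finished both phases.
        active = [j for j in active if j[0] > 0 or j[1] > 0]
        ticks += 1
    return ticks


if __name__ == "__main__":
    for j in (4, 8, 32, 64):
        print(j, simulate_build(64, j, 8, 3, 1))
```

With 64 files, 8 logical cores, 3 I/O ticks and 1 CPU tick per file, this model needs 64 ticks at -j4, 32 ticks at -j8 and 11 ticks at -j32, because the extra jobs keep the cores busy while others wait. Going to -j64 still gives 11 ticks: once the cores are saturated, more jobs no longer help in this model.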

There is an upper limit on how much faster you can get your compiles to run by using make -j, but twice-number-of-cpus has been a good starting point for me in the past.
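
Since the best value depends on the machine and the project, one way to find it is to time a few candidates, starting from twice the CPU count. The following is a minimal sketch; the helper names candidate_job_counts, make_command and time_command are hypothetical, and the choice of 1x, 2x and 4x as candidates is an assumption for illustration. os.cpu_count() reports logical CPUs, so on a hyper-threaded quad-core it would normally return 8, matching the count read in Java in the question.

```python
import os
import subprocess
import time


def candidate_job_counts(cpus=None):
    """Return -j values to try: 1x, 2x and 4x the logical CPU count."""
    if cpus is None:
        cpus = os.cpu_count() or 1   # os.cpu_count() may return None
    return [cpus, 2 * cpus, 4 * cpus]


def make_command(jobs):
    """Build the argument list for a parallel make run."""
    return ["make", f"-j{jobs}"]


def time_command(cmd):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)   # raises CalledProcessError on failure
    return time.perf_counter() - start
```

To compare values, one could call time_command(make_command(n)) for each n in candidate_job_counts(), restoring a clean tree before each run (for example with a clean target, assuming the Makefile provides one). The first run may be slower simply because the source files are not yet in the page cache, so it is worth repeating each measurement.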

依 靠 2024-10-20 23:52:12

Starting more processes can still potentially give you benefits. For example, one process can use the CPU while another process on the same CPU is waiting for file
