In a multithreaded application, would a multi-core or a multiprocessor arrangement be better?

Posted on 2024-08-31 10:51:11

I've already read a lot on this topic, both here (e.g., stackoverflow.com/questions/1713554/threads-processes-vs-multithreading-multi-core-multiprocessor-how-they-are or "multi-CPU, multi-core and hyper-thread") and elsewhere (e.g., ixbtlabs.com/articles2/cpu/rmmt-l2-cache.html or software.intel.com/en-us/articles/multi-core-introduction/), but I'm still not sure about a couple of things that seem very straightforward, so I thought I'd just ask.

(1) Is a multi-core processor in which each core has dedicated cache effectively the same as a multiprocessor system (balanced of course for processor speed, cache size, and so on)?

(2) Let's say I have some images to analyze (i.e., computer vision), and I have these images loaded into RAM. My app spawns a thread for each image that needs to be analyzed. Will this app on a shared cache multi-core processor run slower than on a dedicated cache multi-core processor, and would the latter run at the same speed as on an equivalent single-core multiprocessor machine?

Thank you for the help!

Comments (1)

动次打次papapa 2024-09-07 10:51:11

The size of the cache is important. For the sake of this answer I'm assuming x86 processors and considering only the L2 cache, which is shared on dual-core processors.

If you are comparing two single-core processors with one dual-core processor, and each single-core processor has the same amount of data cache as the dual-core one (running at the same speed), then the two-processor setup has more cache in total. More of each image fits into cache, so if the processing has to load from and/or store to the image data repeatedly, it will very likely run faster at the same clock speed.

If you are comparing two single-core processors with one dual-core processor whose data cache is twice the size of each single-core processor's data cache, then about half of the shared cache will be used for each core's work. It is quite likely that, in addition to the image data that each independent thread uses, there will be some shared data. If this shared data sits in the shared cache, the two cores can share it more easily than on the 2x single-core setup. On the 2x single-core setup, each chunk of shared data would be held in one processor's cache, and there would be a little overhead whenever the other processor needed to use it.

Dual-core machines also make it cheaper for a thread to migrate from one core to another on the same processor module: with a shared cache, the thread's new core does not have to refill its cache while the old core's cache still holds data that is no longer needed and is just taking up space.

Whatever hardware you end up with, I'd suggest experimenting with limiting the number of threads to 3 to 10 per core at any one time for general use. The threads all compete with each other for cache space, so with too many of them, all of one thread's data gets pushed out of cache before that thread is scheduled again. Also, if each thread loops over a few image files instead of handling just one, you gain a little because there are fewer stacks, which makes it more likely that each thread's stack stays in cache. You also reduce the amount of memory the OS needs to keep track of the threads.
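To make the thread-limit suggestion concrete, here is a minimal sketch of a bounded worker pool in Python. The `analyze` function, the `THREADS_PER_CORE` constant, and the fake image data are all hypothetical stand-ins, not something from the question. Note also that in CPython threads only run CPU-bound work in parallel when that work happens in native code that releases the GIL, so treat this as an illustration of the structure, not a benchmark.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumption: start at the low end of the 3-10 threads-per-core range
# and tune by experiment.
THREADS_PER_CORE = 3


def analyze(image):
    """Hypothetical stand-in for the real analysis: sum the pixel values."""
    return sum(image)


def analyze_all(images, threads_per_core=THREADS_PER_CORE):
    """Analyze every image with a fixed pool instead of one thread per image."""
    cores = os.cpu_count() or 1
    # Never create more workers than there are images, and at least one.
    workers = max(1, min(len(images), cores * threads_per_core))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() feeds images to the same few workers, so each thread
        # loops over several images and fewer stacks exist overall.
        return list(pool.map(analyze, images))


if __name__ == "__main__":
    # Fake 1 KiB "images" so the example is self-contained.
    fake_images = [bytes([i % 256]) * 1024 for i in range(100)]
    results = analyze_all(fake_images)
    print(len(results))
```

Changing `threads_per_core` and timing the run on your own images is probably the quickest way to see where cache contention starts to hurt.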

Your biggest win comes when you can overlap processing with slow access, such as disk, network, or human interaction, so what you need is just enough threads to keep the CPUs busy processing.
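A rough sketch of that overlap, again in Python: each task does a slow load followed by the analysis, and a small pool lets one thread compute while others are waiting. The `load` function only simulates a slow read with `time.sleep`, and the names `load`, `analyze`, `load_and_analyze`, and `run` are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load(name):
    """Hypothetical slow read, standing in for disk or network access."""
    time.sleep(0.01)  # simulated I/O wait; the CPU is free during this time
    return name.encode()


def analyze(data):
    """Hypothetical stand-in for the CPU-bound analysis."""
    return len(data)


def load_and_analyze(name):
    """One unit of work: slow load followed by processing."""
    return analyze(load(name))


def run(names, workers=4):
    # With several workers, some threads can process data while the
    # others are blocked waiting on their (simulated) I/O.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_and_analyze, names))


if __name__ == "__main__":
    print(run(["a.png", "b.png", "c.png"]))
```

The right value for `workers` depends on how long the loads take relative to the processing, so it is something to measure, not to guess.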
