Do the cores of a multicore processor really execute work in parallel?
I am dealing with threading and related topics such as processes and context switching.
I understand that on a system with one multicore processor, true simultaneous execution of more than one process isn't real: we just have an illusion of such work, because of process context switching.
But what about threads within one process, running on a multicore processor? Do they really work simultaneously, or is that also just an illusion? Can a processor with 2 hardware cores work on two threads at a time? If not, what is the point of multicore processors?
Yes,...
...but imagine yourself back in Victorian times, hiring a bunch of clerks to perform a complex computation. All of the data that they need is in one book, and they're supposed to write all of their results back into the same book.
The book is like a computer's memory, and the clerks are like individual CPUs. Since only one clerk can use the book at any given time, it might seem as if there's no point in having more than one of them,...
...unless you give each clerk a notepad. They can go to the book, copy some numbers, and then work for a while just from their own notepad, before returning to copy a partial result from the notepad into the book. That allows the other clerks to do some useful work while any one clerk is at the book.
The notepads are like a computer's Level 1 caches: relatively small areas of high-speed memory, each associated with a single CPU, which hold copies of data that have been read from, or need to be written back to, the main memory. The computer hardware automatically copies data between main memory and the caches as needed, so the program does not necessarily need to be aware that the caches even exist. (See https://en.wikipedia.org/wiki/Cache_coherence.)
But the programmer should be aware: if you can structure your program so that different threads spend most of their time reading and writing private variables, and relatively little time accessing variables that are shared with other threads, then most of the private-variable accesses will go no further than the L1 cache, and the threads will be able to truly run in parallel. If, on the other hand, the threads all try to use the same variables at the same time, or if they all try to iterate over an amount of data too large to fit in the cache, then they will have much less ability to work in parallel.
See also:
https://en.wikipedia.org/wiki/Cache_hierarchy
Multiple cores do actually perform work in parallel (at least on all mainstream modern CPU architectures). Processes have one or more threads. The OS scheduler schedules active tasks, which are generally threads, onto the available cores. When there are more active tasks than available cores, the OS uses preemption to execute the tasks concurrently on each core.
In practice, software applications can perform synchronization that may cause some cores to be inactive for a given period of time. Hardware operations can also cause this (e.g. waiting for memory data to be retrieved, or performing an atomic operation).
Moreover, on modern processors, physical cores are often split into multiple hardware threads that can each execute a different task. This is called SMT (a.k.a. Hyper-Threading). On fairly recent x86 processors, the 2 hardware threads of a same core can simultaneously execute 2 tasks in parallel. The tasks share parts of the physical core, such as the execution units, so using 2 hardware threads can be faster than using 1 for some tasks (typically those that do not fully use the core's resources).
Having 2 hardware threads that cannot truly run in parallel, but run concurrently at a low granularity, can still be beneficial for performance; in fact, this was the case for a long time (during the last decade). For example, when a task is latency-bound (e.g. waiting for data to be retrieved from RAM), another task can be scheduled to do some work, improving the overall efficiency. This was the initial goal of SMT. The same is true for preempted tasks on a same core (though the granularity needs to be much bigger): one process can start a networking operation and be preempted so that another process can do some work, before being preempted in turn when data is received from the network.