What is the difference between a Cluster and MPP supercomputer architecture?
In a cluster, each machine is largely independent of the others in terms of memory, disk, etc. They are interconnected using some variation on normal networking. The cluster exists mostly in the mind of the programmer and how s/he chooses to distribute the work.
In a Massively Parallel Processor, there really is only one machine with thousands of CPUs tightly interconnected. MPPs have exotic memory architectures to allow extremely high speed exchange of intermediate results with neighboring processors.
The major variants are SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, Multiple Data). In a SIMD system, every processor is executing the same instruction at the same time, only on different bits of memory. Essentially, there is only one Program Counter. In a MIMD machine, each CPU has its own PC.
MPPs can be a pain to program and are of use only on algorithms that are embarrassingly parallel (that's actually what they call it). However, if you have such a problem, then an MPP can be shockingly fast. They are also incredibly expensive.
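A minimal sketch of the "embarrassingly parallel" case this answer mentions, in the cluster style: each worker is fully independent (own memory, no exchange of intermediate results), so the parallelism exists only in how the programmer partitions the work. The function names and the four-worker split are illustrative assumptions, not from the answer.

```python
# Embarrassingly parallel workload, cluster-style: workers share nothing
# and never communicate; the programmer just decides how to split the data.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker computes its piece independently of all the others.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Partition the data; the "cluster" here is simply a process pool.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(100))))  # 328350
```

Because no worker depends on another's result, the same split would work unchanged whether the workers are processes on one machine or jobs on separate cluster nodes.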
The top500 list uses a slightly different distinction between an MPP and a cluster, as explained in the Dongarra et al. paper:
Compared to a cluster, a modern MPP (such as the IBM Blue Gene) is more tightly integrated: individual nodes cannot run on their own, and they are connected by a custom network (like a multidimensional torus). But, similarly to a cluster, there is no single, shared memory spanning all the nodes (note: an MPP might be hierarchical, and shared memory might be used inside a single node (NUMA), or between a handful of nodes).
I would thus be extremely careful about using the terms SIMD and MIMD in this context, as they usually describe shared-memory architectures (SMP).
Update:
Dongarra et al. link
Update:
An MPP can have nodes that use shared memory internally, but the memory of the MPP as a whole is not shared.
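The point that no single shared memory spans all the nodes can be sketched with two processes standing in for two nodes: each holds private data, and intermediate results cross the "interconnect" only as explicit messages. This is a single-machine analogy using Python's `multiprocessing`; the `node` and `run_two_nodes` names are illustrative assumptions.

```python
# Distributed-memory style: each "node" has private memory, and the only
# way to share an intermediate result is to send it as a message.
from multiprocessing import Process, Pipe, Queue

def node(rank, conn, results, local_value):
    # local_value lives in this node's private memory; the other node
    # cannot see it without an explicit exchange over the interconnect.
    conn.send(local_value)            # send intermediate result to neighbor
    neighbor = conn.recv()            # receive the neighbor's result
    results.put((rank, local_value + neighbor))

def run_two_nodes(a, b):
    left, right = Pipe()              # the "interconnect" between the nodes
    results = Queue()
    p0 = Process(target=node, args=(0, left, results, a))
    p1 = Process(target=node, args=(1, right, results, b))
    p0.start(); p1.start()
    p0.join(); p1.join()
    return dict(results.get() for _ in range(2))

if __name__ == "__main__":
    print(run_two_nodes(10, 32))      # each node arrives at 42 via messages
```

On a real MPP the same pattern runs over the custom network (e.g. the torus), typically via a message-passing library rather than OS pipes.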
A cluster is a bunch of machines, normally interconnected with Ethernet (read: a network), each running its own separate copy of an OS, which all happen to serve a single purpose.
An MPP supercomputer usually implies a faster, proprietary interconnect (e.g. SGI NUMALink) that supports either Distributed Shared Memory (processes on different MPP nodes use shared memory over the fast interconnect to share data as if they were running on a single computer) or even a Single System Image (a single instance of an operating system, mostly Linux, running on all the nodes at the same time as if on a single machine - e.g. "ps aux" on any node will show you all the processes running on the MPP).
As you can see, the definition is quite fluid; it's more a question of scale than of clear-cut differences.
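The Distributed Shared Memory idea above can be contrasted with message passing in a single-machine analogy: here two processes read and write one memory region directly, with no messages at all. On a real DSM machine the sharing spans nodes over the fast interconnect (e.g. NUMALink); the function names below are illustrative assumptions.

```python
# Shared-memory style: one process's plain memory store is directly
# visible to another, with no explicit message exchange.
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def writer(name):
    shm = SharedMemory(name=name)
    shm.buf[0] = 42          # an ordinary store into the shared region
    shm.close()

def read_shared_byte():
    shm = SharedMemory(create=True, size=1)
    try:
        p = Process(target=writer, args=(shm.name,))
        p.start(); p.join()
        return shm.buf[0]    # the writer's store is visible here directly
    finally:
        shm.close()
        shm.unlink()

if __name__ == "__main__":
    print(read_shared_byte())  # 42
```

The programming model is simpler than message passing, which is part of why such interconnects command a premium.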
I've searched a lot of HPC literature and couldn't find a concrete definition of MPP. There is considerable consensus that a cluster consists of multiple interconnected, regular personal computers or workstations, usually coupled with standard technologies (like Ethernet or open-source operating systems). The term MPP is usually applied to more proprietary approaches to building distributed-memory computers, often involving proprietary technologies.
For example: Tianhe-2 is considered a cluster because it uses x86-64 nodes and a regular operating system (Kylin Linux). Sunway TaihuLight is considered an MPP because its nodes use their own particular architecture, SW26010, and run their own operating system, Sunway Raise OS.
The most concrete explanation of this matter I found was in the Sourcebook of Parallel Computing (Dongarra et al.):