What is a Cluster Filesystem?
There has never been a single fixed standard for classifying these systems; the views below are for reference only.
http://www.beedub.com/clusterfs.html
This is a short taxonomy of the kinds of distributed filesystems you can find today (February 2004). This was assembled with some help from Garth Gibson and Larry Jones.
Distributed filesystem - the generic term for a client/server or "network" filesystem where the data isn't locally attached to a host. There are lots of different kinds of distributed filesystems, the first ones coming out of research in the 1980s. NFS and CIFS are the most common distributed filesystems today.
Global filesystem - this refers to the namespace, so that all files have the same name and path name when viewed from all hosts. This obviously makes it easy to share data across machines and users in different parts of the organization. For example, the WWW is a global namespace because a URL works everywhere. Filesystems don't always have that property, however, because your share definitions may not match mine, or we may not see the same file servers or the same portions of those file servers.
AFS was an early provider of a global namespace - all files were organized under /afs/cellname/... and you could assemble AFS cells even from different organizations (e.g., different universities) into one shared filesystem. The Panasas filesystem (PanFS) supports a similar structure, if desired.
SAN filesystem - these provide a way for hosts to share Fibre Channel storage, which is traditionally carved into private chunks bound to different hosts. To provide sharing, a block-level metadata manager controls access to different SAN devices. A SAN filesystem mounts storage natively in only one node, but connects all nodes to that storage and distributes block addresses to other nodes. Scalability is often an issue because blocks are a low-level way to share data, which places a big burden on the metadata managers and requires large network transactions in order to access data.
Examples include SGI cXFS, IBM GPFS, Red Hat Sistina, IBM SanFS, EMC Highroad and others.
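The block-address handoff described above can be sketched roughly as follows. This is a minimal illustration, not the design of any product listed here: the `Extent`, `MetadataManager` and `blocks_to_read` names, the device labels and the block numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A run of consecutive blocks on one SAN device (hypothetical layout)."""
    device: str
    start_block: int
    block_count: int


class MetadataManager:
    """Hypothetical block-level metadata manager: owns the file -> block map."""

    def __init__(self):
        self._block_map = {}

    def create(self, path, extents):
        # Record which device blocks hold the file's data.
        self._block_map[path] = list(extents)

    def lookup(self, path):
        # Hand the block addresses out to a client node.
        return list(self._block_map[path])


def blocks_to_read(manager, path):
    """Client side: expand extents into the (device, block) addresses
    the client would then read directly over the SAN."""
    addresses = []
    for ext in manager.lookup(path):
        for block in range(ext.start_block, ext.start_block + ext.block_count):
            addresses.append((ext.device, block))
    return addresses


manager = MetadataManager()
manager.create("/data/a", [Extent("lun0", 100, 2), Extent("lun1", 7, 1)])
print(blocks_to_read(manager, "/data/a"))
```

Every block of every file passes through the manager's map in this sketch, which is one way to picture why block-level sharing puts a heavy load on the metadata managers.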
Symmetric filesystems - A symmetric filesystem is one in which the clients also run the metadata manager code; that is, all nodes understand the disk structures. A concern with these systems is the burden that metadata management places on the client node, serving both itself and other nodes, which may impact the ability of the client to perform its intended compute jobs. Examples include Sistina GFS, GPFS, Compaq CFS, Veritas CFS, and Polyserve Matrix.
Asymmetric filesystems - An asymmetric filesystem is one in which there are one or more dedicated metadata managers that maintain the filesystem and its associated disk structures. Examples include Panasas ActiveScale, IBM SanFS, and Lustre. Traditional client/server filesystems like NFS and CIFS are also asymmetric.
Cluster filesystem - a distributed filesystem that is not a single server with a set of clients, but instead a cluster of servers that all work together to provide high performance service to their clients. To the clients the cluster is transparent - it is just "the filesystem", but the filesystem software deals with distributing requests to elements of the storage cluster.
Examples include the HP (DEC) Tru64 cluster and Spinnaker, a clustered NAS (NFS) service. Panasas ActiveScale is also a cluster filesystem.
Parallel filesystem - a filesystem with support for parallel applications, in which all nodes may access the same files at the same time, with concurrent reads and writes. Examples include Panasas ActiveScale, Lustre, GPFS and Sistina.
Finally, these definitions overlap. A SAN filesystem can be symmetric or asymmetric. Its servers can be clustered or single. And it can support parallel apps or not.
The Panasas Storage Cluster and its ActiveScale File System is a clustered (many servers share the work), asymmetric (metadata management does not happen on the clients), parallel (supports concurrent read and write well), object-based (not block-based), distributed (storage is across the network from clients) file system.
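Because the definitions overlap, it can help to treat them as independent properties rather than exclusive categories. The sketch below records them as boolean fields. The `FilesystemProfile` type and its field names are made up for illustration, and the only values filled in are the ones the article states for Panasas ActiveScale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilesystemProfile:
    """Hypothetical record of the taxonomy's independent dimensions."""
    name: str
    clustered: bool      # many servers share the work
    asymmetric: bool     # metadata management is not on the clients
    parallel: bool       # concurrent read and write to the same files
    object_based: bool   # object-based rather than block-based
    distributed: bool    # storage is across the network from clients

    def describe(self):
        labels = [
            "clustered" if self.clustered else "single-server",
            "asymmetric" if self.asymmetric else "symmetric",
            "parallel" if self.parallel else "non-parallel",
            "object-based" if self.object_based else "block-based",
            "distributed" if self.distributed else "local",
        ]
        return f"{self.name}: " + ", ".join(labels)


# Values taken from the article's own description of Panasas ActiveScale.
panasas = FilesystemProfile(
    name="Panasas ActiveScale",
    clustered=True,
    asymmetric=True,
    parallel=True,
    object_based=True,
    distributed=True,
)
print(panasas.describe())
```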
Comments (9)
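Then what is the difference between a parallel filesystem and a network filesystem such as CODA?
Thanks to NNTP for the answer. It seems Lustre is the best choice at the moment.
I don't think it can be understood that way.
Personally, I think two definitions need to be clearly distinguished: one is a clustered filesystem, the other is a cluster filesystem.
Any system in which access to the stored data is served by multiple servers working together can be called a clustered filesystem; I consider this a coarse-grained definition. Within that coarse framework, finer categories such as parallel filesystems and cluster filesystems can be defined.
A parallel filesystem means that a single unified filesystem and its data are spread across multiple service providers and storage carriers; it can be viewed as "RAID" for a filesystem. Lustre and PVFS/2 are of this type.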
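One way to picture the "RAID for a filesystem" idea is simple round-robin striping, similar in spirit to RAID 0. The sketch below is only an illustration under that assumption; the `locate` function and its parameters are hypothetical and are not taken from Lustre or PVFS2, whose real layouts are more involved.

```python
def locate(offset, stripe_size, num_servers):
    """Map a byte offset in a file to (server index, offset within that
    server's portion), assuming plain round-robin striping."""
    stripe_index = offset // stripe_size          # which stripe unit overall
    server = stripe_index % num_servers           # round-robin placement
    # Full stripe units already stored on this server, plus the remainder.
    local_offset = (stripe_index // num_servers) * stripe_size + offset % stripe_size
    return server, local_offset


# With 64-byte stripe units across 4 servers, consecutive units land on
# consecutive servers, so different clients can hit different servers at once.
for off in (0, 64, 128, 256, 300):
    print(off, locate(off, 64, 4))
```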
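A cluster filesystem means that the service providers themselves form a cluster in the standard sense, with the usual cluster characteristics such as membership, cluster-wide locks, heartbeats and so on. The file store, however, is a single one, and the servers must use their cluster relationships to control access to the data in that single storage location. So in my classification, cluster filesystems and parallel filesystems are two completely different things: the relationships among their service providers, their file stores, and the locations of their data all differ.
Examining PVFS2/Lustre as typical parallel filesystems and Sistina (Red Hat) GFS as a cluster filesystem should be enough to build up your own classification.
Hello, this is not a class; it is a definition used for categorization.
I consider Lustre a parallel distributed filesystem: both parallel and distributed (and of course also clustered).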
[ Last edited by nntp on 2006-11-5 12:45 ]
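Regarding cluster filesystems and parallel filesystems, can they be understood this way:
A cluster filesystem is a filesystem used in a cluster.
A parallel filesystem is a filesystem that supports parallel programs; all nodes can read and write the same file in the filesystem at the same time.
A question about `a cluster of servers that all work together to provide high performance service to their clients. To the clients the cluster is transparent - it is just "the filesystem", but the filesystem software deals with distributing requests to elements of the storage cluster.'
Among existing mature products, which ones achieve the architecture described in the passage above? Lustre, RH-GFS?
Of course they count.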
Is IBM SanFS SANergy or not?
thanks