Hadoop namenode: single point of failure

Posted on 2024-10-08 12:54:38

The Namenode in the Hadoop architecture is a single point of failure.

How do people who have large Hadoop clusters cope with this problem?

Is there an industry-accepted solution that has worked well, in which a secondary Namenode takes over if the primary one fails?

3 answers

心舞飞扬 2024-10-15 12:54:38

Yahoo has certain recommendations for configuration settings at different cluster sizes to take NameNode failure into account. For example:

The single point of failure in a Hadoop cluster is the NameNode. While the loss of any other machine (intermittently or permanently) does not result in data loss, NameNode loss results in cluster unavailability. The permanent loss of NameNode data would render the cluster's HDFS inoperable.

Therefore, another step should be taken in this configuration to back up the NameNode metadata.
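A common way to take that extra step is to have the NameNode write its metadata to more than one directory. One of these directories can be an NFS mount on a different machine, so that a copy survives if the NameNode's own disks are lost. The Python sketch below only generates the corresponding `hdfs-site.xml` property. It assumes Hadoop 2.x, where the property is `dfs.namenode.name.dir` (older releases call it `dfs.name.dir`). The directory paths are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs):
    """Build an hdfs-site.xml <property> that makes the NameNode keep a copy
    of its metadata (fsimage and, by default, the edit log) in every directory
    listed in `dirs`."""
    if not dirs:
        raise ValueError("at least one metadata directory is required")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.namenode.name.dir"
    # Comma-separated list: the NameNode writes the same metadata to each entry.
    ET.SubElement(prop, "value").text = ",".join("file://" + d for d in dirs)
    return ET.tostring(prop, encoding="unicode")


# Hypothetical paths: one local disk plus one NFS mount exported by another host.
print(name_dir_property(["/data/1/dfs/nn", "/mnt/nfs/dfs/nn"]))
```

The NFS copy protects against losing the metadata. It does not give automatic failover: a replacement NameNode still has to be started by hand from that copy.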

Facebook uses a tweaked version of Hadoop for its data warehouses; it has some optimizations that focus on NameNode reliability. In addition to the patches available on GitHub, Facebook appears to use AvatarNode specifically for quickly switching between primary and secondary NameNodes. Dhruba Borthakur's blog contains several other entries offering further insights into the NameNode as a single point of failure.

Edit: Further info about Facebook's improvements to the NameNode.

故人的歌 2024-10-15 12:54:38

NameNode High Availability (HA) was introduced in the Hadoop 2.x release line.

It can be set up in two modes: with shared NFS storage, or with the Quorum Journal Manager (QJM).

However, HA with the Quorum Journal Manager (QJM) is the preferred option.
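In the QJM mode, the core of the configuration has three parts: a logical nameservice, the IDs and RPC addresses of its two NameNodes, and a `qjournal://` URI listing the JournalNodes that store the shared edit log. The Python sketch below builds those `hdfs-site.xml` properties with `xml.etree.ElementTree`. The nameservice name, host names and ports are hypothetical. A working cluster also needs settings the sketch omits, such as HTTP addresses, `dfs.ha.fencing.methods`, and ZooKeeper settings if automatic failover is wanted. An edit counts as committed only once a majority of JournalNodes has written it. This is why at least three JournalNodes, in an odd number, are normally used.

```python
import xml.etree.ElementTree as ET


def quorum_size(n_journalnodes):
    """Majority of JournalNodes that must accept an edit before it is committed.
    With n nodes, n - quorum_size(n) node failures can be tolerated."""
    return n_journalnodes // 2 + 1


def qjm_ha_properties(nameservice, namenodes, journalnodes):
    """Return a <configuration> element with the core QJM HA properties.

    namenodes:    dict of NameNode ID -> "host:rpc_port"
    journalnodes: list of "host:port" JournalNode addresses
    """
    if len(journalnodes) < 3:
        raise ValueError("QJM needs at least 3 JournalNodes")
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # The Active writes edits here; the Standby reads them to stay in sync.
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journalnodes) + "/" + nameservice,
        # Lets HDFS clients find whichever NameNode is currently Active.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    for nn_id, addr in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = addr

    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Hypothetical example values; adapt to the real cluster.
conf = qjm_ha_properties(
    "mycluster",
    {"nn1": "nn1.example.com:8020", "nn2": "nn2.example.com:8020"},
    ["jn1.example.com:8485", "jn2.example.com:8485", "jn3.example.com:8485"],
)
print(ET.tostring(conf, encoding="unicode"))
print("quorum for 3 JournalNodes:", quorum_size(3))
```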

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
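The Python code below is a toy model of the Active/Standby division, not Hadoop's implementation. It shows three steps. The Active appends operations to a shared edit log. The Standby replays that log to keep its state current. On failover, the old Active is demoted first, which stands in for fencing, and the Standby is promoted only after it has applied every edit. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    name: str
    state: str = "standby"  # "active" or "standby"
    alive: bool = True
    applied_txid: int = 0   # number of shared edit-log entries applied so far


@dataclass
class ToyHAPair:
    """Toy model: at most one Active; the Standby replays the shared edit log."""
    nodes: list
    edit_log: list = field(default_factory=list)

    def active(self):
        actives = [n for n in self.nodes if n.state == "active"]
        assert len(actives) <= 1, "split brain: more than one Active"
        return actives[0] if actives else None

    def write(self, op):
        # Only the live Active may accept client writes.
        nn = self.active()
        if nn is None or not nn.alive:
            raise RuntimeError("no live Active NameNode")
        self.edit_log.append(op)
        nn.applied_txid = len(self.edit_log)

    def standby_catch_up(self):
        # The Standby tails the shared log so that failover is fast.
        for n in self.nodes:
            if n.alive and n.state == "standby":
                n.applied_txid = len(self.edit_log)

    def failover(self):
        old = self.active()
        if old is not None:
            old.state = "standby"  # stands in for fencing the old Active
        new = next(n for n in self.nodes if n.alive and n is not old)
        new.applied_txid = len(self.edit_log)  # apply any remaining edits first
        new.state = "active"
        return new


nn1, nn2 = NameNode("nn1", state="active"), NameNode("nn2")
pair = ToyHAPair([nn1, nn2])
pair.write("mkdir /a")
pair.standby_catch_up()
nn1.alive = False       # the Active crashes
new = pair.failover()   # nn2 takes over with the full namespace
pair.write("mkdir /b")  # clients continue against nn2
```

In a real cluster, fencing also has to cope with an old Active that is still running but unreachable. With QJM, the JournalNodes allow only one NameNode to write to the shared edit log at a time.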

Have a look at the following SE questions, which explain the complete failover process.

Secondary NameNode usage and High availability in Hadoop 2.x

How does Hadoop Namenode failover process works?

陌路黄昏 2024-10-15 12:54:38

Large Hadoop clusters have thousands of data nodes and one name node. The probability that some machine fails goes up roughly linearly with machine count (all else being equal). So if Hadoop didn't cope with data node failures, it wouldn't scale. Since there's still only one name node, the Single Point of Failure (SPOF) is still there, but the probability that this particular machine fails is low.
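Linear growth in failure probability is a first-order approximation. Suppose each machine independently fails with probability p during some time window (the independence assumption is a simplification). Then the chance that at least one of n machines fails is 1 - (1 - p)^n. This is close to n * p while n * p is small, and below it otherwise. The single name node's own failure probability stays p however many data nodes are added. The sketch below is illustrative only, and the value of p used in it is made up.

```python
def p_any_failure(n_machines, p_single):
    """Probability that at least one of n independent machines fails, when
    each fails with probability p_single in the same time window."""
    return 1 - (1 - p_single) ** n_machines


def p_any_failure_linear(n_machines, p_single):
    """First-order approximation n * p; accurate while n * p is small."""
    return n_machines * p_single


# Illustrative value only: p = 0.0001 per machine per day.
for n in (1, 100, 4000):
    print(n, p_any_failure(n, 1e-4), p_any_failure_linear(n, 1e-4))
```

With thousands of data nodes, some failure somewhere becomes likely, so HDFS has to tolerate data node loss routinely. The name node, by contrast, is a single machine with a small but non-zero failure probability.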

That said, Bkkbrad's answer about Facebook adding failover capability to the name node is right on.
