Hadoop DataNode, NameNode, Secondary NameNode, JobTracker and TaskTracker

Published 2024-12-11 06:18:10


I am new to Hadoop, so I have some doubts. If the master node fails, what happens to the Hadoop cluster? Can we recover that node without any loss? Is it possible to keep a secondary master node that switches over automatically when the current one fails?

We have a backup of the NameNode (the Secondary NameNode), so we can restore the NameNode from it when it fails. Along the same lines, how can we restore the data in a DataNode when that DataNode fails? The Secondary NameNode is a backup of the NameNode only, not of the DataNodes, right? If a node fails before a job completes, so that the job is left pending in the JobTracker, does that job continue or restart from the beginning on a free node?

How can we restore the entire cluster's data if anything happens?

And my final question: can we use a C program in MapReduce (for example, bubble sort in MapReduce)?

Thanks in advance

Comments (3)

后来的我们 2024-12-18 06:18:10


Although it is too late to answer your question, it may help others.

First of all, let me introduce you to the Secondary NameNode:

It contains the namespace image and a backup of the edit log files for the past hour (configurable). Its job is to merge the latest NameNode namespace image with the edit log files and upload the result back to the NameNode as a replacement for the old one. Having a Secondary NN in a cluster is not mandatory.

Now, coming to your concerns:

  • If the master node fails, what happens to the Hadoop cluster?

Supporting Frail's answer: yes, Hadoop has a single point of failure, so every currently running task, such as a MapReduce job or anything else using the failed master node, will stop. The whole cluster, including clients, will stop working.

  • Can we recover that node without any loss?

That is hypothetical; recovery without any loss is unlikely, because all the data (block reports) sent by the DataNodes to the NameNode after the last backup taken by the Secondary NameNode will be lost. I say unlikely rather than impossible because if the NameNode fails just after a successful backup run by the Secondary NameNode, then it is in a safe state.

  • Is it possible to keep a secondary master node that switches over automatically when the current one fails?

An administrator (user) can certainly do this manually. To switch automatically, you would have to write code outside the cluster: code that monitors the cluster, configures the Secondary NameNode appropriately, and restarts the cluster with the new NameNode address.

  • We have a backup of the NameNode (the Secondary NameNode), so we can restore the NameNode from it when it fails. Along the same lines, how can we restore the data in a DataNode when that DataNode fails?

This is about the replication factor: we keep 3 replicas (the default, and best practice; configurable) of each file block, all on different DataNodes. So if one DataNode fails, for the time being we still have 2 DataNodes holding the data. Later, the NameNode will create one more replica of the data that the failed DataNode contained.
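The replication factor described above is controlled by the real `dfs.replication` property in `hdfs-site.xml`; a minimal fragment (the value shown is the default):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```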

  • The Secondary NameNode is a backup of the NameNode only, not of the DataNodes, right?

Right. It just contains the metadata about the DataNodes, such as each DataNode's address and properties, including its block report.

  • If a node fails before a job completes, so that the job is left pending in the JobTracker, does that job continue or restart from the beginning on a free node?

Hadoop will try hard to continue the job. But again, it depends on the replication factor, rack awareness, and other configuration made by the admin. If Hadoop's best practices for HDFS are followed, the job will not fail: the JobTracker will get the address of a node holding a replica and continue.
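In Hadoop 1.x, how many times a failed task is retried on another node before the whole job is declared failed is governed by the real `mapred.map.max.attempts` property (and its reduce-side counterpart) in `mapred-site.xml`; a minimal fragment, value shown is the default:

```xml
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
```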

  • How can we restore the entire cluster data if anything happens?

By restarting it.

  • And my final question: can we use a C program in MapReduce (for example, bubble sort in MapReduce)?

Yes, you can use any programming language that supports standard file read/write operations.

I just gave it a try. Hope it helps you as well as others.

*Suggestions/improvements are welcome.*

岁吢 2024-12-18 06:18:10


Currently, a Hadoop cluster has a single point of failure, which is the NameNode.

About the secondary node issue (from the Apache wiki):

The term "secondary name-node" is somewhat misleading. It is not a
name-node in the sense that data-nodes cannot connect to the secondary
name-node, and in no event it can replace the primary name-node in
case of its failure.

The only purpose of the secondary name-node is to perform periodic
checkpoints. The secondary name-node periodically downloads current
name-node image and edits log files, joins them into new image and
uploads the new image back to the (primary and the only) name-node.
See User Guide.

So if the name-node fails and you can restart it on the same physical
node then there is no need to shutdown data-nodes, just the name-node
need to be restarted. If you cannot use the old node anymore you will
need to copy the latest image somewhere else. The latest image can be
found either on the node that used to be the primary before failure if
available; or on the secondary name-node. The latter will be the
latest checkpoint without subsequent edits logs, that is the most
recent name space modifications may be missing there. You will also
need to restart the whole cluster in this case.

There are tricky ways to overcome this single point of failure. If you are using the Cloudera distribution, one of the ways is explained here. The MapR distribution has a different way to handle this SPOF.

Finally, you can use any programming language to write MapReduce jobs via Hadoop Streaming.

如梦 2024-12-18 06:18:10


Although it is too late to answer your question, it may help others. First we will discuss the roles of the Hadoop 1.x daemons, and then your issues.

1. What is the role of the Secondary NameNode?
It is not exactly a backup node. It reads the edit logs and periodically creates an updated fsimage file for the NameNode. It periodically gets metadata from the NameNode, keeps it, and it is used when the NameNode fails.
2. What is the role of the NameNode?
It is the manager of all the daemons. Its master JVM process runs on the master node. It interacts with the DataNodes.

3. What is the role of the JobTracker?
It accepts jobs and distributes them to the TaskTrackers for processing at the DataNodes.

4. What is the role of the TaskTrackers?
Each one executes the program provided, processing the data that exists on its DataNode. That process is called map.

Limitations of Hadoop 1.x

  1. Single point of failure
    This is the NameNode, so we should maintain high-quality hardware for it. If the NameNode fails, everything becomes inaccessible.

Solution
The solution to the single point of failure is Hadoop 2.x, which provides high availability.

Now, to your questions:

How can we restore the entire cluster data if anything happens?
If the cluster fails, we can restart it.

If a node fails before a job completes, so that the job is left pending in the JobTracker, does that job continue or restart from the beginning on a free node?
We keep 3 replicas of the data (i.e., blocks) by default for high availability; how many replicas there are depends on what the admin has set. So the JobTracker will continue with another copy of the data on another DataNode.

Can we use a C program in MapReduce (for example, bubble sort in MapReduce)?
Basically, MapReduce is the execution engine that solves or processes big-data problems in a distributed manner (storage plus processing). We do file handling and all other basic operations using MapReduce programming, so we can use any language in which we can handle files as required.

hadoop 1.X architecture
hadoop 1.x has 4 basic daemons

I just gave it a try. Hope it helps you as well as others.

Suggestions/improvements are welcome.
