Setting Up a Hadoop Environment on Ubuntu 18


Preparation
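
Hadoop 3.1 requires a Java 8 JDK, so install one before anything else. This guide assumes a JDK at /usr/local/bin/java/jdk (the path used for JAVA_HOME later); adjust for your own layout. On Ubuntu 18, one option, as a sketch:

# Install OpenJDK 8 (any Java 8 JDK works; apt places it under /usr/lib/jvm)
sudo apt-get install -y openjdk-8-jdk
# Confirm the JDK is usable
java -version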

Deployment

cd /usr/local/bin/
mkdir hadoop
cd hadoop   # without this, the tarball lands in /usr/local/bin instead of /usr/local/bin/hadoop
# closer.cgi is the mirror-picker HTML page, not the archive itself; fetch the tarball directly
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz
tar -zxvf hadoop-3.1.4.tar.gz
ln -s hadoop-3.1.4 hadoop
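
It is worth verifying the tarball before unpacking. Apache publishes a .sha512 checksum file alongside each Hadoop release on the archive; a minimal sketch (if sha512sum rejects the file's format, compare the printed hash by eye):

# Fetch the published checksum and verify the download against it
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz.sha512
sha512sum -c hadoop-3.1.4.tar.gz.sha512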

Verification

cd /usr/local/bin/hadoop/hadoop
./bin/hadoop version

Output

Hadoop 3.1.4
Source code repository https://github.com/apache/hadoop.git -r 1e877761e8dadd71effef30e592368f7fe66a61b
Compiled by gabota on 2020-07-21T08:05Z
Compiled with protoc 2.5.0
From source with checksum 38405c63945c88fdf7a6fe391494799b
This command was run using /usr/local/bin/hadoop/hadoop-3.1.4/share/hadoop/common/hadoop-common-3.1.4.jar

Configure environment variables (add the following to a shell profile such as ~/.bashrc)

# setting hadoop home
export HADOOP_HOME=/usr/local/bin/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
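
Reload the profile so the exports take effect, then confirm the hadoop command resolves through the new PATH (assuming the exports went into ~/.bashrc):

source ~/.bashrc
which hadoop    # should print /usr/local/bin/hadoop/hadoop/bin/hadoop
hadoop version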

Edit the core-site configuration file

cd /usr/local/bin/hadoop/hadoop
vim etc/hadoop/core-site.xml

Set the contents of the configuration node in core-site.xml to:

<configuration>
  <!-- Base directory for Hadoop's local storage -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/bin/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <!-- NameNode address -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
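
Hadoop normally creates hadoop.tmp.dir on first use, but creating it up front surfaces permission problems early; a small precaution:

mkdir -p /usr/local/bin/hadoop/tmp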

Edit the hdfs-site configuration file

cd /usr/local/bin/hadoop/hadoop
vim etc/hadoop/hdfs-site.xml

Set the contents of the configuration node in hdfs-site.xml to:

<configuration>
  <!-- Number of copies HDFS keeps of each block; 1 is enough for a single-node setup -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- NameNode storage location -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/bin/hadoop/tmp/dfs/name</value>
  </property>
  <!-- DataNode storage location -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/bin/hadoop/tmp/dfs/data</value>
  </property>
  <!-- Web UI address (dfs.http.address is the legacy name of dfs.namenode.http-address), bound to 0.0.0.0 so any IP can reach it -->
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:9870</value>
  </property>
</configuration>
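
The daemons read these files at startup; to confirm a value is being picked up, hdfs getconf can query a single key, for example:

hdfs getconf -confKey dfs.replication    # should print 1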

Edit the hadoop-env configuration file

cd /usr/local/bin/hadoop/hadoop
vim etc/hadoop/hadoop-env.sh

Set the Java environment variable in hadoop-env.sh to the JDK path configured on this machine:

export JAVA_HOME=/usr/local/bin/java/jdk
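
The path above is this machine's JDK location. If you are unsure where your JDK lives, one common way to find it (assuming java is on the PATH and symlinks into the JDK, as apt-installed JDKs do):

readlink -f "$(which java)" | sed 's:/bin/java::'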

Format the NameNode:

hdfs namenode -format

Output

2020-11-16 22:13:09,680 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2020-11-16 22:13:09,680 INFO util.GSet: VM type       = 64-bit
2020-11-16 22:13:09,680 INFO util.GSet: 0.029999999329447746% max memory 1.7 GB = 532.5 KB
2020-11-16 22:13:09,680 INFO util.GSet: capacity      = 2^16 = 65536 entries
2020-11-16 22:13:09,720 INFO namenode.FSImage: Allocated new BlockPoolId: BP-262111950-127.0.0.1-1605535989706
2020-11-16 22:13:09,755 INFO common.Storage: Storage directory /usr/local/bin/hadoop/tmp/dfs/name has been successfully formatted.
2020-11-16 22:13:09,803 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/bin/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2020-11-16 22:13:09,941 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/bin/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 388 bytes saved in 0 seconds .
2020-11-16 22:13:09,980 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-11-16 22:13:09,996 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2020-11-16 22:13:09,996 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

Start the NameNode and DataNode daemons

start-dfs.sh

Output

Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [dingpw]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

These errors are caused by missing user definitions. The fix is as follows:

cd /usr/local/bin/hadoop/hadoop
vim sbin/start-dfs.sh
vim sbin/stop-dfs.sh

Add the following near the top of both start-dfs.sh and stop-dfs.sh:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Starting again now gets past the user checks. (The warning below appears because HADOOP_SECURE_DN_USER was renamed to HDFS_DATANODE_SECURE_USER in Hadoop 3; the old name still works.) Output:

WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: root@localhost: Permission denied (publickey,password).
Starting datanodes
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [dingpw]
dingpw: Warning: Permanently added 'dingpw' (ECDSA) to the list of known hosts.
dingpw: root@dingpw: Permission denied (publickey,password).

These are SSH failures: the start script logs into each node (here, localhost) over SSH. The fix is to set up passwordless SSH:

cd ~/.ssh                          # create the directory first if it does not exist
ssh-keygen -t rsa                  # accept the defaults; leave the passphrase empty
cat id_rsa.pub >> authorized_keys  # authorize the new key for this machine
ssh localhost                      # should now log in without asking for a password

Start again and it succeeds; output:

WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [dingpw]

Once startup completes, the jps command shows whether the daemons are running:

jps -ml
8228 org.apache.hadoop.hdfs.server.datanode.DataNode
8472 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
9403 sun.tools.jps.Jps -ml
8062 org.apache.hadoop.hdfs.server.namenode.NameNode
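
With all three daemons up, a quick smoke test confirms HDFS works end to end; a minimal sketch (the /user/root path matches running as root, as in this guide):

hdfs dfs -mkdir -p /user/root    # create a home directory in HDFS
hdfs dfs -ls /                   # list the root of the new filesystem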

Access the web management page

http://ip:9870
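
Replace ip with the server's address. If the page does not load from another machine, first check that the port is listening locally; a sketch (install curl first if it is missing):

curl -s http://localhost:9870 >/dev/null && echo "NameNode web UI is up"
ss -lnt | grep 9870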

Stop the NameNode and DataNode daemons

stop-dfs.sh

Output

WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [dingpw]
