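Hadoop Cluster Setup

Prerequisites

1. Create the virtual machine

2. Configure the virtual machine network

3. Configure the Windows 10 IP address

4. Set a static IP on CentOS

5. Clone three virtual machines

6. Install the JDK

7. Install Hadoop

8. Configure passwordless SSH login (shell script provided separately)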

Cluster Setup

1. Cluster deployment plan

192.168.5.102 hadoop102

192.168.5.103 hadoop103

192.168.5.104 hadoop104
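
Every node has to resolve these three hostnames. The following is a minimal sketch, assuming the mappings are kept in /etc/hosts on each node (the plan above does not say where they are stored); print_hosts_entries is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Print the hostname mappings from the deployment plan in /etc/hosts format.
# Assumption: name resolution is done through /etc/hosts on every node.
print_hosts_entries() {
    local i
    for i in 102 103 104; do
        printf '192.168.5.%s hadoop%s\n' "$i" "$i"
    done
}

print_hosts_entries
# One way to apply the result (as root, on each node):
#   print_hosts_entries >> /etc/hosts
```

Generating the lines from the host suffix keeps the IP address and the hostname in step, which is the convention the plan follows.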

2. Configuration files and settings

Four configuration files are involved:

core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml

Location: $HADOOP_HOME/etc/hadoop
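
Before editing, it can help to confirm that all four files are present in that directory. A small sketch, assuming HADOOP_HOME is exported in the current shell; check_site_files is a hypothetical helper, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Report which of the four site files exist in a Hadoop configuration directory.
# Assumption: HADOOP_HOME is exported; the directory can also be passed as $1.
check_site_files() {
    local conf_dir="${1:-${HADOOP_HOME:-}/etc/hadoop}"
    local f missing=0
    for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
        if [ -f "$conf_dir/$f" ]; then
            echo "found:   $conf_dir/$f"
        else
            echo "missing: $conf_dir/$f"
            missing=1
        fi
    done
    # Non-zero exit status when at least one file is missing.
    return "$missing"
}
```

Calling check_site_files with no argument checks $HADOOP_HOME/etc/hadoop; passing a path such as /opt/module/hadoop-3.1.3/etc/hadoop checks that directory instead.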

(1) Core configuration file

Configuration file: core-site.xml

vim core-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
 
<configuration> 
    <!-- Address of the NameNode -->
    <property> 
        <name>fs.defaultFS</name> 
        <value>hdfs://hadoop102:8020</value> 
    </property> 
 
    <!-- Directory where Hadoop stores its data -->
    <property> 
        <name>hadoop.tmp.dir</name> 
        <value>/opt/module/hadoop-3.1.3/data</value> 
    </property> 
 
    <!-- Use hadoop as the static user for HDFS web UI login -->
    <property> 
        <name>hadoop.http.staticuser.user</name> 
        <value>hadoop</value> 
    </property> 
</configuration>
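
After saving a site file, it is worth reading a property back to catch typos. The sketch below is one way to do that with awk, assuming <name> and <value> each sit on their own line as in the listing above; get_site_property is a hypothetical helper and not a general XML parser.

```shell
#!/usr/bin/env bash
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Assumption: <name> and <value> each sit on one line, as in this article's
# listings. Comments that contain these tags would confuse the match.
get_site_property() {
    local file="$1" name="$2"
    awk -v want="$name" '
        /<name>/ {
            line = $0
            sub(/.*<name>/, "", line); sub(/<\/name>.*/, "", line)
            found = (line == want)
        }
        found && /<value>/ {
            line = $0
            sub(/.*<value>/, "", line); sub(/<\/value>.*/, "", line)
            print line
            exit
        }
    ' "$file"
}
```

For example, get_site_property "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS should print the NameNode URI set above. On a node where Hadoop is already on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.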

(2) HDFS configuration file

Configuration file: hdfs-site.xml

vim hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
 
<configuration> 
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name> 
        <value>hadoop102:9870</value> 
    </property> 
    <!-- SecondaryNameNode web UI address -->
    <property> 
        <name>dfs.namenode.secondary.http-address</name> 
        <value>hadoop104:9868</value> 
    </property> 
</configuration>

(3) YARN configuration file

Configuration file: yarn-site.xml

vim yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration> 
    <!-- Use the shuffle auxiliary service for MapReduce -->
    <property> 
        <name>yarn.nodemanager.aux-services</name> 
        <value>mapreduce_shuffle</value> 
    </property> 
 
    <!-- Hostname of the ResourceManager -->
    <property> 
        <name>yarn.resourcemanager.hostname</name> 
        <value>hadoop103</value> 
    </property> 
 
    <!-- Environment variables inherited by containers -->
    <property> 
        <name>yarn.nodemanager.env-whitelist</name> 
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property> 
</configuration>

(4) MapReduce configuration file

Configuration file: mapred-site.xml

vim mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
 
<configuration> 
    <!-- Run MapReduce programs on YARN -->
    <property> 
        <name>mapreduce.framework.name</name> 
        <value>yarn</value> 
    </property> 
</configuration>

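3. Distributing the configuration files

Use xsync, a custom command wrapped in a shell script, to distribute all configuration files on hadoop102 to the same path on the other nodes (the shell script is provided separately).

Command format: xsync <file path>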

xsync /opt/module/hadoop-3.1.3/etc/hadoop/
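
Since the xsync script itself is provided separately, the following is only a hypothetical sketch of what such a wrapper could look like, not the script this article uses. It assumes rsync is installed on every node, passwordless SSH is already working, and the targets are hadoop103 and hadoop104; the names xsync_sketch, XSYNC_HOSTS, and XSYNC_DRY_RUN are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an xsync-style wrapper; the real script may differ.
# Assumptions: rsync on every node, passwordless SSH, and target hosts
# hadoop103 and hadoop104 (override with XSYNC_HOSTS).
# Set XSYNC_DRY_RUN=1 to print the commands instead of running them.
xsync_sketch() {
    local hosts="${XSYNC_HOSTS:-hadoop103 hadoop104}"
    local path host parent name
    if [ "$#" -lt 1 ]; then
        echo "usage: xsync_sketch PATH..." >&2
        return 1
    fi
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "skip: $path does not exist" >&2
            continue
        fi
        # Resolve the absolute parent directory so the copy lands on the same path.
        parent="$(cd -P "$(dirname "$path")" && pwd)"
        name="$(basename "$path")"
        for host in $hosts; do
            if [ "${XSYNC_DRY_RUN:-0}" = "1" ]; then
                echo "ssh $host mkdir -p $parent"
                echo "rsync -av $parent/$name $host:$parent"
            else
                ssh "$host" "mkdir -p '$parent'"
                rsync -av "$parent/$name" "$host:$parent"
            fi
        done
    done
}
```

Running XSYNC_DRY_RUN=1 xsync_sketch /opt/module/hadoop-3.1.3/etc/hadoop/ prints the ssh and rsync commands for each host without touching the network, which is a safe way to check the paths before a real run.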

For example, to distribute the stu.json file under the home directory to all nodes: