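Apache Hadoop 2.2.0, the new generation of Hadoop, removes the Hadoop 1.x limit of roughly 4,000 machines per cluster and addresses the OOM (out-of-memory) errors that were common in earlier versions. Its new computing framework, YARN, has been called the operating system of Hadoop: it stays compatible with the original MapReduce computing model and can also run other parallel computing models.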
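Suppose we want to build a two-node Hadoop 2.2.0 cluster. One node, with hostname master, acts as both master and slave and runs the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager daemons. The other node, slave1, acts only as a slave and runs the DataNode and NodeManager daemons.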
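Every node has to resolve the hostnames master and slave1. A minimal sketch, assuming resolution through /etc/hosts and made-up addresses 192.168.1.10 and 192.168.1.11; the helper add_host_entry is hypothetical and only appends a mapping when the name is not already mapped:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_host_entry <hosts-file> <ip> <hostname>
# Appends "<ip><TAB><hostname>" unless <hostname> is already mapped by a
# non-comment line of <hosts-file>.
add_host_entry() {
  local hosts_file="$1" ip="$2" name="$3"
  if awk -v n="$name" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$hosts_file"; then
    return 0  # already mapped: leave the file untouched
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (as root, on every node; the addresses are placeholders):
# add_host_entry /etc/hosts 192.168.1.10 master
# add_host_entry /etc/hosts 192.168.1.11 slave1
```

If the file already maps master to a loopback address such as 127.0.1.1 (Ubuntu adds such a line for the local hostname), the helper leaves it alone, so fix that line by hand; otherwise the daemons may bind to the loopback interface.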
1. Get the Hadoop binary or source package from http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/ : use hadoop-2.2.0.tar.gz (binary) or hadoop-2.2.0-src.tar.gz (source).
2. On every machine, create a user with the same name, e.g. hduser, and install Java (1.6 or 1.7).
Unpack the package, for example into /home/hduser/hadoop-2.2.0.
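For the binary package, unpacking is a single tar command. A sketch with a hypothetical helper unpack_hadoop that also checks that the bin/hdfs launcher is present, assuming the tarball unpacks into a directory named after itself (hadoop-2.2.0):

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack_hadoop <tarball> <parent-dir>
# Extracts a Hadoop binary tarball under <parent-dir> and prints the
# resulting installation directory (hadoop_home).
unpack_hadoop() {
  local tarball="$1" parent="$2" home
  mkdir -p "$parent" || return 1
  tar -xzf "$tarball" -C "$parent" || return 1
  # Assumes hadoop-2.2.0.tar.gz contains a top-level hadoop-2.2.0/ directory.
  home="$parent/$(basename "$tarball" .tar.gz)"
  if [ ! -x "$home/bin/hdfs" ]; then
    echo "no bin/hdfs under $home" >&2
    return 1
  fi
  echo "$home"
}

# Example: unpack_hadoop ~/hadoop-2.2.0.tar.gz /home/hduser
```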
If you want to compile from source, follow steps 3, 4, and 5 below.
---------------- compiling from source ----------------
3. Download protobuf 2.5.0 from https://code.google.com/p/protobuf/downloads/list and Maven (3.0 or later) from http://maven.apache.org/download.cgi
Build protobuf 2.5.0:
- tar -xvf protobuf-2.5.0.tar.gz
- cd protobuf-2.5.0
- ./configure --prefix=/opt/protoc/
- make && sudo make install
- export PATH=/opt/protoc/bin:$PATH   (so that the Hadoop build can find protoc)
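The Hadoop 2.2.0 build looks for protoc on the PATH and expects exactly version 2.5.0. A minimal sketch with a hypothetical helper check_protoc that confirms what the found binary reports:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_protoc [path-to-protoc]
# Succeeds only if the given protoc (default: the one on PATH) reports 2.5.0.
check_protoc() {
  local protoc_bin="${1:-protoc}" version
  version="$("$protoc_bin" --version 2>/dev/null)" || return 1
  if [ "$version" = "libprotoc 2.5.0" ]; then
    return 0
  fi
  echo "expected libprotoc 2.5.0, found: ${version:-nothing}" >&2
  return 1
}

# Example: export PATH=/opt/protoc/bin:$PATH; check_protoc && echo "protoc OK"
```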
4. Install the required packages.
On RPM-based Linux (as root):
- yum install gcc
- yum install gcc-c++
- yum install make
- yum install cmake
- yum install openssl-devel
- yum install ncurses-devel
On Debian-based Linux (e.g. Ubuntu):
- sudo apt-get install gcc
- sudo apt-get install g++
- sudo apt-get install make
- sudo apt-get install cmake
- sudo apt-get install libssl-dev
- sudo apt-get install libncurses5-dev
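The two package lists above can be wrapped in one helper. A sketch with a hypothetical build_deps_cmd that prints (rather than runs) the matching install command; without an argument it guesses the family from whichever of yum or apt-get is installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build_deps_cmd [rpm|deb]
# Prints the install command for the native-build prerequisites.
build_deps_cmd() {
  local family="$1"
  if [ -z "$family" ]; then
    if command -v yum >/dev/null 2>&1; then family=rpm
    elif command -v apt-get >/dev/null 2>&1; then family=deb
    fi
  fi
  case "$family" in
    rpm) echo "yum install -y gcc gcc-c++ make cmake openssl-devel ncurses-devel" ;;
    deb) echo "sudo apt-get install -y gcc g++ make cmake libssl-dev libncurses5-dev" ;;
    *)   echo "unsupported package family: '$family'" >&2; return 1 ;;
  esac
}

# Example: review the output of build_deps_cmd, then run it (as root for yum).
```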
5. Build the hadoop-2.2.0 source, from the top-level directory of the unpacked source package:
mvn clean install -DskipTests
mvn package -Pdist,native -DskipTests -Dtar
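With -Pdist,native -Dtar, the distribution tarball should appear under hadoop-dist/target in the source tree. A small sketch with a hypothetical helper find_dist_tarball that prints its path, assuming that default output location and the 2.2.0 file name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: find_dist_tarball [source-root]
# Prints the path of the tarball built by "mvn package -Pdist,native -DskipTests -Dtar".
find_dist_tarball() {
  local src_root="${1:-.}"
  local tarball="$src_root/hadoop-dist/target/hadoop-2.2.0.tar.gz"
  if [ -f "$tarball" ]; then
    echo "$tarball"
  else
    echo "build output not found: $tarball" >&2
    return 1
  fi
}

# Example: scp "$(find_dist_tarball ~/hadoop-2.2.0-src)" master:/home/hduser/
```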
6. Once you have a built package (for example hadoop-2.2.0.tar.gz), install and configure it as follows.
Log in to the master machine as hduser:
6.1 Install ssh
For example on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Now check that you can ssh to localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then copy the authorized keys to each slave so that master can ssh to it without a passphrase (the ~/.ssh directory must already exist on the slave):
$ scp ~/.ssh/authorized_keys slave1:/home/hduser/.ssh/
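Copying authorized_keys with scp replaces whatever keys the slave already trusts. As an alternative, ssh-copy-id (shipped with the OpenSSH client) appends the key instead. A sketch, assuming a file with one hostname per line (the same format as the slaves file in step 6.4) and a hypothetical helper push_ssh_key; with DRY_RUN=1 it only prints the commands:

```shell
#!/usr/bin/env bash
# Hypothetical helper: push_ssh_key <hosts-file> [public-key]
# Runs ssh-copy-id for every host listed in <hosts-file>.
push_ssh_key() {
  local hosts_file="$1" key="${2:-$HOME/.ssh/id_rsa.pub}" line host
  local -a hosts=()
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%%#*}"              # drop comments
    host="${line//[[:space:]]/}"    # drop whitespace
    if [ -n "$host" ]; then hosts+=("$host"); fi
  done < "$hosts_file"
  # Loop over the collected array, not inside the read loop, so that
  # ssh-copy-id cannot swallow the remaining lines of the hosts file.
  for host in "${hosts[@]}"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh-copy-id -i $key $host"
    else
      ssh-copy-id -i "$key" "$host"
    fi
  done
}

# Example: push_ssh_key /home/hduser/hadoop-2.2.0/etc/hadoop/slaves
```

Pass the path of whichever public key you generated if it is not ~/.ssh/id_rsa.pub.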
6.2 Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under hadoop_home/etc/hadoop (hadoop_home is the installation directory, e.g. /home/hduser/hadoop-2.2.0)
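The stock line export JAVA_HOME=${JAVA_HOME} only works if JAVA_HOME is already set in the non-interactive ssh sessions the start scripts use, so the JDK path is usually written out explicitly. A sketch with a hypothetical helper set_java_home, assuming GNU sed and a JDK path containing no '|' or '&'; the path /usr/lib/jvm/java-7-openjdk-amd64 below is only an example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_java_home <env-file> <jdk-path>  (GNU sed assumed)
set_java_home() {
  local env_file="$1" jdk="$2"
  local line="export JAVA_HOME=$jdk"
  if grep -q '^[[:space:]]*export JAVA_HOME=' "$env_file"; then
    # Replace the active assignment in place.
    sed -i "s|^[[:space:]]*export JAVA_HOME=.*|$line|" "$env_file"
  elif grep -q '^[[:space:]]*#[[:space:]]*export JAVA_HOME=' "$env_file"; then
    # Only a commented template line: replace the first one in place, so the
    # value is set before any JAVA_HOME check later in the script.
    sed -i "0,/^[[:space:]]*#[[:space:]]*export JAVA_HOME=/s|^[[:space:]]*#[[:space:]]*export JAVA_HOME=.*|$line|" "$env_file"
  else
    # No assignment at all: insert one as the first line (file must be non-empty).
    sed -i "1i $line" "$env_file"
  fi
}

# Example:
# for f in hadoop-env.sh yarn-env.sh; do
#   set_java_home /home/hduser/hadoop-2.2.0/etc/hadoop/$f /usr/lib/jvm/java-7-openjdk-amd64
# done
```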
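6.3 Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml under hadoop_home/etc/hadoop. The distribution ships only mapred-site.xml.template, so create mapred-site.xml by copying it first.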
A sample core-site.xml:
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/temp</value>
  </property>
</configuration>
A sample hdfs-site.xml:
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/dfs/data</value>
  </property>
</configuration>
A sample mapred-site.xml:
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/home/hduser/temp/hadoop-yarn/staging</value>
  </property>
</configuration>
A sample yarn-site.xml (the classpath entries must be absolute paths; replace /home/hduser/hadoop-2.2.0 with your hadoop_home if it differs):
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries</description>
    <name>yarn.application.classpath</name>
    <value>/home/hduser/hadoop-2.2.0/etc/hadoop, /home/hduser/hadoop-2.2.0/share/hadoop/common/*, /home/hduser/hadoop-2.2.0/share/hadoop/common/lib/*, /home/hduser/hadoop-2.2.0/share/hadoop/hdfs/*, /home/hduser/hadoop-2.2.0/share/hadoop/hdfs/lib/*, /home/hduser/hadoop-2.2.0/share/hadoop/mapreduce/*, /home/hduser/hadoop-2.2.0/share/hadoop/mapreduce/lib/*, /home/hduser/hadoop-2.2.0/share/hadoop/yarn/*, /home/hduser/hadoop-2.2.0/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>
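All four files share the same configuration/property layout, so they can also be generated from name=value pairs. A sketch with hypothetical helpers write_site_xml and xml_escape (not part of Hadoop); it overwrites the target file and does not emit description elements:

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape the XML special characters &, < and >.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Hypothetical helper: write_site_xml <file> name=value [name=value ...]
write_site_xml() {
  local out="$1" kv name value
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for kv in "$@"; do
      name="${kv%%=*}"    # text before the first '='
      value="${kv#*=}"    # text after the first '='
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "$(xml_escape "$name")" "$(xml_escape "$value")"
    done
    echo '</configuration>'
  } > "$out"
}

# Example:
# write_site_xml /home/hduser/hadoop-2.2.0/etc/hadoop/core-site.xml \
#   fs.defaultFS=hdfs://master:9000 hadoop.tmp.dir=/home/hduser/temp
```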
6.4 Edit the slaves file under hadoop_home/etc/hadoop so that it contains the following:
master
slave1
When the above is done, as hduser on master, use scp to copy the hadoop-2.2.0 directory and its contents to the same path on every other machine:
$ scp -r /home/hduser/hadoop-2.2.0 slave1:/home/hduser/
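With more slaves, the copy can be driven by the slaves file itself. A sketch with a hypothetical helper sync_hadoop that runs scp -r for every listed host except the machine it runs on (by default the output of hostname); DRY_RUN=1 prints the commands instead:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sync_hadoop <slaves-file> <hadoop-dir> [self-hostname]
# Copies <hadoop-dir> into the same parent directory on every other host.
sync_hadoop() {
  local slaves_file="$1" hadoop_dir="$2" self="${3:-$(hostname)}"
  local parent line host
  local -a hosts=() cmd=()
  parent="$(dirname "$hadoop_dir")"
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%%#*}"              # drop comments
    host="${line//[[:space:]]/}"    # drop whitespace
    if [ -n "$host" ] && [ "$host" != "$self" ]; then
      hosts+=("$host")
    fi
  done < "$slaves_file"
  for host in "${hosts[@]}"; do
    cmd=(scp -r "$hadoop_dir" "$host:$parent/")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Example: sync_hadoop /home/hduser/hadoop-2.2.0/etc/hadoop/slaves /home/hduser/hadoop-2.2.0
```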
7. Format HDFS (normally done only once, unless HDFS has failed) by running the following commands:
- cd /home/hduser/hadoop-2.2.0/bin/
- ./hdfs namenode -format
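Re-formatting an already formatted NameNode discards its metadata, so it can help to guard the command. A sketch with a hypothetical wrapper format_namenode_once, assuming the directory set in dfs.namenode.name.dir contains current/VERSION once it has been formatted:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: format_namenode_once <hadoop-home> <name-dir>
# Runs "hdfs namenode -format" only if <name-dir> holds no metadata yet.
format_namenode_once() {
  local hadoop_home="$1" name_dir="$2"
  if [ -f "$name_dir/current/VERSION" ]; then
    echo "NameNode already formatted in $name_dir, skipping" >&2
    return 0
  fi
  "$hadoop_home/bin/hdfs" namenode -format
}

# Example: format_namenode_once /home/hduser/hadoop-2.2.0 /home/hduser/dfs/name
```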
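8. Start or stop the Hadoop cluster (this can be repeated, but the cluster is usually left running once started, because stopping it discards the information about applications that have run):
- [hduser@master bin]$ cd ../sbin/
- [hduser@master sbin]$ ./start-all.sh
- [hduser@master sbin]$ ./stop-all.sh   (to stop the cluster)
In Hadoop 2.x, start-all.sh and stop-all.sh are deprecated wrappers; running ./start-dfs.sh and then ./start-yarn.sh (or ./stop-yarn.sh and then ./stop-dfs.sh) does the same.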
9. Verify:
HDFS web UI: http://master:50070
ResourceManager (RM) web UI: http://master:8088
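Besides the web UIs, the JDK's jps tool lists the Java daemons running on each machine. A sketch with a hypothetical helper check_daemons that reads jps output and reports which expected daemons are missing; the expected names follow the role assignment at the beginning of this article:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_daemons <daemon>...
# Reads `jps` output ("<pid> <MainClass>" per line) on stdin and checks that
# every daemon named on the command line appears in it.
check_daemons() {
  local running d
  local -a missing=()
  running="$(awk '{ print $2 }')"   # keep only the class names
  for d in "$@"; do
    if ! grep -qx "$d" <<< "$running"; then
      missing+=("$d")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "all running: $*"
}

# On master: jps | check_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
# On slave1: jps | check_daemons DataNode NodeManager
```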