1. Host Planning
Hostname | IP Address | OS | Deployed Software | Running Processes | Remarks |
---|---|---|---|---|---|
mini01 | 172.16.1.11 (internal), 10.0.0.11 (external) | CentOS 7.5 | Jdk-8, zookeeper-3.4.5, Hadoop2.7.6, hbase-2.0.2, kafka_2.11-2.0.0, spark-2.4.0-hadoop2.7 (Master) | QuorumPeerMain, Master | |
mini02 | 172.16.1.12 (internal), 10.0.0.12 (external) | CentOS 7.5 | Jdk-8, zookeeper-3.4.5, Hadoop2.7.6, hbase-2.0.2, kafka_2.11-2.0.0, spark-2.4.0-hadoop2.7 (Master) | QuorumPeerMain, Master | |
mini03 | 172.16.1.13 (internal), 10.0.0.13 (external) | CentOS 7.5 | Jdk-8, zookeeper-3.4.5, Hadoop2.7.6, hbase-2.0.2, kafka_2.11-2.0.0, spark-2.4.0-hadoop2.7 | QuorumPeerMain, Worker | |
mini04 | 172.16.1.14 (internal), 10.0.0.14 (external) | CentOS 7.5 | Jdk-8, zookeeper-3.4.5, Hadoop2.7.6, hbase-2.0.2, spark-2.4.0-hadoop2.7 | QuorumPeerMain, Worker | |
mini05 | 172.16.1.15 (internal), 10.0.0.15 (external) | CentOS 7.5 | Jdk-8, zookeeper-3.4.5, Hadoop2.7.6, hbase-2.0.2, spark-2.4.0-hadoop2.7 | QuorumPeerMain, Worker | |
Note
High availability is achieved with the help of ZooKeeper by running at least two Master nodes.
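The steps below address the nodes by hostname, so every node is assumed to resolve the mini0x names. A minimal /etc/hosts sketch based on the internal addresses in the table above (an assumption; adjust to your own network):

# /etc/hosts on every node (sketch, internal IPs from the planning table)
172.16.1.11  mini01
172.16.1.12  mini02
172.16.1.13  mini03
172.16.1.14  mini04
172.16.1.15  mini05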
2. Passwordless Login
Set up key-based passwordless SSH login from mini01 and mini02 to mini01, mini02, mini03, mini04, and mini05.
See the article: Hadoop2.7.6_01_部署
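If this was not already done as part of the Hadoop deployment, the key setup can be sketched as follows (run on mini01 and likewise on mini02; the RSA key type and default key path are assumptions):

# Generate a key pair and push the public key to every node
[yun@mini01 ~]$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
[yun@mini01 ~]$ for h in mini01 mini02 mini03 mini04 mini05; do ssh-copy-id yun@$h; done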
3. JDK (Java 8)
See the article: Hadoop2.7.6_01_部署
4. ZooKeeper Deployment
See the article: zookeeper-02 部署
Start the ZooKeeper service before continuing.
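ZooKeeper must be running on every node before the Spark Masters are started; a minimal sketch (assumes zkServer.sh from the ZooKeeper deployment is on the PATH):

# Run on each of mini01 .. mini05
[yun@mini01 ~]$ zkServer.sh start                 # start the ZooKeeper daemon
[yun@mini01 ~]$ zkServer.sh status                # one node reports leader, the others follower
[yun@mini01 ~]$ jps | grep QuorumPeerMain         # the QuorumPeerMain process should be present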
5. Spark Deployment Steps
5.1. Spark Installation
[yun@mini01 software]$ pwd
/app/software
[yun@mini01 software]$ ll
total 238572
-rw-r--r-- 1 yun yun 227893062 Nov 19 21:24 spark-2.4.0-bin-hadoop2.7.tgz
[yun@mini01 software]$ tar xf spark-2.4.0-bin-hadoop2.7.tgz
[yun@mini01 software]$ mv spark-2.4.0-bin-hadoop2.7 /app/
[yun@mini01 software]$ cd /app/
[yun@mini01 ~]$ ln -s spark-2.4.0-bin-hadoop2.7/ spark
[yun@mini01 ~]$ ll -d spark-*
drwxr-xr-x 13 yun yun 211 Oct 29 14:36 spark-2.4.0-bin-hadoop2.7
lrwxrwxrwx  1 yun yun  26 Nov 24 14:23 spark -> spark-2.4.0-bin-hadoop2.7/
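Optionally, the unpacked distribution can be sanity-checked before any configuration (this check is an addition, not part of the original steps):

[yun@mini01 ~]$ /app/spark/bin/spark-submit --version   # should report version 2.4.0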
5.2. Environment Variable Changes
According to the plan, these environment variables must be set on mini01, mini02, mini03, mini04, and mini05.
# Root privileges are required to add the environment variables
[root@mini01 ~]# tail /etc/profile
………………
# Spark environment variables
export SPARK_HOME="/app/spark"
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

[root@mini01 ~]# logout
[yun@mini01 conf]$ source /etc/profile   # reload the environment variables
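A quick check that the new variables are visible in the current shell (an added verification step, not in the original write-up):

[yun@mini01 conf]$ echo $SPARK_HOME      # should print /app/spark
[yun@mini01 conf]$ which spark-submit    # should resolve to /app/spark/bin/spark-submit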
5.3. Configuration Changes
[yun@mini01 conf]$ pwd
/app/spark/conf
[yun@mini01 conf]$ cp -a spark-env.sh.template spark-env.sh
[yun@mini01 conf]$ tail spark-env.sh   # the modified environment configuration
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS

# Added configuration:
# Set JAVA_HOME
export JAVA_HOME=/app/jdk
# -Dspark.deploy.recoveryMode=ZOOKEEPER   # use ZooKeeper for Master failure recovery
# -Dspark.deploy.zookeeper.url=mini01:2181,mini02:2181,mini03:2181,mini04:2181,mini05:2181   # ZooKeeper connection string
# -Dspark.deploy.zookeeper.dir=/spark     # directory in ZooKeeper where Spark stores its recovery data
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=mini01:2181,mini02:2181,mini03:2181,mini04:2181,mini05:2181 -Dspark.deploy.zookeeper.dir=/spark"
# Maximum memory each Worker may use; these VMs only have 2 GB,
# so 1024m (1 GB) is used here. On a real server with 128 GB you could set 100 GB.
export SPARK_WORKER_MEMORY=1024m
# Maximum number of CPU cores each Worker may use; these VMs have a single core.
# On a real server with 32 cores you could set 32.
export SPARK_WORKER_CORES=1
# Port for submitting applications; 7077 is the default, change it here if needed.
export SPARK_MASTER_PORT=7077

[yun@mini01 conf]$ pwd
/app/spark/conf
[yun@mini01 conf]$ cp -a slaves.template slaves
[yun@mini01 conf]$ tail slaves   # the modified slaves configuration
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# A Spark Worker will be started on each of the machines listed below.
mini03
mini04
mini05
Configuration note
-Dspark.deploy.zookeeper.dir=/spark   # the directory in ZooKeeper where Spark writes its recovery data
[yun@mini05 ~]$ zkCli.sh   # enter the ZooKeeper CLI [check this after Spark has been started]
[zk: localhost:2181(CONNECTED) 0] ls /   # /spark is the directory we configured in spark-env.sh
[cluster, brokers, zookeeper, yarn-leader-election, hadoop-ha, admin, isr_change_notification, log_dir_event_notification, controller_epoch, spark, consumers, latest_producer_id_block, config, hbase]
[zk: localhost:2181(CONNECTED) 1] ls /spark
[leader_election, master_status]
[zk: localhost:2181(CONNECTED) 2] ls /spark/master_status
[worker_worker-20181125113658-172.16.1.13-18433, worker_worker-20181125113658-172.16.1.14-14175, worker_worker-20181125113658-172.16.1.15-8887]
[zk: localhost:2181(CONNECTED) 3] ls /spark/leader_election
[_c_6c6d0c36-3017-4354-a05c-9414a78d79e2-latch-0000000000, _c_04ceffff-b763-454a-b3f1-7fb56f56fa84-latch-0000000001]
5.4. Distribute to the Other Machines
Distribute the installation to mini02, mini03, mini04, and mini05.
mini01 and mini02 will act as Masters.
[yun@mini01 ~]$ scp -pr spark-2.4.0-bin-hadoop2.7/ yun@mini02:/app   # copy to mini02
[yun@mini01 ~]$ scp -pr spark-2.4.0-bin-hadoop2.7/ yun@mini03:/app   # copy to mini03
[yun@mini01 ~]$ scp -pr spark-2.4.0-bin-hadoop2.7/ yun@mini04:/app   # copy to mini04
[yun@mini01 ~]$ scp -pr spark-2.4.0-bin-hadoop2.7/ yun@mini05:/app   # copy to mini05
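Equivalently, the copy can be scripted as a single loop (a sketch that relies on the passwordless login from section 2):

[yun@mini01 ~]$ for h in mini02 mini03 mini04 mini05; do scp -pr spark-2.4.0-bin-hadoop2.7/ yun@$h:/app; done   # copy to every other node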
Perform the following on mini02, mini03, mini04, and mini05:
[yun@mini04 ~]$ pwd
/app
[yun@mini04 ~]$ ll -d spark-2.4.0-bin-hadoop2.7
drwxr-xr-x 13 yun yun 211 Oct 29 14:36 spark-2.4.0-bin-hadoop2.7
[yun@mini04 ~]$ ln -s spark-2.4.0-bin-hadoop2.7/ spark
[yun@mini04 ~]$ ll -d spark-*
drwxr-xr-x 13 yun yun 211 Oct 29 14:36 spark-2.4.0-bin-hadoop2.7
lrwxrwxrwx  1 yun yun  26 Nov 24 23:39 spark -> spark-2.4.0-bin-hadoop2.7/
5.5. Start Spark
5.5.1. On mini01
[yun@mini01 sbin]$ pwd
/app/spark/sbin
[yun@mini01 sbin]$ ./start-all.sh   # to shut down, use the stop-all.sh script
starting org.apache.spark.deploy.master.Master, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.master.Master-1-mini01.out
mini03: starting org.apache.spark.deploy.worker.Worker, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.worker.Worker-1-mini03.out
mini04: starting org.apache.spark.deploy.worker.Worker, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.worker.Worker-1-mini04.out
mini05: starting org.apache.spark.deploy.worker.Worker, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.worker.Worker-1-mini05.out
[yun@mini01 ~]$
[yun@mini01 ~]$ jps   # check process status
4033 QuorumPeerMain
4683 Jps
4575 Master
5.5.2. On mini02
[yun@mini02 sbin]$ pwd
/app/spark/sbin
[yun@mini02 sbin]$ ./start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /app/spark/logs/spark-yun-org.apache.spark.deploy.master.Master-1-mini02.out
[yun@mini02 sbin]$ jps   # check process status
2914 Master
2999 Jps
2313 QuorumPeerMain
5.5.3. Check processes on mini03
[yun@mini03 ~]$ jps
2824 Jps
2558 QuorumPeerMain
2766 Worker
5.5.4. Check processes on mini04
[yun@mini04 ~]$ jps
2931 Jps
2824 Worker
2555 QuorumPeerMain
5.5.5. Check processes on mini05
[yun@mini05 ~]$ jps
2806 Jps
2747 Worker
2527 QuorumPeerMain
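The same checks can be run from one place over ssh (a sketch; assumes jps is on the PATH for non-interactive shells):

[yun@mini01 ~]$ for h in mini01 mini02 mini03 mini04 mini05; do echo "== $h =="; ssh $h jps; done   # expect Master on mini01/mini02, Worker on mini03-05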
5.6. Browser Access
http://mini01:8080/
http://mini02:8080/
Note
If the Spark Master on mini01 is stopped, after a short while the Master on mini02 changes state from STANDBY to ALIVE.
If the Master on mini01 is then started again, it comes back in the STANDBY state.
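To exercise the failover, the active Master can be stopped with the stock scripts, and clients can list both Masters so they survive the switch; a sketch (the spark-shell invocation is an illustration, not part of the original write-up):

# Stop the currently active Master on mini01; mini02 should take over after a short delay
[yun@mini01 ~]$ /app/spark/sbin/stop-master.sh

# Applications can name both Masters so they keep working across a failover
[yun@mini01 ~]$ /app/spark/bin/spark-shell --master spark://mini01:7077,mini02:7077

# Bring the mini01 Master back; it rejoins in the STANDBY state
[yun@mini01 ~]$ /app/spark/sbin/start-master.sh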