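Author: Liu Yuanqiang (刘元强)
Data Backup
1.1 Back Up HDFS Data
Common ways to back up HDFS data include:
1. Use distcp to copy the data to another Hadoop cluster.
2. Copy the data to another storage device.
3. Export the data in batches to the disks of individual hosts.
Any of these three methods can also be applied to critical data only. Which one to use depends on the size of your cluster and the volume of data.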
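As a rough illustration of the first method, the sketch below builds a `hadoop distcp` command for one critical path. The NameNode addresses (`nn-src:8020`, `nn-dst:8020`), the path, and the function name are hypothetical placeholders. The function only prints the command so that it can be reviewed before being run by hand.

```shell
# Sketch only: print a distcp command for one HDFS path.
# src_nn / dst_nn are hypothetical NameNode addresses (host:port).
build_distcp_cmd() {
  local src_nn="$1" dst_nn="$2" path="$3"
  # -update copies only files that are missing or differ on the target
  printf 'hadoop distcp -update hdfs://%s%s hdfs://%s%s\n' \
    "$src_nn" "$path" "$dst_nn" "$path"
}

# Example: print the command for review, then run it on the source cluster.
build_distcp_cmd nn-src:8020 nn-dst:8020 /user/hive/warehouse
```

Printing rather than executing keeps the helper safe to try anywhere. Calling it once per critical path gives a reviewable list of copy jobs.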
1.2 Back Up NameNode Metadata
1. Log in to the Active NameNode, put HDFS into safe mode, and flush all pending edits into the fsimage:
sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -saveNamespace
2. Back up the NameNode metadata. Adjust the path below to match your cluster's NameNode directory:
mkdir namenode_back
cd namenode_back/
tar czvf nn_bak.tar.gz /dfs/nn/*
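Before anything is deleted later on, it may be worth confirming that the archive is actually readable. The following is a minimal sketch, not part of the original procedure: the function name `backup_and_verify` is made up, and it archives the directory relative to its parent rather than with absolute paths.

```shell
# Sketch only: archive a metadata directory and verify the archive is readable.
backup_and_verify() {
  local src_dir="$1" archive="$2"
  # -C keeps paths in the archive relative to the parent directory
  tar czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
  # Listing the archive fails if it is truncated or corrupt
  tar tzf "$archive" > /dev/null || return 1
  echo "OK: $archive ($(tar tzf "$archive" | wc -l | tr -d ' ') entries)"
}

# Usage on the NameNode host (path taken from the text above):
#   backup_and_verify /dfs/nn nn_bak.tar.gz
```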
1.3 Back Up MySQL Metadata
mkdir mysql_back
cd mysql_back/
# -u is followed by the MySQL user name, and -p by that user's password in single quotes; the next argument is the database name, and the file after > is the backup output
mysqldump -uroot -p'Password&123' hive > hive.sql
mysqldump -uroot -p'Password&123' cm > cm.sql
mysqldump -uroot -p'Password&123' rman > rman.sql
mysqldump -uroot -p'Password&123' hue > hue.sql
mysqldump -uroot -p'Password&123' ranger > ranger.sql
Note: the ranger dump applies only if a Ranger database exists.
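A dump that was cut short can look fine at a glance. The sketch below checks each `.sql` file for the `-- Dump completed` comment that mysqldump writes as its last line by default. The function name is hypothetical, and the check does not apply if comments were disabled (for example with `--skip-comments`).

```shell
# Sketch only: report whether a mysqldump output file looks complete.
dump_is_complete() {
  local file="$1"
  [ -s "$file" ] || return 1                       # missing or empty file
  tail -n 1 "$file" | grep -q '^-- Dump completed'
}

# Check every dump in the current directory (run inside mysql_back/).
for f in ./*.sql; do
  [ -e "$f" ] || continue                          # no .sql files present
  if dump_is_complete "$f"; then
    echo "complete:   $f"
  else
    echo "INCOMPLETE: $f"
  fi
done
```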
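1.4 Back Up Cluster Configuration Data
The Cloudera Manager API can export a JSON file that contains all deployment-related information in Cloudera Manager: hosts, clusters, services, roles, users, settings, and so on. This JSON file can be used to back up or restore the entire Cloudera Manager deployment.
To back up the cluster configuration, log in to the server running Cloudera Manager and run:
curl -u admin:admin "http://192.168.0.159:7180/api/v31/cm/deployment" > ./cm-deployment.json
ll cm-deployment.json
The four parameters are:
admin (before the colon): the user name used to log in to Cloudera Manager
admin (after the colon): that user's password
192.168.0.159: the IP address of the Cloudera Manager server
./cm-deployment.json: the path and file name under which the configuration is saved
Replace these four values with the ones for your cluster.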
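Since the host, port, and API version differ between clusters, a small helper can keep them in one place. This is a sketch under assumptions: the function name and the `CM_USER` / `CM_PASS` variables are hypothetical, and the defaults simply mirror the values used above.

```shell
# Sketch only: build the deployment-export URL from its parts.
# Defaults mirror the values in the text (port 7180, API version v31).
cm_deployment_url() {
  local host="$1" port="${2:-7180}" api_version="${3:-v31}"
  printf 'http://%s:%s/api/%s/cm/deployment\n' "$host" "$port" "$api_version"
}

cm_deployment_url 192.168.0.159

# To export (not run here), something like:
#   curl -sf -u "$CM_USER:$CM_PASS" "$(cm_deployment_url 192.168.0.159)" -o cm-deployment.json
#   test -s cm-deployment.json && echo "export saved"
```

The `-f` flag makes curl return a non-zero exit code on an HTTP error, so a failed login is less likely to leave an error page saved as the backup.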
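1.5 Record User Data Directories
When the uninstall proper begins in the later sections, each component's user data directories are deleted. By default these are mainly:
/var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper data_drive_path/dfs data_drive_path/mapred data_drive_path/yarn
You may, however, have changed these locations through Cloudera Manager. If you need to delete the data directories completely when uninstalling the cluster, or want to be sure that a reinstall immediately after the uninstall succeeds, carefully check these directory settings in Cloudera Manager and record them whenever you have customized the configuration.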
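One way to keep such a record is to write the directories that actually exist on a node to a file. The sketch below is only an illustration: the function name and the output file name are made up, and directories customized in Cloudera Manager still have to be added to the argument list by hand.

```shell
# Sketch only: write the candidate directories that actually exist to a file.
record_existing_dirs() {
  local out="$1"
  shift
  : > "$out"                       # start with an empty record file
  for d in "$@"; do
    # Patterns that matched nothing stay literal and fail the -d test
    [ -d "$d" ] && echo "$d" >> "$out"
  done
  return 0
}

# Usage (patterns are left unquoted so the shell expands them):
#   record_existing_dirs data_dirs.txt /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue \
#     /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper
```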
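Deleting the Cluster
2.1 Stop Cluster Services
1. Stop the cluster
On the Cloudera Manager home page, open the Cluster 1 menu and choose "Actions -> Stop".
In the dialog that appears, choose Stop.
Wait for the cluster services to finish stopping.
2. Stop the Cloudera Management Service
Open the Cloudera Management Service menu and choose Stop.
Choose Stop to confirm.
Wait for the Cloudera Management Service to finish stopping.
The CM home page now shows the cluster and the management service as stopped.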
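2.2 Deactivate and Delete Parcels
1. Deactivate the parcels
On the Cloudera Manager home page, click the Parcel icon on the left.
On the Parcels page, click the Deactivate button on the right.
Choose "Deactivate Only" and confirm.
The button on the right now changes to "Activate".
2. Delete the parcels
Open the menu below "Activate" and choose "Remove From Hosts".
Confirm the removal.
When it completes, the button changes to "Distribute".
Open the menu below it and choose "Delete".
After the deletion succeeds, the button changes to "Download".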
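2.3 Delete the Cluster
On the Cloudera Manager home page, open the menu to the right of Cluster 1 and choose "Delete".
Confirm the deletion.
After the deletion succeeds, the home page no longer lists the cluster.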
Software Uninstall and Directory Removal
3.1 Stop and Uninstall cloudera-scm-server
1. On the CM node, stop the cloudera-scm-server service and check its status:
systemctl stop cloudera-scm-server
systemctl status cloudera-scm-server | grep Active
2. Remove the cloudera-scm-server package:
yum -y remove cloudera-manager-server
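For the status check in step 1, the `grep Active` output has to be read by eye. The sketch below reduces it to the state word so that it can be compared in a script. The function name is hypothetical. `systemctl is-active <unit>` is a more direct alternative where only the state is needed.

```shell
# Sketch only: extract the state word from the "Active:" line of `systemctl status`.
active_state() {
  # awk splits on whitespace, so the leading indentation of the line does not matter
  awk '$1 == "Active:" { print $2; exit }'
}

# Usage on the CM node, before the package is removed:
#   state=$(systemctl status cloudera-scm-server | active_state)
#   [ "$state" = "inactive" ] && echo "server stopped"
```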
3.2 Stop and Uninstall cloudera-scm-agent
1. Use a script to stop the cloudera-scm-agent service on all nodes in one batch:
sh batch_cmd.sh node.list "systemctl stop cloudera-scm-supervisord"
sh batch_cmd.sh node.list "systemctl stop cloudera-scm-agent"
Run the following through the script to confirm that cloudera-scm-agent has stopped on every node:
sh batch_cmd.sh node.list "systemctl status cloudera-scm-agent | grep Active"
Check in the same way that the supervisord service has stopped on every node.
2. Uninstall cloudera-manager-agent on all nodes:
yum -y remove 'cloudera-manager-*'
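The `batch_cmd.sh` script used above is not shown in this article. As a minimal sketch of what such a helper might look like, the function below runs one command on every host listed in a file over ssh. It assumes passwordless root ssh to each node and one host name per line. The optional third argument (the program used to reach a node) is an addition for illustration and testing only.

```shell
# Sketch only: run a command on every node listed in a file.
# Args: <node list file> <command> [runner, defaults to ssh]
batch_cmd() {
  local list="$1" cmd="$2" runner="${3:-ssh}"
  local rc=0 node
  while IFS= read -r node || [ -n "$node" ]; do
    case "$node" in
      ''|'#'*) continue ;;         # skip blank lines and comments
    esac
    echo "== $node =="
    # stdin is redirected so the runner cannot swallow the rest of the node list
    $runner "$node" "$cmd" < /dev/null || rc=1
  done < "$list"
  return "$rc"
}

# When saved as batch_cmd.sh, the file would end with:
#   batch_cmd "$@"
```

The function returns non-zero if the command failed on any node, which makes a partial failure visible to the caller.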
3.3 Uninstall Cluster Software
1. Uninstall the software on all nodes:
yum -y remove avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-core spark-master spark-worker spark-history-server spark-python sqoop sqoop2 whirr hue-common oozie-client solr solr-doc sqoop2-client zookeeper
2. Clear the yum cache:
yum clean all
Deleting Cloudera Manager and User Data
4.1 Delete Cloudera Manager Data
1. Unmount cm_processes on all nodes and confirm that it is gone:
sh batch_cmd.sh node.list "umount cm_processes"
sh batch_cmd.sh node.list "df -hl"
2. Delete the Cloudera Manager data on all nodes:
sh batch_cmd.sh node.list "rm -rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera* /var/log/cloudera* /var/run/cloudera*"
3. Delete the .scm_prepare_node.lock file on all nodes:
sh batch_cmd.sh node.list "rm -rf /tmp/.scm_prepare_node.lock"
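Reading `df -hl` output across many nodes to confirm the unmount is easy to get wrong. The sketch below checks the mount table directly, assuming the tmpfs is listed with `cm_processes` as its source name (which is what `umount cm_processes` relies on). The function name and the optional mounts-file argument are hypothetical.

```shell
# Sketch only: succeed if a mount with the given source name is listed.
# mounts_file defaults to /proc/mounts; the first field of each line is the mount source.
is_mounted() {
  local name="$1" mounts_file="${2:-/proc/mounts}"
  awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$mounts_file"
}

# Usage on a node:
#   if is_mounted cm_processes; then umount cm_processes; fi
```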
4.2 Remove User Data (All Nodes)
1. Cluster service configuration files under /etc:
sh batch_cmd.sh node.list "rm -rf /etc/cloudera* /etc/flume-ng /etc/hadoop* /etc/hbase* /etc/hive* /etc/hue /etc/impala /etc/kafka /etc/kudu /etc/ranger /etc/sentry /etc/solr /etc/spark /etc/sqoop /etc/tez /etc/zeppelin /etc/zookeeper"
sh batch_cmd.sh node.list "rm -rf /etc/alternatives/avro-tools /etc/alternatives/beeline /etc/alternatives/bigtop-detect-javahome /etc/alternatives/catalogd /etc/alternatives/cli_mt /etc/alternatives/cli_st /etc/alternatives/flume* /etc/alternatives/hadoop* /etc/alternatives/hbase* /etc/alternatives/hcat /etc/alternatives/hdfs /etc/alternatives/hive* /etc/alternatives/hiveserver2 /etc/alternatives/hue-conf /etc/alternatives/impala* /etc/alternatives/impalad /etc/alternatives/kafka* /etc/alternatives/kudu* /etc/alternatives/load_gen /etc/alternatives/mapred /etc/alternatives/oozie /etc/alternatives/ozone /etc/alternatives/parquet-tools /etc/alternatives/phoenix* /etc/alternatives/pyspark /etc/alternatives/sentry* /etc/alternatives/solr* /etc/alternatives/solrctl /etc/alternatives/spark* /etc/alternatives/sqoop* /etc/alternatives/statestored /etc/alternatives/tez-conf /etc/alternatives/yarn /etc/alternatives/zeppelin-conf /etc/alternatives/zookeeper*"
2. Executables and command scripts of each service under /usr/bin/:
sh batch_cmd.sh node.list "rm -rf /usr/bin/avro-tools /usr/bin/beeline /usr/bin/bigtop-detect-javahome /usr/bin/catalogd /usr/bin/cli_mt /usr/bin/cli_st /usr/bin/flume-ng /usr/bin/hadoop* /usr/bin/hbase* /usr/bin/hcat /usr/bin/hdfs /usr/bin/hive /usr/bin/hiveserver2 /usr/bin/impala* /usr/bin/impalad /usr/bin/kafka* /usr/bin/kudu* /usr/bin/load_gen /usr/bin/mapred /usr/bin/oozie /usr/bin/ozone /usr/bin/parquet-tools /usr/bin/phoenix* /usr/bin/pyspark /usr/bin/sentry /usr/bin/solrctl /usr/bin/spark* /usr/bin/sqoop* /usr/bin/statestored /usr/bin/yarn /usr/bin/zookeeper*"
3. Data directories of each service under /var/lib/:
sh batch_cmd.sh node.list "rm -rf /var/lib/accumulo /var/lib/atlas /var/lib/cloudera* /var/lib/druid /var/lib/flink /var/lib/flume-ng /var/lib/hadoop* /var/lib/hbase /var/lib/hive /var/lib/hue /var/lib/impala /var/lib/kafka /var/lib/knox /var/lib/kudu /var/lib/livy /var/lib/llama /var/lib/oozie /var/lib/phoenix /var/lib/ranger /var/lib/solr /var/lib/spark /var/lib/sqoop* /var/lib/superset /var/lib/yarn-ce /var/lib/zeppelin /var/lib/zookeeper"
sh batch_cmd.sh node.list "rm -rf /var/lib/alternatives/avro-tools /var/lib/alternatives/beeline /var/lib/alternatives/bigtop-detect-javahome /var/lib/alternatives/catalogd /var/lib/alternatives/cli_mt /var/lib/alternatives/cli_st /var/lib/alternatives/flume* /var/lib/alternatives/hadoop* /var/lib/alternatives/hbase* /var/lib/alternatives/hcat /var/lib/alternatives/hdfs /var/lib/alternatives/hive* /var/lib/alternatives/hue-conf /var/lib/alternatives/impala* /var/lib/alternatives/impalad /var/lib/alternatives/kafka* /var/lib/alternatives/kudu* /var/lib/alternatives/load_gen /var/lib/alternatives/mapred /var/lib/alternatives/oozie /var/lib/alternatives/ozone /var/lib/alternatives/parquet-tools /var/lib/alternatives/phoenix* /var/lib/alternatives/pyspark /var/lib/alternatives/sentry* /var/lib/alternatives/solr* /var/lib/alternatives/solrctl /var/lib/alternatives/spark* /var/lib/alternatives/sqoop* /var/lib/alternatives/statestored /var/lib/alternatives/tez-conf /var/lib/alternatives/yarn /var/lib/alternatives/zeppelin-conf /var/lib/alternatives/zookeeper*"
4. Runtime directories of each service under /var/run/:
sh batch_cmd.sh node.list "rm -rf /var/run/cloudera* /var/run/flume-ng /var/run/hadoop* /var/run/hbase /var/run/hdfs-sockets /var/run/hive /var/run/hue /var/run/impala /var/run/oozie /var/run/sqoop2 /var/run/zookeeper"
5. Log files under /var/log/:
sh batch_cmd.sh node.list "rm -rf /var/log/atlas /var/log/catalogd /var/log/cloudera* /var/log/hadoop* /var/log/hbase /var/log/hive /var/log/hue* /var/log/impalad /var/log/impala* /var/log/kafka /var/log/oozie /var/log/phoenix /var/log/ranger /var/log/spark /var/log/statestore /var/log/yarn /var/log/zookeeper /var/local/kafka"
6. Temporary files under /tmp/:
sh batch_cmd.sh node.list "rm -rf /tmp/*_resources /tmp/cmflistener* /tmp/ehcache* /tmp/embedded /tmp/hadoop* /tmp/hbase* /tmp/hive* /tmp/hsperfdata* /tmp/jetty* /tmp/oozie /tmp/scm_prepare_node* /tmp/start_* /tmp/tmp*"
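Because the commands in this section use broad wildcards such as /tmp/tmp* with rm -rf, it can help to preview what a pattern matches on a node before deleting. The sketch below is one way to do that locally. The function name is made up, and it only lists paths without removing anything.

```shell
# Sketch only: print the paths that exist among the given arguments.
# Pass patterns unquoted so the shell expands them; unmatched patterns stay literal and are skipped.
list_existing() {
  for p in "$@"; do
    # -e covers files and directories; -L also catches dangling symlinks
    # (for example stale entries under /etc/alternatives)
    if [ -e "$p" ] || [ -L "$p" ]; then
      echo "$p"
    fi
  done
}

# Usage on a node:
#   list_existing /tmp/hadoop* /tmp/hive* /tmp/tmp*
# Across all nodes, a similar preview could be run with ls:
#   sh batch_cmd.sh node.list "ls -d /tmp/hadoop* /tmp/hive* /tmp/tmp* 2>/dev/null"
```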
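4.3 Delete Installation Directories
1. Delete /etc/yum.repos.d/cloudera*:
sh batch_cmd.sh node.list "rm -rf /etc/yum.repos.d/cloudera*"
2. Delete the data directories of the NameNode (nn), DataNode (dn), JournalNode (jn), YARN, Impala, Kudu, and so on. Adjust the paths to the directories you recorded for your cluster:
sh batch_cmd.sh node.list "rm -rf /dfs/* /data0/* /data1/* /data/* /impala /yarn /kudu*"
Finally, decide based on your situation whether to remove the MySQL metadata database as well. This completes the CDP uninstall.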