This week has been insanely busy, so now that I'm off work I'm catching up on documentation. I mentioned before that I wanted to sort out the lineage piece; I had already read through the source code, but had nowhere to show it running.
Some research suggested Atlas was a good fit, so I decided to give it a spin and see whether everything could be wired together. After a build full of surprises, it actually worked: with the help of various open-source components, Atlas can automatically capture table-level and column-level lineage for both Hive SQL and Spark SQL. Fantastic!
An environment like this is a huge win for anyone who wants to study this area or do secondary development on it: read the Atlas and Kyuubi source, dig into Hive SQL and Spark SQL execution plans, and you are ready to go.
Environment first:
OS: macOS
Atlas: 2.3.0
Hadoop: 3.3.6
MySQL: 5.7.25
Hive: 3.1.3
Spark: 3.3.3
Kyuubi: 1.8.1
Other:
Maven: 3.8.7
JDK: 1.8.0_201
Deployment steps
1. Atlas installation
- Download and build
Official site: https://atlas.apache.org/#/
Download the 2.3.0 source tarball: https://dlcdn.apache.org/atlas/2.3.0/apache-atlas-2.3.0-sources.tar.gz
Atlas does not ship a binary release, so we have to build it ourselves.
Build docs: https://atlas.apache.org/#/BuildInstallation
If your Maven and JDK setups are sound, building per the official docs should go smoothly. The docs list several build profiles; I chose the one with embedded HBase and Solr, which saves installing HBase and Solr separately.
Extract:
tar xvfz apache-atlas-2.3.0-sources.tar.gz
Build:
cd apache-atlas-sources-2.3.0/
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
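The build pulls in a large dependency tree and can exhaust the default heap; the official build docs suggest giving Maven more memory before running the package step:
export MAVEN_OPTS="-Xms2g -Xmx2g"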
- Install Atlas
Installation docs: https://atlas.apache.org/#/Installation
The tarball we need is under apache-atlas-sources-2.3.0/distro/target:
apache-atlas-2.3.0-server.tar.gz
Extract:
tar -zxvf apache-atlas-2.3.0-server.tar.gz
Edit atlas-env.sh:
cd apache-atlas-2.3.0
vim conf/atlas-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true
Configure environment variables:
vim /etc/profile
export ATLAS_HOME=/xx/apache-atlas-2.3.0
export PATH=.:$MAVEN_HOME/bin:$JAVA_HOME/bin:$ATLAS_HOME/bin:$ZOOKEEPER_HOME/bin:$PROTOBUF_HOME/bin:$MYSQL_HOME/bin:$SPARK_HOME/sbin:$SPARK_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Apply the environment variables:
source /etc/profile
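At this point Atlas can already be started to sanity-check the build. With MANAGE_LOCAL_HBASE/SOLR=true, atlas_start.py also brings up the embedded HBase and Solr, so the first start can take a few minutes; default credentials are admin/admin. (If you change conf/atlas-application.properties later, restart with atlas_stop.py and atlas_start.py.)
cd $ATLAS_HOME
bin/atlas_start.py
curl -u admin:admin http://localhost:21000/api/atlas/admin/version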
- Configure the Hive hook
The goal this time is to get both the Hive and Spark SQL hooks working so lineage is pushed into Atlas automatically; let's wire up the Hive hook first.
The Hive hook tarball is also under apache-atlas-sources-2.3.0/distro/target:
apache-atlas-2.3.0-hive-hook.tar.gz
Extract:
tar -zxvf apache-atlas-2.3.0-hive-hook.tar.gz
Copy the hook and hook-bin directories from the Hive hook package into the Atlas install directory:
cp -r apache-atlas-hive-hook-2.3.0/hook apache-atlas-2.3.0
cp -r apache-atlas-hive-hook-2.3.0/hook-bin apache-atlas-2.3.0
Edit the config file atlas-application.properties:
cd apache-atlas-2.3.0
vim conf/atlas-application.properties
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
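These properties only tune the hook's behavior; the hook itself still has to be registered on the Hive side once Hive is installed (step 4 below). Per the Atlas Hive hook docs, that takes three more changes (paths here follow the /xx placeholder convention used above):
Add to hive-site.xml:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
Add to hive-env.sh:
export HIVE_AUX_JARS_PATH=/xx/apache-atlas-2.3.0/hook/hive
Copy the Atlas client config into Hive's conf directory:
cp $ATLAS_HOME/conf/atlas-application.properties $HIVE_HOME/conf/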
2. Hadoop installation
Official site: https://hadoop.apache.org/
Download: hadoop-3.3.6.tar.gz
- Configuration
Extract:
tar -zxvf hadoop-3.3.6.tar.gz
Configure environment variables:
vim /etc/profile
export HADOOP_HOME=/xx/hadoop-3.3.6
export PATH=.:$MAVEN_HOME/bin:$JAVA_HOME/bin:$ATLAS_HOME/bin:$ZOOKEEPER_HOME/bin:$PROTOBUF_HOME/bin:$MYSQL_HOME/bin:$SPARK_HOME/sbin:$SPARK_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Apply the environment variables:
source /etc/profile
Edit the Hadoop config files:
cd $HADOOP_HOME/etc/hadoop/
vim core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/xx/data/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/xx/data/hadoop/tmp/dfs/name</value> <!-- NameNode data dir; use your own path -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/xx/data/hadoop/tmp/dfs/data</value> <!-- DataNode data dir; use your own path -->
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>yarn.application.classpath</name>
<!-- replace /xx/soft/hadoop-3.3.6 with your own install path -->
<value>
/xx/soft/hadoop-3.3.6/etc/hadoop,
/xx/soft/hadoop-3.3.6/share/hadoop/common/lib/*,
/xx/soft/hadoop-3.3.6/share/hadoop/common/*,
/xx/soft/hadoop-3.3.6/share/hadoop/hdfs,
/xx/soft/hadoop-3.3.6/share/hadoop/hdfs/lib/*,
/xx/soft/hadoop-3.3.6/share/hadoop/hdfs/*,
/xx/soft/hadoop-3.3.6/share/hadoop/mapreduce/*,
/xx/soft/hadoop-3.3.6/share/hadoop/yarn,
/xx/soft/hadoop-3.3.6/share/hadoop/yarn/lib/*,
/xx/soft/hadoop-3.3.6/share/hadoop/yarn/*
</value>
</property>
</configuration>
vim yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
<description>default value is 1024</description>
</property>
</configuration>
Passwordless SSH login:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub>> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
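On macOS, sshd is off by default; if connecting to localhost is refused, enable Remote Login under System Preferences > Sharing first, then verify that key-based login works:
ssh localhost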
- Start and stop
Format the NameNode:
hdfs namenode -format
Start:
start-dfs.sh
start-yarn.sh
Check processes with jps (besides the Hadoop daemons, the listing below also shows Atlas, the embedded HBase HMaster, and a Spark Master/Worker, since those were already running in my environment):
xx@C02D83S2ML85 hadoop % jps
29808 SecondaryNameNode
29664 DataNode
70835 Jps
30004 ResourceManager
30100 NodeManager
30884 Atlas
32917 Launcher
30549 HMaster
38870 Master
38921 Worker
29565 NameNode
Check the web UI:
http://localhost:9870/dfshealth.html#tab-overview
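The YARN ResourceManager UI should be reachable as well (default port, assuming no overrides):
http://localhost:8088/cluster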
Stop:
stop-dfs.sh
stop-yarn.sh
Other Hadoop commands to test:
hdfs dfs -mkdir /wordcount
hdfs dfs -put ~/testdata/wordcount /wordcount
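To exercise MapReduce on YARN end to end, the examples jar bundled with the 3.3.6 release can be run against the uploaded data (the output path here is arbitrary):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /wordcount /wordcount-output
hdfs dfs -cat /wordcount-output/part-r-00000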
(screenshots: jps process list and the HDFS web UI)
3. MySQL installation
Official site: https://dev.mysql.com/
Downloads: https://downloads.mysql.com/archives/community/
Download mysql-5.7.25-macos10.14-x86_64.dmg
Double-click to install, then reset the root password (the auto-generated one is too complex to remember).
Login test:
mysql -u root -p123456
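To replace the auto-generated root password with the simple one used above, the MySQL 5.7 syntax is as follows (run after logging in with the generated password):
ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
FLUSH PRIVILEGES;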
4. Hive installation
Official site: https://hive.apache.org/index.html
Download apache-hive-3.1.3-bin.tar.gz from https://dlcdn.apache.org/hive/
- Configuration
Extract:
tar -zxvf apache-hive-3.1.3-bin.tar.gz
Configure environment variables:
vim /etc/profile
export HIVE_HOME=/xx/apache-hive-3.1.3-bin
export PATH=.:$MAVEN_HOME/bin:$JAVA_HOME/bin:$ATLAS_HOME/bin:$ZOOKEEPER_HOME/bin:$PROTOBUF_HOME/bin:$MYSQL_HOME/bin:$SPARK_HOME/sbin:$SPARK_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Apply the environment variables:
source /etc/profile
Configure hive-site.xml:
cd apache-hive-3.1.3-bin/conf
cp hive-default.xml.template hive-site.xml
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
Download the MySQL JDBC driver, mysql-connector-java-5.1.49.jar (the connector is versioned independently of the server, so there is no 5.7.25 connector; 5.1.49 works fine with MySQL 5.7.25).
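The driver jar goes into Hive's lib directory, and on Hive 3.x the metastore schema has to be initialized once before first use (schematool ships with Hive; the MySQL database itself is created automatically thanks to createDatabaseIfNotExist=true in the JDBC URL):
cp mysql-connector-java-5.1.49.jar $HIVE_HOME/lib/
schematool -dbType mysql -initSchema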