Guide to installing Hive 3.1.2 and using Tez as the execution engine

2022-01-19 08:28:20

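Hive is a SQL-based distributed data warehouse built on top of Hadoop.

To install Hive successfully, first make sure the following are already in place:

An HDFS cluster (single-NameNode or HA mode)

Hadoop YARN

MySQL 5.7 on CentOS 7, installed, with access granted, and stress-tested

Apache Tez, compiled, installed, and verified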

Download and extract the package

cd /data

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

tar zxvf apache-hive-3.1.2-bin.tar.gz

ln -s /data/apache-hive-3.1.2-bin /data/hive
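
The downloaded archive can be checked against a published SHA-256 digest before it is used. The following is a minimal sketch rather than part of the original steps: `sha256_matches` is a hypothetical helper name, and the digest in the usage comment is a placeholder that you would copy from the Apache download page.

```shell
#!/usr/bin/env bash
# sha256_matches is a hypothetical helper (not part of Hive): succeed only if
# the SHA-256 digest of a file equals the expected lowercase hex string.
sha256_matches() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}

# Example (the digest below is a placeholder, not a real value):
# sha256_matches /data/apache-hive-3.1.2-bin.tar.gz "<published-sha256>" \
#   && echo "checksum OK" || echo "checksum mismatch"
```

If the check fails, download the archive again before extracting it.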

Modify the configuration files

1 Edit /etc/profile

vi /etc/profile

# Add the following lines
export HIVE_HOME=/data/hive
export PATH=$PATH:$HIVE_HOME/bin

# Reload the environment
source /etc/profile
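
If this step is scripted instead of done by hand in vi, appending the same export lines on every run would leave duplicates in /etc/profile. One way to avoid that is sketched below; `append_once` is a hypothetical helper name introduced only for this illustration.

```shell
#!/usr/bin/env bash
# append_once is a hypothetical helper: add a line to a file only if an
# identical line is not already present.
append_once() {
  local line="$1" file="$2"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (run as root):
# append_once 'export HIVE_HOME=/data/hive' /etc/profile
# append_once 'export PATH=$PATH:$HIVE_HOME/bin' /etc/profile
# source /etc/profile
```

The single quotes in the usage lines keep `$PATH` and `$HIVE_HOME` unexpanded, so the literal text is written to the file.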

2 Check the Hive version

hive --version

Hive 3.1.2
Git git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 8190d2be7b7165effa62bd21b7d60ef81fb0e4af
Compiled by gates on Thu Aug 22 15:01:18 PDT 2019
From source with checksum 0492c08f784b188c349f6afb1d8d9847

3 Copy hive-default.xml.template to create hive-site.xml (both files live in /data/hive/conf)

cd /data/hive/conf
cp hive-default.xml.template hive-site.xml

4 Copy hive-env.sh.template to create hive-env.sh

cp hive-env.sh.template hive-env.sh

Add the following to hive-env.sh

JAVA_HOME=/data/jdk8
HADOOP_HOME=/data/hadoop
HIVE_HOME=/data/hive

export TEZ_CONF_DIR=/data/tez/conf
export TEZ_JARS=/data/tez/*:/data/tez/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
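
A wrong path in hive-env.sh tends to surface only later as a class-not-found error, so it can help to confirm that the directories exist first. This is a small sketch under the assumption that the layout matches the paths above; `missing_dirs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# missing_dirs is a hypothetical helper: print every argument that is not an
# existing directory, and return non-zero if at least one is missing.
missing_dirs() {
  local d rc=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing: $d"
      rc=1
    fi
  done
  return "$rc"
}

# Example, using the paths set in hive-env.sh:
# missing_dirs /data/jdk8 /data/hadoop /data/hive /data/tez/conf /data/tez/lib
```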
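5 Hive needs a relational database to store its metadata. The default is Derby; this guide uses MySQL. If MySQL is not installed yet, install it by following the MySQL article listed in the prerequisites, and grant the hadoop1 and hadoop2 nodes access to MySQL.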

Next, edit hive-site.xml.

Create the log directory:

mkdir -p /data/hive/logs
# Set its permissions to 777
chmod -R 777 /data/hive/logs

5.1 Configure the MySQL metastore database

The MySQL database used here is:

hostname: hadoop2

username: root

password:

# Change the following properties
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop2:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Pass-123-root</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/data/hive/logs/hive/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/data/hive/logs/hive</value>
    <description>Local scratch space for Hive jobs</description>
  </property>

  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/data/hive/logs/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/data/hive/logs/hive</value>
    <description>Location of Hive run time structured log file</description>
  </property>
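
An `&` inside an XML value has to be written as `&amp;`, otherwise the parser reads it as the start of an entity and fails to load hive-site.xml. As an illustration only, the sketch below escapes a JDBC URL before it is pasted into a `<value>` element; `xml_escape_amp` is a hypothetical helper name, and it assumes the input has not been escaped already.

```shell
#!/usr/bin/env bash
# xml_escape_amp is a hypothetical helper: replace each "&" with "&amp;" so a
# JDBC URL can be pasted into an XML <value> element.
# It assumes the input is not escaped yet; running it twice double-escapes.
xml_escape_amp() {
  # In a sed replacement a bare & means "the matched text", so it is written \&.
  printf '%s\n' "$1" | sed 's/&/\&amp;/g'
}

xml_escape_amp 'jdbc:mysql://hadoop2:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8'
```

Only the ampersands need this treatment in the connection URL; the rest of the string is copied unchanged.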

5.2 Set the execution engine to Tez

  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
    <description>
      Expects one of [mr, tez, spark].
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>

6 Download the MySQL JDBC driver into the hive/lib directory

cd /data/hive/lib && wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar

Initialize the metastore schema

schematool -dbType mysql -initSchema

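While initializing the schema you may hit the error below. To fix it, delete the offending content at the position reported in the error message (note the row, col, and system-id values).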

2021-08-12 16:15:58,896 INFO  [main] conf.HiveConf (HiveConf.java:findConfigFile(187)) - Found configuration file file:/data/apache-hive-3.1.2-bin/conf/hive-site.xml
2021-08-12 16:15:59,118 ERROR [main] conf.Configuration (Configuration.java:loadResource(2980)) - error parsing conf file:/data/apache-hive-3.1.2-bin/conf/hive-site.xml
org.apache.hadoop.shaded.com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8
 at [row,col,system-id]: [3215,96,"file:/data/apache-hive-3.1.2-bin/conf/hive-site.xml"]
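
The manual fix can also be scripted. The sketch below assumes that the character reported as code 0x8 appears in the file as the numeric reference `&#8;`; that spelling is an assumption to confirm by printing the reported row first. `strip_backspace_entity` is a hypothetical helper name, and it keeps a backup of the original file.

```shell
#!/usr/bin/env bash
# strip_backspace_entity is a hypothetical helper: delete every "&#8;"
# character reference from a file, keeping the original as <file>.bak.
# Assumption: the illegal character is spelled "&#8;" in the file.
strip_backspace_entity() {
  local file="$1"
  cp -- "$file" "$file.bak" || return 1
  # In a sed pattern (unlike a replacement) "&" is an ordinary character.
  sed 's/&#8;//g' "$file.bak" > "$file"
}

# Inspect the reported row first, then clean the file:
# sed -n '3215p' /data/hive/conf/hive-site.xml
# strip_backspace_entity /data/hive/conf/hive-site.xml
```

After the cleanup, rerun the schematool command to confirm that the configuration now parses.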

Modify the Hadoop configuration

Edit Hadoop's core-site.xml and add the following properties

    <property>
      <name>hadoop.proxyuser.hive.groups</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.hive.hosts</name>
      <value>hadoop2</value>
    </property>

Restart HDFS and YARN

Run on the hadoop2 node

hdfs --daemon stop namenode
hdfs --daemon start namenode
hdfs --daemon stop datanode
hdfs --daemon start datanode

yarn --daemon stop resourcemanager
yarn --daemon start resourcemanager
yarn --daemon stop nodemanager
yarn --daemon start nodemanager

Run on the hadoop1 node

hdfs --daemon stop namenode
hdfs --daemon start namenode
hdfs --daemon stop datanode
hdfs --daemon start datanode

yarn --daemon stop nodemanager
yarn --daemon start nodemanager
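
After the restarts, `jps` (shipped with the JDK) lists the running Java processes, which gives a quick way to confirm that every daemon came back. The following sketch filters that listing; `missing_daemons` is a hypothetical helper name, and the daemon names are the main-class names that `jps` normally prints for Hadoop.

```shell
#!/usr/bin/env bash
# missing_daemons is a hypothetical helper: read `jps` output on stdin and
# print each expected daemon name (given as arguments) that does not appear.
missing_daemons() {
  local listing name
  listing="$(cat)"
  for name in "$@"; do
    # jps prints "<pid> <MainClass>"; -w matches the name as a whole word,
    # so "NameNode" does not match "SecondaryNameNode".
    printf '%s\n' "$listing" | grep -qw -- "$name" || echo "$name"
  done
}

# Example on hadoop2:
# jps | missing_daemons NameNode DataNode ResourceManager NodeManager
# Example on hadoop1:
# jps | missing_daemons NameNode DataNode NodeManager
```

Empty output means that all of the listed daemons are running on that node.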

Create the /user/hive directory on HDFS and change its ownership

useradd hive
hdfs dfs -mkdir -p /user/hive
hdfs dfs -chown -R hive:supergroup /user/hive

Start the Hive metastore and HiveServer2

11 Switch to the hive user and start the Hive metastore and HiveServer2 in the background

su hive

nohup hive --service metastore > /data/hive/logs/hive-metastore.log 2>&1 &

nohup hive --service hiveserver2 > /data/hive/logs/hiveserver2.log 2>&1 &
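
Both services start in the background and may take a while before they accept connections, so connecting right away can fail. A rough way to wait for them is sketched below. `wait_for_port` is a hypothetical helper name, 10000 is the HiveServer2 port used in this guide, and 9083 is assumed to be the metastore port (its usual default). The `/dev/tcp` path is a bash feature, so the sketch needs bash.

```shell
#!/usr/bin/env bash
# wait_for_port is a hypothetical helper: try a TCP connection once per second
# until it succeeds or the given number of attempts is used up.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-60}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # /dev/tcp/<host>/<port> is handled by bash itself; the subshell closes
    # the socket again as soon as the connection attempt has finished.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example (ports are assumptions, see the note above):
# wait_for_port hadoop2 9083 120 && echo "metastore is listening"
# wait_for_port hadoop2 10000 120 && echo "hiveserver2 is listening"
```

If a port never opens, the log files under /data/hive/logs are the first place to look.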

12 Connect to HiveServer2 with beeline

[hive@hadoop2 logs]$ beeline
Beeline version 3.1.2 by Apache Hive
beeline> !connect jdbc:hive2://hadoop2:10000/default
Connecting to jdbc:hive2://hadoop2:10000/default
Enter username for jdbc:hive2://hadoop2:10000/default: hive
Enter password for jdbc:hive2://hadoop2:10000/default: ****
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop2:10000/default>
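
The same connection can be made without the interactive prompt, which is convenient for checking which execution engine is active. This is a minimal sketch: `hs2_url` is a hypothetical helper name, and the host, port, and user are the ones used in this guide.

```shell
#!/usr/bin/env bash
# hs2_url is a hypothetical helper that assembles a HiveServer2 JDBC URL
# from a host, a port, and a database name.
hs2_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$2" "$3"
}

# Run a single statement non-interactively, if beeline is installed.
# -u is the JDBC URL, -n the user name, -e the statement to execute.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$(hs2_url hadoop2 10000 default)" -n hive -e 'set hive.execution.engine;'
else
  echo "beeline not found on PATH; skipping"
fi
```

With the hive-site.xml settings from this guide, the statement should report `hive.execution.engine=tez`.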

Basic Hive functional test

create database test;

use test;

create table test(a string);

insert into test values("tom");

select * from test group by a;

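This is an original article by the blogger 「xiaozhch5」 of 从大数据到人工智能, published under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Original link: https://cloud.tencent.com/developer/article/1936516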