1.下载tar包,在客户端窗口上执行
wget -c http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz
2.解压到app目录下
tar -zxvf hive-1.1.0-cdh5.7.0.tar.gz -C ~/app
3.配置~/.bash_profile
export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.7.0
export PATH=$HIVE_HOME/bin:$PATH
4.环境配置
代码语言:javascript复制cd hive-1.1.0-cdh5.7.0/conf/
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
代码语言:javascript复制# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
5.安装mysql,yum install mysql-xxx 6.配置hive-site.xml文件
代码语言:javascript复制<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop000:3306/sparksql?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
</configuration>
7.拷贝数据库驱动到hive-1.1.0-cdh5.7.0/lib
目录下
cp /home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
8.启动hive,bin/hive
image.png
9.到mysql数据库里查看sparksql
mysql> show databases;
--------------------
| Database |
--------------------
| information_schema |
| azkaban |
| hive |
| mysql |
| spark |
| sparksql |
| sqoop |
| ss |
| test |
--------------------
9 rows in set (0.00 sec)
mysql> use sparksql
Database changed
mysql> show tables;
---------------------------
| Tables_in_sparksql |
---------------------------
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| DATABASE_PARAMS |
| DBS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PART_COL_STATS |
| ROLES |
| SDS |
| SEQUENCE_TABLE |
| SERDES |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TAB_COL_STATS |
| TBLS |
| VERSION |
---------------------------
25 rows in set (0.00 sec)
10.在Hive中建表,
create table hive_count(context string );
报错:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
解决:
这是由于字符集的问题,需要配置MySQL的字符集: mysql> alter database hive character set latin1;
11.建好表以后,查看sparksql
里的TBLS
和COLUMNS_V2
image.png
12.查看hive官方文档
代码语言:javascript复制https://cwiki.apache.org/confluence/display/Hive/LanguageManual DML#LanguageManualDML-Loadingfilesintotables 从外部文件加载数据
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
13.在/home/hadoop/data/data.log
下面有外部数据,把它导入到hive表里
LOAD DATA LOCAL INPATH '/home/hadoop/data/data.log' INTO TABLE hive_count;
14.select
下结果:
image.png
15.hive ql提交执行后生成MR作业,并在yarn上运行。
代码语言:javascript复制select word,count(1) from hive_count lateral view explode(split(context,'t')) wc as word group by word;
lateral view explode(split(context,'t')) :把每行记录按照指定分割符进行分解。
16.确认结果
image.png
yarn页面,
代码语言:javascript复制http://hostname:8088/cluster/apps
image.png