数据仓库 Hive(内含大数据镜像下载)

2021-09-06 10:35:35 浏览数 (1)

文章目录

    • 1. 特点
    • 2. Hive 体系架构
    • 3. 安装 Hive
      • 3.1 安装 MySQL
      • 3.2 安装 Hive
      • 3.3 配置 Hive
    • 4. 实践
      • 4.1 函数
      • 4.2 Hive Shell

安装没成功:直接用现成的镜像

大数据Linux实验环境虚拟机镜像文件 http://dblab.xmu.edu.cn/blog/1645-2/

cloudera-quickstart-vm-集成了大数据平台的虚拟机镜像

1. 特点

  • 查询语言与 SQL 接近
  • 并行执行
  • 使用 HDFS 存储
  • 支持多种数据格式
  • 不支持数据更新
  • 不支持索引
  • 执行延迟高(不适合在线数据查询)
  • 可扩展性高
  • 数据规模大

2. Hive 体系架构

3. 安装 Hive

  • 先安装 hadoop:hadoop 多机全分布式安装步骤(虚拟机1master 2slave)

3.1 安装 MySQL

  • 在 master 上安装
代码语言:javascript复制
yum localinstall https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
或者
wget https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
rpm -ivh mysql57-community-release-el7-11.noarch.rpm
  • 查看可选版本
代码语言:javascript复制
yum repolist all | grep mysql
代码语言:javascript复制
yum install -y mysql-community-server
  • 启动 MySQL
代码语言:javascript复制
[root@master opt]# systemctl start mysqld
[root@master opt]# systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since 一 2021-03-29 06:25:16 CST; 22h ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 49287 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
  Process: 49266 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
 Main PID: 49291 (mysqld)
    Tasks: 28
   CGroup: /system.slice/mysqld.service
           └─49291 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/my...

3月 29 06:25:13 master systemd[1]: Starting MySQL Server...
3月 29 06:25:16 master systemd[1]: Started MySQL Server.
  • 测试 MySQL
代码语言:javascript复制
mysql -u root -p

enter 空密码,报错

报错参考https://blog.csdn.net/vv19910825/article/details/82979563

(操作:vim /etc/my.cnf, 添加 skip-grant-tables跳过密码 systemctl restart mysqld,下面改过密码了,在删除)

修改密码,注意满足密码强度

代码语言:javascript复制
mysql> flush privileges; 
	-> alter user 'root'@'localhost' identified by '!@#123qwe';
    -> flush privileges;

vim /etc/my.cnf, 删除 skip-grant-tables

systemctl restart mysqld

mysql -u root -p 使用新密码,再次登录

代码语言:javascript复制
mysql> show databases;
 -------------------- 
| Database           |
 -------------------- 
| information_schema |
| mysql              |
| performance_schema |
| sys                |
 -------------------- 
4 rows in set (0.01 sec)
  • 在 MySQL 中创建 Hive 所需用户和数据库,并授权
代码语言:javascript复制
mysql> create user 'dnn' identified by '!@#123Qwe';
mysql> create database hive;
Query OK, 1 row affected (0.02 sec)
mysql> grant all privileges on hive.* to 'dnn'@'localhost' identified by '!@#123Qwe';
mysql> flush privileges;

hive 表创建成功

代码语言:javascript复制
mysql> show databases;
 -------------------- 
| Database           |
 -------------------- 
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
 -------------------- 
5 rows in set (0.00 sec)

3.2 安装 Hive

下载地址 : https://hive.apache.org/downloads.html

我下载的是 apache-hive-3.1.2-bin.tar.gz

代码语言:javascript复制
tar -zxvf apache-hive-3.1.2-bin.tar.gz 
  • 赋权限给普通用户 dnn
代码语言:javascript复制
[root@master opt]# chown -R dnn /opt/hive3.1.2/
  • 复制 MySQL 的 JDBC 驱动包到 Hive 的 /lib 下
代码语言:javascript复制
下载地址 https://dev.mysql.com/downloads/connector/j/

解压

代码语言:javascript复制
tar -zxvf mysql-connector-java-5.1.49.tar.gz 

移动

代码语言:javascript复制
[dnn@master Downloads]$ mv mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar /opt/hive3.1.2/lib
[dnn@master Downloads]$ rm -rf mysql-connector-java-5.1.49

3.3 配置 Hive

  • 配置 hive-env.sh
代码语言:javascript复制
[dnn@master opt]$ cd /opt/hive3.1.2/conf/
[dnn@master conf]$ cp hive-env.sh.template  hive-env.sh
[dnn@master conf]$ vim hive-env.sh

第48行改为自己的路径 HADOOP_HOME=/opt/hadoop-3.3.0

第51行 export HIVE_CONF_DIR=/opt/hive3.1.2/conf

第54行 export HIVE_AUX_JARS_PATH=/opt/hive3.1.2/lib

  • 配置 hive-default.xml,直接复制模板
代码语言:javascript复制
cp hive-default.xml.template hive-default.xml
  • 配置 vim hive-site.xml
代码语言:javascript复制
<configuration>

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
ql.jdbc.Driver     To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore,com.mysql.jdbc.Driver is depricated</description>
  </property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>dnn</value>
    <description>Username to use against metastore database</description>
  </property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>!@#123Qwe</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
  • 初始化 Hive Metastore 在 hive 的 bin 目录下 ./schematool -initSchema -dbType mysql

配置环境变量

代码语言:javascript复制
vim ~/.bashrc

添加

代码语言:javascript复制
export HIVE_HOME=/opt/hive3.1.2
export PATH=${HIVE_HOME}/bin:$PATH

使之生效

代码语言:javascript复制
source ~/.bashrc 

4. 实践

启动 hadoop 命令:

代码语言:javascript复制
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
# 第三条可以用下面的命令,上面的显示过期了,以后弃用
mapred --daemon start historyserver

4.1 函数

显示函数 show functions;

代码语言:javascript复制
hive> show functions;
OK
!
!=
$sum0
%
(省略。。。)

显示函数帮助信息:describe function upper(函数名);

代码语言:javascript复制
hive> 
    > describe function upper;
OK
upper(str) - Returns str with all characters changed to uppercase
Time taken: 0.007 seconds, Fetched: 1 row(s)
  • 自定义函数,重写 UDF 中的 evaluate()
代码语言:javascript复制
package michael_package;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Sub extends UDF{
	public Integer evaluate(Integer a, Integer b) {
		if(a==null || b==null)
			return null;
		return a-b;
	}
	public Double evaluate(Double a, Double b) {
		if(a==null || b==null)
			return null;
		return a-b;
	}
	public Integer evaluate(Integer a, Integer[] b) {
		if(a==null || b==null)
			return null;
		int ans = a;
		for(int i = 0; i < b.length;   i) {
			if(b[i] != null)
				ans -= b[i];
		}
		return ans;
	}
}
  • 表生成自定义函数 UDTF
  • 聚集自定义函数 UDAF

4.2 Hive Shell

  • create tableshow tablesdescribe 表name
代码语言:javascript复制
hive> create table student(
    > id string,
    > name string,
    > sex string,
    > age tinyint,
    > dept string)
    > row format delimited fields terminated by 't';
OK
Time taken: 0.464 seconds
hive> show tables;
OK
student
Time taken: 0.085 seconds, Fetched: 1 row(s)
hive> describe student;
OK
id                  	string              	                    
name                	string              	                    
sex                 	string              	                    
age                 	tinyint             	                    
dept                	string              	                    
Time taken: 0.069 seconds, Fetched: 5 row(s)
  • 编写个数据文件
代码语言:javascript复制
hadoop@dblab-VirtualBox:/usr/local/eclipse$ vim /home/hadoop/workspace/student.txt
hadoop@dblab-VirtualBox:/usr/local/eclipse$ cat /home/hadoop/workspace/student.txt
1	michael	male	18	bigdata
2	ming	male	19	AI
3	lili	female	18	math
4	huahua	female	20	AI
  • 加载数据到表格
代码语言:javascript复制
hive> load data local inpath '/home/hadoop/workspace/student.txt' into table student;
Loading data to table default.student
OK
Time taken: 1.158 seconds
  • select 操作表
代码语言:javascript复制
hive> select * from student;
OK
1	michael	male	18	bigdata
2	ming	male	19	AI
3	lili	female	18	math
4	huahua	female	20	AI
Time taken: 0.789 seconds, Fetched: 4 row(s)
代码语言:javascript复制
hive> select * from student where sex='male';
OK
1	michael	male	18	bigdata
2	ming	male	19	AI
Time taken: 1.565 seconds, Fetched: 2 row(s)

HiveQL 将命令转换为 MapReduce 操作

代码语言:javascript复制
hive> select sex, count(*) from student group by sex;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20210331231151_0e76f173-ac54-46e0-8886-f018f8a7f480
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2021-03-31 23:11:52,943 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local236806530_0001
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 352 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
female	2
male	2
Time taken: 1.55 seconds, Fetched: 2 row(s)
代码语言:javascript复制
mysql> show databases;
 -------------------- 
| Database           |
 -------------------- 
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| spark              |
| sys                |
 -------------------- 
6 rows in set (0.00 sec)

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
 --------------------------- 
| Tables_in_hive            |
 --------------------------- 
| AUX_TABLE                 |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_COMPACTIONS     |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| KEY_CONSTRAINTS           |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
| WRITE_SET                 |
 --------------------------- 
57 rows in set (0.00 sec)
  • 非交互式,加载查询脚本
代码语言:javascript复制
hadoop@dblab-VirtualBox:~/workspace$ vim script.q
hadoop@dblab-VirtualBox:~/workspace$ cat script.q 
select * from student;

hadoop@dblab-VirtualBox:~/workspace$ hive -f script.q 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45 , 5.6.26  and 5.7.6  requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
OK
1	michael	male	18	bigdata
2	ming	male	19	AI
3	lili	female	18	math
4	huahua	female	20	AI
Time taken: 1.376 seconds, Fetched: 4 row(s)
  • -e 选项内嵌语句 hadoop@dblab-VirtualBox:~/workspace$ hive -e 'select * from student'

0 人点赞