Hive Study Notes (2021-04)

1. Hive Data Types

Primitive data types

tinyInt
smallInt
Int
BigInt
Boolean
float
double
string
timestamp
binary -- byte array

Collection types

STRUCT
Similar to a struct in C: its fields are accessed with dot notation. For example, if a column has the type STRUCT<first:STRING, last:STRING>, the first field can be referenced as column_name.first.

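MAP
A MAP is a collection of key-value pairs whose values are accessed with array notation. For example, if a MAP column holds the pairs 'first' -> 'John' and 'last' -> 'Doe', the value stored under the key 'last' can be retrieved with column_name['last'].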

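ARRAY
An ARRAY is an ordered collection of elements that all have the same type. Each element has an index, and indexes start at zero. For example, if the array value is ['John', 'Doe'], the second element can be referenced as array_name[1].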
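
The three access notations can be tried without creating a table. The following is a minimal sketch, assuming a Hive version that accepts a SELECT without a FROM clause inside a subquery (0.13 or later). The aliases s, m, a, and t are made up for illustration, and the values are built inline with Hive's named_struct, map, and array constructor functions.

```sql
-- Build one row holding a STRUCT, a MAP, and an ARRAY,
-- then read one element from each.
SELECT
  s.`first` AS struct_field,    -- dot notation on a STRUCT
  m['last'] AS map_value,       -- value stored under the key 'last'
  a[1]      AS second_element   -- ARRAY indexes start at 0
FROM (
  SELECT
    named_struct('first', 'John', 'last', 'Doe') AS s,
    map('first', 'John', 'last', 'Doe')          AS m,
    array('John', 'Doe')                         AS a
) t;
```

Under those assumptions the single result row should read John, Doe, Doe. The backticks around first are only a precaution in case the name is treated as a keyword.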

2. CREATE TABLE Example

Sample data

{
  "name": "songsong",
  "friends": ["bingbing", "lili"],   // list: ARRAY
  "children": {                      // key-value pairs: MAP
    "xiao song": 18,
    "xiaoxiao song": 19
  },
  "address": {                       // structure: STRUCT
    "street": "hui long guan",
    "city": "beijing"
  }
}

Local test file test.txt

songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

CREATE TABLE statement

create table test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited 
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n'
location '/user/hive/warehouse/db/table';

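## Explanation:
row format delimited: begins the delimiter specification

fields terminated by: sets the delimiter between fields (columns)

collection items terminated by: sets the delimiter between the items of a complex-type field (the elements of an ARRAY, the fields of a STRUCT, and the key-value pairs of a MAP)
map keys terminated by: sets the delimiter between the key and the value inside each MAP entry

lines terminated by: sets the delimiter between rows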
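
To try the table definition, the sample file can be loaded and the nested columns queried. This is a sketch that assumes test.txt sits at the hypothetical local path /tmp/test.txt and that each record occupies exactly one line of the file.

```sql
-- Load the local file into the table (the path is an assumption; adjust it as needed).
LOAD DATA LOCAL INPATH '/tmp/test.txt' INTO TABLE test;

-- Read one element from each complex column.
SELECT
  name,
  friends[1]            AS second_friend,  -- ARRAY index, zero-based
  children['xiao song'] AS child_age,      -- MAP lookup by key
  address.city          AS city            -- STRUCT field
FROM test
WHERE name = 'songsong';
```

With the two sample rows loaded, this query should return songsong, lili, 18, beijing.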

3. Dropping a Database

drop database if exists db_hive2;
An empty database can be dropped directly.
If the database is not empty, add the cascade keyword to force the drop:
drop database db_hive cascade;
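
The difference between the two forms can be sketched as follows, using the database name db_hive from above and a hypothetical table t1 that exists only to make the database non-empty.

```sql
CREATE DATABASE IF NOT EXISTS db_hive;
CREATE TABLE db_hive.t1 (id int);          -- t1 is a made-up table for illustration

-- DROP DATABASE db_hive;                  -- would fail here: the database still contains t1
DROP DATABASE IF EXISTS db_hive CASCADE;   -- drops t1 together with the database
```

Without CASCADE, Hive applies the default RESTRICT behavior, which refuses to drop a database that still contains tables.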

4. Bucket Sampling Queries

select * from stu_buck tablesample(bucket 1 out of 4 on id);
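## Note: tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y). y must be a multiple or a factor of the table's total number of buckets. Hive determines the sampling ratio from y. For example, if the table has 4 buckets, then y=2 samples (4/2=) 2 buckets of data, and y=8 samples (4/8=) 1/2 of a bucket.
## x is the bucket to start sampling from. When more than one bucket is sampled, each subsequent bucket number is the current bucket number plus y. For example, if the table has 4 buckets, tablesample(bucket 1 out of 2) samples (4/2=) 2 buckets of data: bucket 1 (x) and bucket 3 (x+y).
## Caution: x must be less than or equal to y; otherwise Hive rejects the query with an error.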
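
The rules above can be sketched with a few queries. The table definition is an assumption: the notes only show that stu_buck has an id column, so the 4-bucket layout, the name column, and the tab delimiter are illustrative.

```sql
-- Assumed definition of the sampled table: 4 buckets, hashed on id.
CREATE TABLE IF NOT EXISTS stu_buck (id int, name string)
CLUSTERED BY (id) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- y = 4: 4/4 = 1 bucket, namely bucket 1.
SELECT * FROM stu_buck TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);

-- y = 2: 4/2 = 2 buckets, namely buckets 1 and 3 (x and x + y).
SELECT * FROM stu_buck TABLESAMPLE(BUCKET 1 OUT OF 2 ON id);

-- y = 8: 4/8 = half of a bucket.
SELECT * FROM stu_buck TABLESAMPLE(BUCKET 1 OUT OF 8 ON id);

-- x > y is invalid and is rejected:
-- SELECT * FROM stu_buck TABLESAMPLE(BUCKET 5 OUT OF 4 ON id);
```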
