After all the groundwork in the previous parts, we can finally get to the main topic: using data warehouse layering theory to build a data warehouse in Hive.
ODS Layer
Start the Hive client and create the gmall database:
[root@cdh2 ~]# hive
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> create database gmall;
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> use gmall;
Create the startup log table
In the ODS layer, each raw log line is kept as a single string column, partitioned by date:
drop table if exists ods_start_log;
CREATE EXTERNAL TABLE ods_start_log (`line` string)
PARTITIONED BY (`dt` string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
Note that because the raw logs are LZO-compressed, the table must use the LZO input format 'com.hadoop.mapred.DeprecatedLzoTextInputFormat', which requires Hive's LZO support to be configured on the cluster.
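To confirm the input format was applied, you can inspect the table metadata; a minimal check, assuming the DDL above succeeded:
-- The InputFormat field in the output should read com.hadoop.mapred.DeprecatedLzoTextInputFormat
DESCRIBE FORMATTED ods_start_log;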
Load data into the ODS layer
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> load data inpath '/origin_data/gmall/log/topic_start/2020-11-24' into table gmall.ods_start_log partition(dt='2020-11-24');
INFO : Compiling command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a): load data inpath '/origin_data/gmall/log/topic_start/2020-11-24' into table gmall.ods_start_log partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a); Time taken: 0.171 seconds
INFO : Executing command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a): load data inpath '/origin_data/gmall/log/topic_start/2020-11-24' into table gmall.ods_start_log partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_start_log partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/log/topic_start/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a); Time taken: 1.229 seconds
INFO : OK
No rows affected (1.443 seconds)
Querying in Hue, we can see that the ods_start_log table now contains data.
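The same check can be run from Beeline; a minimal sanity query, assuming the 2020-11-24 partition was loaded as above:
-- Confirm the partition was registered and count the loaded rows.
show partitions gmall.ods_start_log;
select count(*) from gmall.ods_start_log where dt='2020-11-24';
-- The count should match the 502 records the DWD job reads later.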
DWD Layer
Create the startup table
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> drop table if exists dwd_start_log;
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> CREATE EXTERNAL TABLE dwd_start_log(
. . . . . . . . . . . . . . . . . . . . . . .> `mid_id` string,
. . . . . . . . . . . . . . . . . . . . . . .> `user_id` string,
. . . . . . . . . . . . . . . . . . . . . . .> `version_code` string,
. . . . . . . . . . . . . . . . . . . . . . .> `version_name` string,
. . . . . . . . . . . . . . . . . . . . . . .> `lang` string,
. . . . . . . . . . . . . . . . . . . . . . .> `source` string,
. . . . . . . . . . . . . . . . . . . . . . .> `os` string,
. . . . . . . . . . . . . . . . . . . . . . .> `area` string,
. . . . . . . . . . . . . . . . . . . . . . .> `model` string,
. . . . . . . . . . . . . . . . . . . . . . .> `brand` string,
. . . . . . . . . . . . . . . . . . . . . . .> `sdk_version` string,
. . . . . . . . . . . . . . . . . . . . . . .> `gmail` string,
. . . . . . . . . . . . . . . . . . . . . . .> `height_width` string,
. . . . . . . . . . . . . . . . . . . . . . .> `app_time` string,
. . . . . . . . . . . . . . . . . . . . . . .> `network` string,
. . . . . . . . . . . . . . . . . . . . . . .> `lng` string,
. . . . . . . . . . . . . . . . . . . . . . .> `lat` string,
. . . . . . . . . . . . . . . . . . . . . . .> `entry` string,
. . . . . . . . . . . . . . . . . . . . . . .> `open_ad_type` string,
. . . . . . . . . . . . . . . . . . . . . . .> `action` string,
. . . . . . . . . . . . . . . . . . . . . . .> `loading_time` string,
. . . . . . . . . . . . . . . . . . . . . . .> `detail` string,
. . . . . . . . . . . . . . . . . . . . . . .> `extend1` string
. . . . . . . . . . . . . . . . . . . . . . .> )
. . . . . . . . . . . . . . . . . . . . . . .> PARTITIONED BY (dt string);
DWD-layer startup table loading script
Create a new script, dwd_start_log.sh:
#!/bin/bash
# Define variables for easy modification
APP=gmall
# If a date argument is supplied, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table "$APP".dwd_start_log
PARTITION (dt='$do_date')
select
get_json_object(line,'$.mid') mid_id,
get_json_object(line,'$.uid') user_id,
get_json_object(line,'$.vc') version_code,
get_json_object(line,'$.vn') version_name,
get_json_object(line,'$.l') lang,
get_json_object(line,'$.sr') source,
get_json_object(line,'$.os') os,
get_json_object(line,'$.ar') area,
get_json_object(line,'$.md') model,
get_json_object(line,'$.ba') brand,
get_json_object(line,'$.sv') sdk_version,
get_json_object(line,'$.g') gmail,
get_json_object(line,'$.hw') height_width,
get_json_object(line,'$.t') app_time,
get_json_object(line,'$.nw') network,
get_json_object(line,'$.ln') lng,
get_json_object(line,'$.la') lat,
get_json_object(line,'$.entry') entry,
get_json_object(line,'$.open_ad_type') open_ad_type,
get_json_object(line,'$.action') action,
get_json_object(line,'$.loading_time') loading_time,
get_json_object(line,'$.detail') detail,
get_json_object(line,'$.extend1') extend1
from "$APP".ods_start_log
where dt='$do_date';
"
hive -e "$sql"
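The core of this script is get_json_object, which pulls one field out of the raw JSON line per JSON path. A minimal illustration with a made-up log line (the JSON values here are hypothetical):
-- Extract two fields from a hypothetical startup-log line.
select
  get_json_object('{"mid":"mid_1","uid":"42","vc":"1.2.0"}', '$.mid') as mid_id,
  get_json_object('{"mid":"mid_1","uid":"42","vc":"1.2.0"}', '$.uid') as user_id;
-- Returns: mid_1    42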
Grant execute permission on the script:
chmod 777 dwd_start_log.sh
Execute the script:
[root@cdh2]# dwd_start_log.sh 2020-11-24
INFO : Compiling command(queryId=hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b): insert overwrite table gmall.dwd_start_log
PARTITION (dt='2020-11-24')
select
get_json_object(line,'$.mid') mid_id,
get_json_object(line,'$.uid') user_id,
get_json_object(line,'$.vc') version_code,
get_json_object(line,'$.vn') version_name,
get_json_object(line,'$.l') lang,
get_json_object(line,'$.sr') source,
get_json_object(line,'$.os') os,
get_json_object(line,'$.ar') area,
get_json_object(line,'$.md') model,
get_json_object(line,'$.ba') brand,
get_json_object(line,'$.sv') sdk_version,
get_json_object(line,'$.g') gmail,
get_json_object(line,'$.hw') height_width,
get_json_object(line,'$.t') app_time,
get_json_object(line,'$.nw') network,
get_json_object(line,'$.ln') lng,
get_json_object(line,'$.la') lat,
get_json_object(line,'$.entry') entry,
get_json_object(line,'$.open_ad_type') open_ad_type,
get_json_object(line,'$.action') action,
get_json_object(line,'$.loading_time') loading_time,
get_json_object(line,'$.detail') detail,
get_json_object(line,'$.extend1') extend1
from gmall.ods_start_log
where dt='2020-11-24'
INFO : No Stats for gmall@ods_start_log, Columns: line
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:mid_id, type:string, comment:null), FieldSchema(name:user_id, type:string, comment:null), FieldSchema(name:version_code, type:string, comment:null), FieldSchema(name:version_name, type:string, comment:null), FieldSchema(name:lang, type:string, comment:null), FieldSchema(name:source, type:string, comment:null), FieldSchema(name:os, type:string, comment:null), FieldSchema(name:area, type:string, comment:null), FieldSchema(name:model, type:string, comment:null), FieldSchema(name:brand, type:string, comment:null), FieldSchema(name:sdk_version, type:string, comment:null), FieldSchema(name:gmail, type:string, comment:null), FieldSchema(name:height_width, type:string, comment:null), FieldSchema(name:app_time, type:string, comment:null), FieldSchema(name:network, type:string, comment:null), FieldSchema(name:lng, type:string, comment:null), FieldSchema(name:lat, type:string, comment:null), FieldSchema(name:entry, type:string, comment:null), FieldSchema(name:open_ad_type, type:string, comment:null), FieldSchema(name:action, type:string, comment:null), FieldSchema(name:loading_time, type:string, comment:null), FieldSchema(name:detail, type:string, comment:null), FieldSchema(name:extend1, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b); Time taken: 1.615 seconds
INFO : Executing command(queryId=hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b): insert overwrite table gmall.dwd_start_log
PARTITION (dt='2020-11-24')
select
get_json_object(line,'$.mid') mid_id,
get_json_object(line,'$.uid') user_id,
get_json_object(line,'$.vc') version_code,
get_json_object(line,'$.vn') version_name,
get_json_object(line,'$.l') lang,
get_json_object(line,'$.sr') source,
get_json_object(line,'$.os') os,
get_json_object(line,'$.ar') area,
get_json_object(line,'$.md') model,
get_json_object(line,'$.ba') brand,
get_json_object(line,'$.sv') sdk_version,
get_json_object(line,'$.g') gmail,
get_json_object(line,'$.hw') height_width,
get_json_object(line,'$.t') app_time,
get_json_object(line,'$.nw') network,
get_json_object(line,'$.ln') lng,
get_json_object(line,'$.la') lat,
get_json_object(line,'$.entry') entry,
get_json_object(line,'$.open_ad_type') open_ad_type,
get_json_object(line,'$.action') action,
get_json_object(line,'$.loading_time') loading_time,
get_json_object(line,'$.detail') detail,
get_json_object(line,'$.extend1') extend1
from gmall.ods_start_log
where dt='2020-11-24'
INFO : Query ID = hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b
INFO : Session is already open
INFO : Dag name: insert overwrite table gma...dt='2020-11-24' (Stage-1)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO : Session re-established.
INFO : Status: Running (Executing on YARN cluster with App id application_1606125802436_0004)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 10.58 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 9.26 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 1.61s
INFO : Prepare Plan 0.16s
INFO : Get Query Coordinator (AM) 0.03s
INFO : Submit Plan 4.28s
INFO : Start DAG 1.36s
INFO : Run DAG 9.26s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 5540.00 11,310 128 502 1
INFO : Reducer 2 177.00 1,490 12 1 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 3
INFO : TOTAL_LAUNCHED_TASKS: 3
INFO : DATA_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 3270
INFO : AM_GC_TIME_MILLIS: 19
INFO : File System Counters:
INFO : FILE_BYTES_READ: 3722
INFO : FILE_BYTES_WRITTEN: 3610
INFO : HDFS_BYTES_READ: 48652
INFO : HDFS_BYTES_WRITTEN: 73761
INFO : HDFS_READ_OPS: 8
INFO : HDFS_WRITE_OPS: 5
INFO : HDFS_OP_CREATE: 3
INFO : HDFS_OP_GET_FILE_STATUS: 7
INFO : HDFS_OP_OPEN: 1
INFO : HDFS_OP_RENAME: 2
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : REDUCE_INPUT_GROUPS: 1
INFO : REDUCE_INPUT_RECORDS: 1
INFO : COMBINE_INPUT_RECORDS: 0
INFO : SPILLED_RECORDS: 2
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : NUM_SKIPPED_INPUTS: 1
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : MERGED_MAP_OUTPUTS: 1
INFO : GC_TIME_MILLIS: 140
INFO : TASK_DURATION_MILLIS: 5902
INFO : CPU_MILLISECONDS: 12800
INFO : PHYSICAL_MEMORY_BYTES: 1645215744
INFO : VIRTUAL_MEMORY_BYTES: 16442019840
INFO : COMMITTED_HEAP_BYTES: 1645215744
INFO : INPUT_RECORDS_PROCESSED: 502
INFO : INPUT_SPLIT_LENGTH_BYTES: 48652
INFO : OUTPUT_RECORDS: 1
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 9238
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 9248
INFO : OUTPUT_BYTES_PHYSICAL: 3554
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 3554
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_CHUNK_COUNT: 1
INFO : SHUFFLE_BYTES: 3554
INFO : SHUFFLE_BYTES_DECOMPRESSED: 9248
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 3554
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : SHUFFLE_PHASE_TIME: 72
INFO : MERGE_PHASE_TIME: 89
INFO : FIRST_EVENT_RECEIVED: 49
INFO : LAST_EVENT_RECEIVED: 49
INFO : HIVE:
INFO : CREATED_FILES: 2
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 502
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_gmall.dwd_start_log: 502
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 1
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_10: 1
INFO : RECORDS_OUT_OPERATOR_FS_3: 502
INFO : RECORDS_OUT_OPERATOR_GBY_6: 1
INFO : RECORDS_OUT_OPERATOR_GBY_8: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_7: 1
INFO : RECORDS_OUT_OPERATOR_SEL_2: 502
INFO : RECORDS_OUT_OPERATOR_SEL_5: 502
INFO : RECORDS_OUT_OPERATOR_SEL_9: 1
INFO : RECORDS_OUT_OPERATOR_TS_0: 502
INFO : TOTAL_TABLE_ROWS_WRITTEN: 502
INFO : Shuffle Errors:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : TaskCounter_Map_1_INPUT_ods_start_log:
INFO : INPUT_RECORDS_PROCESSED: 502
INFO : INPUT_SPLIT_LENGTH_BYTES: 48652
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 9238
INFO : OUTPUT_BYTES_PHYSICAL: 3554
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 9248
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1
INFO : SHUFFLE_CHUNK_COUNT: 1
INFO : SPILLED_RECORDS: 1
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 3554
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 49
INFO : LAST_EVENT_RECEIVED: 49
INFO : MERGED_MAP_OUTPUTS: 1
INFO : MERGE_PHASE_TIME: 89
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : NUM_SKIPPED_INPUTS: 1
INFO : REDUCE_INPUT_GROUPS: 1
INFO : REDUCE_INPUT_RECORDS: 1
INFO : SHUFFLE_BYTES: 3554
INFO : SHUFFLE_BYTES_DECOMPRESSED: 9248
INFO : SHUFFLE_BYTES_DISK_DIRECT: 3554
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_PHASE_TIME: 72
INFO : SPILLED_RECORDS: 1
INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO : OUTPUT_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
INFO : INPUT_FILES_Map_1: 1
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 10.61 s
----------------------------------------------------------------------------------------------
502 rows affected (18.059 seconds)
Checking dwd_start_log in Hue, we can see the data has been generated.
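A quick spot check from Beeline; a minimal query, assuming the partition written above:
-- Inspect a few parsed records in the new DWD partition.
select mid_id, user_id, version_code, app_time
from gmall.dwd_start_log
where dt='2020-11-24'
limit 5;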
DWS Layer (Daily Active Users)
Daily active device analysis
Goal: produce a detail record for every device active in the current day, week, and month
Create the dws_uv_detail_day table
drop table if exists dws_uv_detail_day;
create external table dws_uv_detail_day
(
`mid_id` string COMMENT 'unique device id',
`user_id` string COMMENT 'user id',
`version_code` string COMMENT 'app version code',
`version_name` string COMMENT 'app version name',
`lang` string COMMENT 'system language',
`source` string COMMENT 'channel id',
`os` string COMMENT 'Android OS version',
`area` string COMMENT 'region',
`model` string COMMENT 'phone model',
`brand` string COMMENT 'phone brand',
`sdk_version` string COMMENT 'sdkVersion',
`gmail` string COMMENT 'gmail',
`height_width` string COMMENT 'screen height and width',
`app_time` string COMMENT 'time the client log was produced',
`network` string COMMENT 'network mode',
`lng` string COMMENT 'longitude',
`lat` string COMMENT 'latitude'
)
partitioned by(dt string)
stored as parquet;
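The load script below reduces the DWD data to one row per mid_id by aggregating every other column with collect_set and joining the distinct values with '|'. A minimal sketch of that idiom on inline data (the device and OS values are hypothetical):
-- collect_set deduplicates within a group; concat_ws flattens the set to one string.
-- Note: the order of values inside collect_set is not guaranteed.
select mid_id, concat_ws('|', collect_set(os)) as os
from (
  select 'mid_1' as mid_id, 'Android 10' as os
  union all
  select 'mid_1', 'Android 10'
  union all
  select 'mid_1', 'Android 11'
) t
group by mid_id;
-- Returns one row, e.g.: mid_1    Android 10|Android 11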
Create a new script, dws_log.sh:
[root@cdh2 25k]# cat dws_log.sh
#!/bin/bash
# Define variables for easy modification
APP=gmall
# If a date argument is supplied, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table "$APP".dws_uv_detail_day partition(dt='$do_date')
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat
from "$APP".dwd_start_log
where dt='$do_date'
group by mid_id;
"
hive -e "$sql"
Grant execute permission:
chmod 777 dws_log.sh
Execute the script:
[root@cdh2 25k]# dws_log.sh 2020-11-24
20/11/24 21:43:12 [main]: INFO jdbc.HiveConnection: Connected to cdh2.macro.com:10000
Connected to: Apache Hive (version 3.1.3000.7.1.3.0-100)
Driver: Hive JDBC (version 3.1.3000.7.1.3.0-100)
Transaction isolation: TRANSACTION_REPEATABLE_READ
No rows affected (0.068 seconds)
INFO : Compiling command(queryId=hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435): insert overwrite table gmall.dws_uv_detail_day partition(dt='2020-11-24')
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat
from gmall.dwd_start_log
where dt='2020-11-24'
group by mid_id
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:mid_id, type:string, comment:null), FieldSchema(name:user_id, type:string, comment:null), FieldSchema(name:version_code, type:string, comment:null), FieldSchema(name:version_name, type:string, comment:null), FieldSchema(name:lang, type:string, comment:null), FieldSchema(name:source, type:string, comment:null), FieldSchema(name:os, type:string, comment:null), FieldSchema(name:area, type:string, comment:null), FieldSchema(name:model, type:string, comment:null), FieldSchema(name:brand, type:string, comment:null), FieldSchema(name:sdk_version, type:string, comment:null), FieldSchema(name:gmail, type:string, comment:null), FieldSchema(name:height_width, type:string, comment:null), FieldSchema(name:app_time, type:string, comment:null), FieldSchema(name:network, type:string, comment:null), FieldSchema(name:lng, type:string, comment:null), FieldSchema(name:lat, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435); Time taken: 0.669 seconds
INFO : Executing command(queryId=hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435): insert overwrite table gmall.dws_uv_detail_day partition(dt='2020-11-24')
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang)) lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat
from gmall.dwd_start_log
where dt='2020-11-24'
group by mid_id
INFO : Query ID = hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435
INFO : Session is already open
INFO : Dag name: insert overwrite table gmall.dws_uv...mid_id (Stage-1)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO : Session re-established.
INFO : Status: Running (Executing on YARN cluster with App id application_1606125802436_0005)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 3 ...... container SUCCEEDED 2 2 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 12.60 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 12.05 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.67s
INFO : Prepare Plan 0.12s
INFO : Get Query Coordinator (AM) 0.01s
INFO : Submit Plan 4.36s
INFO : Start DAG 1.28s
INFO : Run DAG 12.05s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 3531.00 7,710 86 502 502
INFO : Reducer 2 6983.00 13,430 128 502 2
INFO : Reducer 3 4951.00 1,270 0 2 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 5
INFO : TOTAL_LAUNCHED_TASKS: 5
INFO : DATA_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 3690
INFO : AM_GC_TIME_MILLIS: 42
INFO : File System Counters:
INFO : FILE_BYTES_READ: 51126
INFO : FILE_BYTES_WRITTEN: 69324
INFO : HDFS_BYTES_READ: 61338
INFO : HDFS_BYTES_WRITTEN: 60185
INFO : HDFS_READ_OPS: 12
INFO : HDFS_WRITE_OPS: 8
INFO : HDFS_OP_CREATE: 5
INFO : HDFS_OP_GET_FILE_STATUS: 11
INFO : HDFS_OP_OPEN: 1
INFO : HDFS_OP_RENAME: 3
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : REDUCE_INPUT_GROUPS: 503
INFO : REDUCE_INPUT_RECORDS: 504
INFO : COMBINE_INPUT_RECORDS: 0
INFO : SPILLED_RECORDS: 1008
INFO : NUM_SHUFFLED_INPUTS: 4
INFO : NUM_SKIPPED_INPUTS: 2
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : MERGED_MAP_OUTPUTS: 4
INFO : GC_TIME_MILLIS: 214
INFO : TASK_DURATION_MILLIS: 15451
INFO : CPU_MILLISECONDS: 22410
INFO : PHYSICAL_MEMORY_BYTES: 2729443328
INFO : VIRTUAL_MEMORY_BYTES: 27425902592
INFO : COMMITTED_HEAP_BYTES: 2729443328
INFO : INPUT_RECORDS_PROCESSED: 502
INFO : INPUT_SPLIT_LENGTH_BYTES: 61338
INFO : OUTPUT_RECORDS: 504
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 120214
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 121752
INFO : OUTPUT_BYTES_PHYSICAL: 46586
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 22570
INFO : ADDITIONAL_SPILLS_BYTES_READ: 46586
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_CHUNK_COUNT: 3
INFO : SHUFFLE_BYTES: 46586
INFO : SHUFFLE_BYTES_DECOMPRESSED: 121752
INFO : SHUFFLE_BYTES_TO_MEM: 22570
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 24016
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : SHUFFLE_PHASE_TIME: 5484
INFO : MERGE_PHASE_TIME: 5613
INFO : FIRST_EVENT_RECEIVED: 127
INFO : LAST_EVENT_RECEIVED: 5243
INFO : HIVE:
INFO : CREATED_FILES: 3
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 502
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_gmall.dws_uv_detail_day: 502
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 502
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 2
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_3: 0
INFO : RECORDS_OUT_OPERATOR_FS_14: 1
INFO : RECORDS_OUT_OPERATOR_FS_7: 502
INFO : RECORDS_OUT_OPERATOR_GBY_10: 2
INFO : RECORDS_OUT_OPERATOR_GBY_12: 1
INFO : RECORDS_OUT_OPERATOR_GBY_3: 502
INFO : RECORDS_OUT_OPERATOR_GBY_5: 502
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_11: 2
INFO : RECORDS_OUT_OPERATOR_RS_4: 502
INFO : RECORDS_OUT_OPERATOR_SEL_13: 1
INFO : RECORDS_OUT_OPERATOR_SEL_2: 502
INFO : RECORDS_OUT_OPERATOR_SEL_6: 502
INFO : RECORDS_OUT_OPERATOR_SEL_9: 502
INFO : RECORDS_OUT_OPERATOR_TS_0: 502
INFO : TOTAL_TABLE_ROWS_WRITTEN: 502
INFO : Shuffle Errors:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_3_INPUT_Reducer_2:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : TaskCounter_Map_1_INPUT_dwd_start_log:
INFO : INPUT_RECORDS_PROCESSED: 502
INFO : INPUT_SPLIT_LENGTH_BYTES: 61338
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 105540
INFO : OUTPUT_BYTES_PHYSICAL: 40988
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 107058
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 502
INFO : SHUFFLE_CHUNK_COUNT: 1
INFO : SPILLED_RECORDS: 502
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 40988
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 19779
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 102
INFO : LAST_EVENT_RECEIVED: 102
INFO : MERGED_MAP_OUTPUTS: 2
INFO : MERGE_PHASE_TIME: 393
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_SHUFFLED_INPUTS: 2
INFO : NUM_SKIPPED_INPUTS: 0
INFO : REDUCE_INPUT_GROUPS: 502
INFO : REDUCE_INPUT_RECORDS: 502
INFO : SHUFFLE_BYTES: 40988
INFO : SHUFFLE_BYTES_DECOMPRESSED: 107058
INFO : SHUFFLE_BYTES_DISK_DIRECT: 21209
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 19779
INFO : SHUFFLE_PHASE_TIME: 272
INFO : SPILLED_RECORDS: 502
INFO : TaskCounter_Reducer_2_OUTPUT_Reducer_3:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 14674
INFO : OUTPUT_BYTES_PHYSICAL: 5598
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 14694
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 2
INFO : SHUFFLE_CHUNK_COUNT: 2
INFO : SPILLED_RECORDS: 2
INFO : TaskCounter_Reducer_3_INPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 5598
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 2791
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 25
INFO : LAST_EVENT_RECEIVED: 5141
INFO : MERGED_MAP_OUTPUTS: 2
INFO : MERGE_PHASE_TIME: 5220
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_SHUFFLED_INPUTS: 2
INFO : NUM_SKIPPED_INPUTS: 2
INFO : REDUCE_INPUT_GROUPS: 1
INFO : REDUCE_INPUT_RECORDS: 2
INFO : SHUFFLE_BYTES: 5598
INFO : SHUFFLE_BYTES_DECOMPRESSED: 14694
INFO : SHUFFLE_BYTES_DISK_DIRECT: 2807
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 2791
INFO : SHUFFLE_PHASE_TIME: 5212
INFO : SPILLED_RECORDS: 2
INFO : TaskCounter_Reducer_3_OUTPUT_out_Reducer_3:
INFO : OUTPUT_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 3 ...... container SUCCEEDED 2 2 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 12.62 s
----------------------------------------------------------------------------------------------
502 rows affected (19.065 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
In Hue, we can see that the dws_uv_detail_day table now has data.
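Since the table now holds one row per active device, counting the partition gives the day's active-device number; a minimal check, assuming the run above:
select count(*) from gmall.dws_uv_detail_day where dt='2020-11-24';
-- Should return 502, matching RECORDS_OUT_1_gmall.dws_uv_detail_day in the log.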
ADS Layer (Requirement: Daily Active Users)
Goal: the number of devices active on the given day
Active device count
Create the table
drop table if exists ads_uv_count;
create external table ads_uv_count(
`dt` string COMMENT 'statistics date',
`day_count` bigint COMMENT 'number of active users for the day'
) COMMENT 'active device count'
row format delimited fields terminated by '\t';
ADS-layer data loading script
Create a new script, ads_uv_log.sh:
#!/bin/bash
# Define variables for easy modification
APP=gmall
# If a date argument is supplied, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table "$APP".ads_uv_count
select
'$do_date' dt,
daycount.ct
from
(
select
'$do_date' dt,
count(*) ct
from "$APP".dws_uv_detail_day
where dt='$do_date'
)daycount;
"
hive -e "$sql"
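Note that this script uses insert into, so running it twice for the same date appends a second row for that date. If re-runs are possible, one hypothetical way to read the table safely is to collapse duplicates per date:
-- Hypothetical guard against re-runs: keep one count per statistics date.
select dt, max(day_count) as day_count
from gmall.ads_uv_count
group by dt;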
Grant execute permission on the script:
chmod 777 ads_uv_log.sh
Execute the script:
[root@cdh2 25k]# ads_uv_log.sh 2020-11-24
No rows affected (0.07 seconds)
INFO : Compiling command(queryId=hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b): insert into table gmall.ads_uv_count
select
'2020-11-24' dt,
daycount.ct
from
(
select
'2020-11-24' dt,
count(*) ct
from gmall.dws_uv_detail_day
where dt='2020-11-24'
)daycount
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:daycount.ct, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b); Time taken: 0.178 seconds
INFO : Executing command(queryId=hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b): insert into table gmall.ads_uv_count
select
'2020-11-24' dt,
daycount.ct
from
(
select
'2020-11-24' dt,
count(*) ct
from gmall.dws_uv_detail_day
where dt='2020-11-24'
)daycount
INFO : Query ID = hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b
INFO : Session is already open
INFO : Dag name: insert into table gmall.ads_uv_...
)daycount (Stage-1)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO : Session re-established.
INFO : Status: Running (Executing on YARN cluster with App id application_1606125802436_0007)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.82 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 5.32 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.18s
INFO : Prepare Plan 0.09s
INFO : Get Query Coordinator (AM) 0.01s
INFO : Submit Plan 4.27s
INFO : Start DAG 1.40s
INFO : Run DAG 5.32s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 3025.00 6,120 85 502 1
INFO : Reducer 2 261.00 1,050 8 1 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 2
INFO : TOTAL_LAUNCHED_TASKS: 2
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 3350
INFO : AM_GC_TIME_MILLIS: 26
INFO : File System Counters:
INFO : FILE_BYTES_READ: 59
INFO : FILE_BYTES_WRITTEN: 59
INFO : HDFS_BYTES_READ: 52659
INFO : HDFS_BYTES_WRITTEN: 258
INFO : HDFS_READ_OPS: 19
INFO : HDFS_WRITE_OPS: 5
INFO : HDFS_OP_CREATE: 3
INFO : HDFS_OP_GET_FILE_STATUS: 13
INFO : HDFS_OP_OPEN: 6
INFO : HDFS_OP_RENAME: 2
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 93
INFO : TASK_DURATION_MILLIS: 3255
INFO : CPU_MILLISECONDS: 7170
INFO : PHYSICAL_MEMORY_BYTES: 524288000
INFO : VIRTUAL_MEMORY_BYTES: 10938982400
INFO : COMMITTED_HEAP_BYTES: 524288000
INFO : INPUT_RECORDS_PROCESSED: 3
INFO : INPUT_SPLIT_LENGTH_BYTES: 49896
INFO : OUTPUT_RECORDS: 1
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 5
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 13
INFO : OUTPUT_BYTES_PHYSICAL: 51
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 27
INFO : SHUFFLE_BYTES_DECOMPRESSED: 13
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 27
INFO : SHUFFLE_PHASE_TIME: 50
INFO : FIRST_EVENT_RECEIVED: 33
INFO : LAST_EVENT_RECEIVED: 33
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_FILES: 2
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 502
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_gmall.ads_uv_count: 1
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 1
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_14: 1
INFO : RECORDS_OUT_OPERATOR_FS_7: 1
INFO : RECORDS_OUT_OPERATOR_GBY_12: 1
INFO : RECORDS_OUT_OPERATOR_GBY_17: 1
INFO : RECORDS_OUT_OPERATOR_GBY_5: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_18: 1
INFO : RECORDS_OUT_OPERATOR_SEL_13: 1
INFO : RECORDS_OUT_OPERATOR_SEL_16: 502
INFO : RECORDS_OUT_OPERATOR_SEL_6: 1
INFO : RECORDS_OUT_OPERATOR_SEL_9: 1
INFO : RECORDS_OUT_OPERATOR_TS_0: 502
INFO : TOTAL_TABLE_ROWS_WRITTEN: 1
INFO : TaskCounter_Map_1_INPUT_dws_uv_detail_day:
INFO : INPUT_RECORDS_PROCESSED: 2
INFO : INPUT_SPLIT_LENGTH_BYTES: 49896
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 5
INFO : OUTPUT_BYTES_PHYSICAL: 51
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 13
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : FIRST_EVENT_RECEIVED: 33
INFO : INPUT_RECORDS_PROCESSED: 1
INFO : LAST_EVENT_RECEIVED: 33
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 27
INFO : SHUFFLE_BYTES_DECOMPRESSED: 13
INFO : SHUFFLE_BYTES_DISK_DIRECT: 27
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_PHASE_TIME: 50
INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO : OUTPUT_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
INFO : INPUT_FILES_Map_1: 2
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.83 s
----------------------------------------------------------------------------------------------
1 row affected (11.783 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
Checking Hue, we can see the data in the ads_uv_count table.
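A final check from Beeline; the table should now hold one row for the day:
select * from gmall.ads_uv_count where dt='2020-11-24';
-- Expected: 2020-11-24 with a day_count of 502, the number of rows in dws_uv_detail_day.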