Business Data Warehouse
91712
  Map-Reduce Framework
    Map input records=125
    Map output records=125
    Input split bytes=85
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=71
    CPU time spent (ms)=1700
    Physical memory (bytes) snapshot=259682304
    Virtual memory (bytes) snapshot=2850103296
    Total committed heap usage (bytes)=235929600
    Peak Map Physical memory (bytes)=259682304
    Peak Map Virtual memory (bytes)=2850103296
  File Input Format Counters
    Bytes Read=0
  File Output Format Counters
    Bytes Written=2181
20/11/25 11:07:51 INFO mapreduce.ImportJobBase: Transferred 2.1299 KB in 29.0742 seconds (75.0149 bytes/sec)
20/11/25 11:07:51 INFO mapreduce.ImportJobBase: Retrieved 125 records.
Warning: /opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/25 11:07:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.7.1.3.0-100
20/11/25 11:07:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/11/25 11:07:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/25 11:07:56 INFO tool.CodeGenTool: Beginning code generation
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
warning: No SupportedSourceVersion annotation found on org.apache.hadoop.hdds.conf.ConfigFileGenerator, returning RELEASE_6.
warning: Supported source version 'RELEASE_6' from annotation processor 'org.apache.hadoop.hdds.conf.ConfigFileGenerator' less than -source '1.8'
2 warnings
20/11/25 11:07:59 WARN orm.CompilationManager: Could not rename /tmp/sqoop-root/compile/7b64a4d6eff156e0c71805f454daf5af/QueryResult.java to /data0/25k/./QueryResult.java. Error: Destination '/data0/25k/./QueryResult.java' already exists
20/11/25 11:07:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/7b64a4d6eff156e0c71805f454daf5af/QueryResult.jar
20/11/25 11:08:00 INFO tool.ImportTool: Destination directory /origin_data/gmall/db/base_category3/2020-11-24 is not present, hence not deleting.
20/11/25 11:08:00 INFO mapreduce.ImportJobBase: Beginning query import.
20/11/25 11:08:00 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/25 11:08:00 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/25 11:08:01 INFO client.RMProxy: Connecting to ResourceManager at cdh2.macro.com/192.168.0.207:8032
20/11/25 11:08:02 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hive/.staging/job_1606125802436_0010
20/11/25 11:08:05 INFO db.DBInputFormat: Using read commited transaction isolation
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: number of splits:1
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606125802436_0010
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: Executing with tokens: []
20/11/25 11:08:06 INFO conf.Configuration: resource-types.xml not found
20/11/25 11:08:06 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/11/25 11:08:06 INFO impl.YarnClientImpl: Submitted application application_1606125802436_0010
20/11/25 11:08:06 INFO mapreduce.Job: The url to track the job: http://cdh2.macro.com:8088/proxy/application_1606125802436_0010/
20/11/25 11:08:06 INFO mapreduce.Job: Running job: job_1606125802436_0010
20/11/25 11:08:15 INFO mapreduce.Job: Job job_1606125802436_0010 running in uber mode : false
20/11/25 11:08:15 INFO mapreduce.Job: map 0% reduce 0%
20/11/25 11:08:25 INFO mapreduce.Job: map 100% reduce 0%
20/11/25 11:08:25 INFO mapreduce.Job: Job job_1606125802436_0010 completed successfully
ODS Layer
Create the corresponding ODS-layer tables by mirroring the table fields of the business database exactly.
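The DDLs below assume a Hive database named gmall already exists (the load script later refers to it through the APP variable). A minimal sketch, in case it has not been created yet:
create database if not exists gmall;
use gmall;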
The table creation statements are as follows:
drop table if exists ods_order_info;
create external table ods_order_info (
`id` string COMMENT '订单编号',
`total_amount` decimal(10,2) COMMENT '订单金额',
`order_status` string COMMENT '订单状态',
`user_id` string COMMENT '用户id' ,
`payment_way` string COMMENT '支付方式',
`out_trade_no` string COMMENT '支付流水号',
`create_time` string COMMENT '创建时间',
`operate_time` string COMMENT '操作时间'
) COMMENT '订单表'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_order_detail;
create external table ods_order_detail(
`id` string COMMENT '订单编号',
`order_id` string COMMENT '订单号',
`user_id` string COMMENT '用户id' ,
`sku_id` string COMMENT '商品id',
`sku_name` string COMMENT '商品名称',
`order_price` string COMMENT '商品价格',
`sku_num` string COMMENT '商品数量',
`create_time` string COMMENT '创建时间'
) COMMENT '订单明细表'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_sku_info;
create external table ods_sku_info(
`id` string COMMENT 'skuId',
`spu_id` string COMMENT 'spuid',
`price` decimal(10,2) COMMENT '价格' ,
`sku_name` string COMMENT '商品名称',
`sku_desc` string COMMENT '商品描述',
`weight` string COMMENT '重量',
`tm_id` string COMMENT '品牌id',
`category3_id` string COMMENT '品类id',
`create_time` string COMMENT '创建时间'
) COMMENT '商品表'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_user_info;
create external table ods_user_info(
`id` string COMMENT '用户id',
`name` string COMMENT '姓名',
`birthday` string COMMENT '生日' ,
`gender` string COMMENT '性别',
`email` string COMMENT '邮箱',
`user_level` string COMMENT '用户等级',
`create_time` string COMMENT '创建时间'
) COMMENT '用户信息'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_base_category1;
create external table ods_base_category1(
`id` string COMMENT 'id',
`name` string COMMENT '名称'
) COMMENT '商品一级分类'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_base_category2;
create external table ods_base_category2(
`id` string COMMENT ' id',
`name` string COMMENT '名称',
category1_id string COMMENT '一级品类id'
) COMMENT '商品二级分类'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_base_category3;
create external table ods_base_category3(
`id` string COMMENT ' id',
`name` string COMMENT '名称',
category2_id string COMMENT '二级品类id'
) COMMENT '商品三级分类'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
drop table if exists `ods_payment_info`;
create external table `ods_payment_info`(
`id` bigint COMMENT '编号',
`out_trade_no` string COMMENT '对外业务编号',
`order_id` string COMMENT '订单编号',
`user_id` string COMMENT '用户编号',
`alipay_trade_no` string COMMENT '支付宝交易流水编号',
`total_amount` decimal(16,2) COMMENT '支付金额',
`subject` string COMMENT '交易内容',
`payment_type` string COMMENT '支付类型',
`payment_time` string COMMENT '支付时间'
) COMMENT '支付流水表'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t';
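Once these DDLs have run, the new ODS tables can be listed as a quick sanity check (the partitions remain empty until the load script below is executed):
show tables in gmall;
show partitions gmall.ods_order_info;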
ODS-Layer Data Load Script
Create a new script, ods_db.sh:
#!/bin/bash
APP=gmall
# If a date is passed as an argument, use it; otherwise use the day before today
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table "$APP".ods_order_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table "$APP".ods_order_detail partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table "$APP".ods_sku_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table "$APP".ods_user_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table "$APP".ods_payment_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table "$APP".ods_base_category1 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table "$APP".ods_base_category2 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table "$APP".ods_base_category3 partition(dt='$do_date');
"
hive -e "$sql"
Run the script to load the data:
ods_db.sh 2020-11-24
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
20/11/25 12:04:07 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 12:04:07 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 12:04:07 [main]: INFO jdbc.HiveConnection: Connected to cdh2.macro.com:10000
Connected to: Apache Hive (version 3.1.3000.7.1.3.0-100)
Driver: Hive JDBC (version 3.1.3000.7.1.3.0-100)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Compiling command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2): load data inpath '/origin_data/gmall/db/order_info/2020-11-24' OVERWRITE into table gmall.ods_order_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2); Time taken: 0.175 seconds
INFO : Executing command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2): load data inpath '/origin_data/gmall/db/order_info/2020-11-24' OVERWRITE into table gmall.ods_order_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_order_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/order_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2); Time taken: 0.462 seconds
INFO : OK
No rows affected (0.716 seconds)
INFO : Compiling command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256): load data inpath '/origin_data/gmall/db/order_detail/2020-11-24' OVERWRITE into table gmall.ods_order_detail partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256); Time taken: 0.16 seconds
INFO : Executing command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256): load data inpath '/origin_data/gmall/db/order_detail/2020-11-24' OVERWRITE into table gmall.ods_order_detail partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_order_detail partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/order_detail/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256); Time taken: 0.473 seconds
INFO : OK
No rows affected (0.655 seconds)
INFO : Compiling command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b): load data inpath '/origin_data/gmall/db/sku_info/2020-11-24' OVERWRITE into table gmall.ods_sku_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b): load data inpath '/origin_data/gmall/db/sku_info/2020-11-24' OVERWRITE into table gmall.ods_sku_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_sku_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/sku_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b); Time taken: 0.418 seconds
INFO : OK
No rows affected (0.605 seconds)
INFO : Compiling command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512): load data inpath '/origin_data/gmall/db/user_info/2020-11-24' OVERWRITE into table gmall.ods_user_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512): load data inpath '/origin_data/gmall/db/user_info/2020-11-24' OVERWRITE into table gmall.ods_user_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_user_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/user_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512); Time taken: 0.419 seconds
INFO : OK
No rows affected (0.597 seconds)
INFO : Compiling command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df): load data inpath '/origin_data/gmall/db/payment_info/2020-11-24' OVERWRITE into table gmall.ods_payment_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df): load data inpath '/origin_data/gmall/db/payment_info/2020-11-24' OVERWRITE into table gmall.ods_payment_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_payment_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/payment_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df); Time taken: 0.441 seconds
INFO : OK
No rows affected (0.616 seconds)
INFO : Compiling command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb): load data inpath '/origin_data/gmall/db/base_category1/2020-11-24' OVERWRITE into table gmall.ods_base_category1 partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb); Time taken: 0.149 seconds
INFO : Executing command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb): load data inpath '/origin_data/gmall/db/base_category1/2020-11-24' OVERWRITE into table gmall.ods_base_category1 partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_base_category1 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category1/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb); Time taken: 0.413 seconds
INFO : OK
No rows affected (0.583 seconds)
INFO : Compiling command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91): load data inpath '/origin_data/gmall/db/base_category2/2020-11-24' OVERWRITE into table gmall.ods_base_category2 partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91); Time taken: 0.153 seconds
INFO : Executing command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91): load data inpath '/origin_data/gmall/db/base_category2/2020-11-24' OVERWRITE into table gmall.ods_base_category2 partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_base_category2 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category2/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91); Time taken: 0.411 seconds
INFO : OK
No rows affected (0.585 seconds)
INFO : Compiling command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d): load data inpath '/origin_data/gmall/db/base_category3/2020-11-24' OVERWRITE into table gmall.ods_base_category3 partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d): load data inpath '/origin_data/gmall/db/base_category3/2020-11-24' OVERWRITE into table gmall.ods_base_category3 partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_base_category3 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category3/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d); Time taken: 0.425 seconds
INFO : OK
No rows affected (0.601 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
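To confirm that the data landed in the expected partition, a quick row-count check can be run in Hive (a sketch, using the same date as above):
select dt, count(*) from gmall.ods_order_info group by dt;
select count(*) from gmall.ods_base_category3 where dt='2020-11-24';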
DWD Layer
Filter out records with null ids from the ODS-layer data, and perform dimension degeneration (denormalization) on the product category tables.
Create the order table
drop table if exists dwd_order_info;
create external table dwd_order_info (
`id` string COMMENT '',
`total_amount` decimal(10,2) COMMENT '',
`order_status` string COMMENT ' 1 2 3 4 5',
`user_id` string COMMENT 'id' ,
`payment_way` string COMMENT '',
`out_trade_no` string COMMENT '',
`create_time` string COMMENT '',
`operate_time` string COMMENT ''
)
PARTITIONED BY ( `dt` string)
stored as parquet;
Order detail table
drop table if exists dwd_order_detail;
create external table dwd_order_detail(
`id` string COMMENT '',
`order_id` decimal(10,2) COMMENT '',
`user_id` string COMMENT 'id' ,
`sku_id` string COMMENT 'id',
`sku_name` string COMMENT '',
`order_price` string COMMENT '',
`sku_num` string COMMENT '',
`create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet;
User table
drop table if exists dwd_user_info;
create external table dwd_user_info(
`id` string COMMENT 'id',
`name` string COMMENT '',
`birthday` string COMMENT '' ,
`gender` string COMMENT '',
`email` string COMMENT '',
`user_level` string COMMENT '',
`create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet;
Payment flow table
drop table if exists `dwd_payment_info`;
create external table `dwd_payment_info`(
`id` bigint COMMENT '',
`out_trade_no` string COMMENT '',
`order_id` string COMMENT '',
`user_id` string COMMENT '',
`alipay_trade_no` string COMMENT '',
`total_amount` decimal(16,2) COMMENT '',
`subject` string COMMENT '',
`payment_type` string COMMENT '',
`payment_time` string COMMENT ''
)
PARTITIONED BY ( `dt` string)
stored as parquet;
Create the product table (with category columns added)
drop table if exists dwd_sku_info;
create external table dwd_sku_info(
`id` string COMMENT 'skuId',
`spu_id` string COMMENT 'spuid',
`price` decimal(10,2) COMMENT '' ,
`sku_name` string COMMENT '',
`sku_desc` string COMMENT '',
`weight` string COMMENT '',
`tm_id` string COMMENT 'id',
`category3_id` string COMMENT '1id',
`category2_id` string COMMENT '2id',
`category1_id` string COMMENT '3id',
`category3_name` string COMMENT '3',
`category2_name` string COMMENT '2',
`category1_name` string COMMENT '1',
`create_time` string COMMENT ''
)
PARTITIONED BY ( `dt` string)
stored as parquet;
DWD-Layer Data Load Script
Create the script dwd_db.sh:
#!/bin/bash
# Define a variable for easy modification
APP=gmall
# If a date is passed as an argument, use it; otherwise use the day before today
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_sku_info partition(dt)
select
sku.id,
sku.spu_id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.tm_id,
sku.category3_id,
c2.id category2_id ,
c1.id category1_id,
c3.name category3_name,
c2.name category2_name,
c1.name category1_name,
sku.create_time,
sku.dt
from
"$APP".ods_sku_info sku
join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id
join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id
join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id
where sku.dt='$do_date' and c2.dt='$do_date'
and c3.dt='$do_date' and c1.dt='$do_date'
and sku.id is not null;
"
hive -e "$sql"
Run the script: dwd_db.sh 2020-11-24
INFO : Status: DAG finished successfully in 9.01 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 1.16s
INFO : Prepare Plan 0.16s
INFO : Get Query Coordinator (AM) 0.01s
INFO : Submit Plan 0.10s
INFO : Start DAG 0.70s
INFO : Run DAG 9.01s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 506.00 380 0 143 125
INFO : Map 2 0.00 110 0 18 18
INFO : Map 3 4955.00 10,340 90 1,574 1
INFO : Map 5 0.00 510 0 300 300
INFO : Reducer 4 0.00 610 0 1 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 6
INFO : TOTAL_LAUNCHED_TASKS: 6
INFO : DATA_LOCAL_TASKS: 4
INFO : AM_CPU_MILLISECONDS: 2930
INFO : AM_GC_TIME_MILLIS: 0
INFO : File System Counters:
INFO : FILE_BYTES_READ: 4550
INFO : FILE_BYTES_WRITTEN: 36200
INFO : HDFS_BYTES_READ: 53605
INFO : HDFS_BYTES_WRITTEN: 54917
INFO : HDFS_READ_OPS: 13
INFO : HDFS_WRITE_OPS: 7
INFO : HDFS_OP_CREATE: 3
INFO : HDFS_OP_GET_FILE_STATUS: 9
INFO : HDFS_OP_MKDIRS: 2
INFO : HDFS_OP_OPEN: 4
INFO : HDFS_OP_RENAME: 2
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : REDUCE_INPUT_GROUPS: 1
INFO : REDUCE_INPUT_RECORDS: 1
INFO : COMBINE_INPUT_RECORDS: 0
INFO : SPILLED_RECORDS: 2
INFO : NUM_SHUFFLED_INPUTS: 4
INFO : NUM_SKIPPED_INPUTS: 1
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : MERGED_MAP_OUTPUTS: 1
INFO : GC_TIME_MILLIS: 90
INFO : TASK_DURATION_MILLIS: 5711
INFO : CPU_MILLISECONDS: 11950
INFO : PHYSICAL_MEMORY_BYTES: 4435476480
INFO : VIRTUAL_MEMORY_BYTES: 32940134400
INFO : COMMITTED_HEAP_BYTES: 4435476480
INFO : INPUT_RECORDS_PROCESSED: 2035
INFO : INPUT_SPLIT_LENGTH_BYTES: 53605
INFO : OUTPUT_RECORDS: 444
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 41244
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 42158
INFO : OUTPUT_BYTES_PHYSICAL: 32070
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 4050
INFO : ADDITIONAL_SPILLS_BYTES_READ: 4050
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_CHUNK_COUNT: 1
INFO : SHUFFLE_BYTES: 31998
INFO : SHUFFLE_BYTES_DECOMPRESSED: 42158
INFO : SHUFFLE_BYTES_TO_MEM: 31642
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 356
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : SHUFFLE_PHASE_TIME: 1149
INFO : MERGE_PHASE_TIME: 88
INFO : FIRST_EVENT_RECEIVED: 690
INFO : LAST_EVENT_RECEIVED: 690
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_DYNAMIC_PARTITIONS: 1
INFO : CREATED_FILES: 2
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 125
INFO : RECORDS_IN_Map_2: 18
INFO : RECORDS_IN_Map_3: 1149
INFO : RECORDS_IN_Map_5: 300
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_gmall.dwd_sku_info: 300
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 125
INFO : RECORDS_OUT_INTERMEDIATE_Map_2: 18
INFO : RECORDS_OUT_INTERMEDIATE_Map_3: 1
INFO : RECORDS_OUT_INTERMEDIATE_Map_5: 300
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_4: 0
INFO : RECORDS_OUT_OPERATOR_FIL_41: 1149
INFO : RECORDS_OUT_OPERATOR_FIL_49: 18
INFO : RECORDS_OUT_OPERATOR_FIL_52: 125
INFO : RECORDS_OUT_OPERATOR_FIL_56: 300
INFO : RECORDS_OUT_OPERATOR_FS_22: 300
INFO : RECORDS_OUT_OPERATOR_FS_29: 1
INFO : RECORDS_OUT_OPERATOR_GBY_25: 1
INFO : RECORDS_OUT_OPERATOR_GBY_27: 1
INFO : RECORDS_OUT_OPERATOR_MAPJOIN_44: 1149
INFO : RECORDS_OUT_OPERATOR_MAPJOIN_45: 300
INFO : RECORDS_OUT_OPERATOR_MAPJOIN_54: 125
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_26: 1
INFO : RECORDS_OUT_OPERATOR_RS_51: 18
INFO : RECORDS_OUT_OPERATOR_RS_55: 125
INFO : RECORDS_OUT_OPERATOR_RS_58: 300
INFO : RECORDS_OUT_OPERATOR_SEL_21: 300
INFO : RECORDS_OUT_OPERATOR_SEL_24: 300
INFO : RECORDS_OUT_OPERATOR_SEL_28: 1
INFO : RECORDS_OUT_OPERATOR_SEL_50: 18
INFO : RECORDS_OUT_OPERATOR_SEL_53: 125
INFO : RECORDS_OUT_OPERATOR_SEL_57: 300
INFO : RECORDS_OUT_OPERATOR_SEL_8: 1149
INFO : RECORDS_OUT_OPERATOR_TS_0: 125
INFO : RECORDS_OUT_OPERATOR_TS_3: 18
INFO : RECORDS_OUT_OPERATOR_TS_6: 1149
INFO : RECORDS_OUT_OPERATOR_TS_9: 300
INFO : TOTAL_TABLE_ROWS_WRITTEN: 300
INFO : Shuffle Errors:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_4_INPUT_Map_3:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : TaskCounter_Map_1_INPUT_Map_2:
INFO : FIRST_EVENT_RECEIVED: 9
INFO : INPUT_RECORDS_PROCESSED: 18
INFO : LAST_EVENT_RECEIVED: 9
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 356
INFO : SHUFFLE_BYTES_DECOMPRESSED: 348
INFO : SHUFFLE_BYTES_DISK_DIRECT: 356
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_PHASE_TIME: 50
INFO : TaskCounter_Map_1_INPUT_c2:
INFO : INPUT_RECORDS_PROCESSED: 125
INFO : INPUT_SPLIT_LENGTH_BYTES: 2181
INFO : TaskCounter_Map_1_OUTPUT_Map_3:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 4131
INFO : OUTPUT_BYTES_PHYSICAL: 2219
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 4387
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 125
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Map_2_INPUT_c1:
INFO : INPUT_RECORDS_PROCESSED: 18
INFO : INPUT_SPLIT_LENGTH_BYTES: 270
INFO : TaskCounter_Map_2_OUTPUT_Map_1:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 306
INFO : OUTPUT_BYTES_PHYSICAL: 380
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 348
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 18
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Map_3_INPUT_Map_1:
INFO : FIRST_EVENT_RECEIVED: 329
INFO : INPUT_RECORDS_PROCESSED: 125
INFO : LAST_EVENT_RECEIVED: 329
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 2195
INFO : SHUFFLE_BYTES_DECOMPRESSED: 4387
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 2195
INFO : SHUFFLE_PHASE_TIME: 511
INFO : TaskCounter_Map_3_INPUT_Map_5:
INFO : FIRST_EVENT_RECEIVED: 329
INFO : INPUT_RECORDS_PROCESSED: 300
INFO : LAST_EVENT_RECEIVED: 329
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : SHUFFLE_BYTES: 25397
INFO : SHUFFLE_BYTES_DECOMPRESSED: 30468
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 25397
INFO : SHUFFLE_PHASE_TIME: 514
INFO : TaskCounter_Map_3_INPUT_c3:
INFO : INPUT_RECORDS_PROCESSED: 1149
INFO : INPUT_SPLIT_LENGTH_BYTES: 21652
INFO : TaskCounter_Map_3_OUTPUT_Reducer_4:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 6945
INFO : OUTPUT_BYTES_PHYSICAL: 4050
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 6955
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1
INFO : SHUFFLE_CHUNK_COUNT: 1
INFO : SPILLED_RECORDS: 1
INFO : TaskCounter_Map_5_INPUT_sku:
INFO : INPUT_RECORDS_PROCESSED: 300
INFO : INPUT_SPLIT_LENGTH_BYTES: 29502
INFO : TaskCounter_Map_5_OUTPUT_Map_3:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 29862
INFO : OUTPUT_BYTES_PHYSICAL: 25421
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 30468
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 300
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_4_INPUT_Map_3:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 4050
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 4050
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 23
INFO : LAST_EVENT_RECEIVED: 23
INFO : MERGED_MAP_OUTPUTS: 1
INFO : MERGE_PHASE_TIME: 88
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_SHUFFLED_INPUTS: 1
INFO : NUM_SKIPPED_INPUTS: 1
INFO : REDUCE_INPUT_GROUPS: 1
INFO : REDUCE_INPUT_RECORDS: 1
INFO : SHUFFLE_BYTES: 4050
INFO : SHUFFLE_BYTES_DECOMPRESSED: 6955
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 4050
INFO : SHUFFLE_PHASE_TIME: 74
INFO : SPILLED_RECORDS: 1
INFO : TaskCounter_Reducer_4_OUTPUT_out_Reducer_4:
INFO : OUTPUT_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : GROUPED_INPUT_SPLITS_Map_2: 1
INFO : GROUPED_INPUT_SPLITS_Map_3: 1
INFO : GROUPED_INPUT_SPLITS_Map_5: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
INFO : INPUT_DIRECTORIES_Map_2: 1
INFO : INPUT_DIRECTORIES_Map_3: 1
INFO : INPUT_DIRECTORIES_Map_5: 1
INFO : INPUT_FILES_Map_1: 1
INFO : INPUT_FILES_Map_2: 1
INFO : INPUT_FILES_Map_3: 1
INFO : INPUT_FILES_Map_5: 1
INFO : RAW_INPUT_SPLITS_Map_1: 1
INFO : RAW_INPUT_SPLITS_Map_2: 1
INFO : RAW_INPUT_SPLITS_Map_3: 1
INFO : RAW_INPUT_SPLITS_Map_5: 1
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 2 .......... container SUCCEEDED 1 1 0 0 0 0
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Map 5 .......... container SUCCEEDED 1 1 0 0 0 0
Map 3 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 4 ...... container SUCCEEDED 2 2 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 05/05 [==========================>>] 100% ELAPSED TIME: 9.59 s
----------------------------------------------------------------------------------------------
300 rows affected (11.721 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
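A quick way to verify the dimension degeneration is to check that the category name columns in dwd_sku_info are populated (a sketch):
select id, sku_name, category1_name, category2_name, category3_name
from gmall.dwd_sku_info
where dt='2020-11-24'
limit 5;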
DWS Layer: User-Behavior Wide Table
1) Why build a wide table
The goal is to aggregate each user's behavior for a single day into one multi-column wide table, so that it can later be joined with user dimension information and analyzed from different angles (see the example query after the table DDL below).
Create the user-behavior wide table
drop table if exists dws_user_action;
create external table dws_user_action
(
user_id string comment '用户 id',
order_count bigint comment '下单次数 ',
order_amount decimal(16,2) comment '下单金额 ',
payment_count bigint comment '支付次数',
payment_amount decimal(16,2) comment '支付金额 '
) COMMENT '每日用户行为宽表'
PARTITIONED BY (`dt` string)
stored as parquet
tblproperties ("parquet.compression"="snappy");
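As an illustration of why the wide table is useful: once it is loaded, it can be joined with the user dimension to analyze behavior from different angles, for example order and payment totals by gender (a sketch; column names are taken from dwd_user_info above):
select u.gender,
       sum(a.order_count)    order_count,
       sum(a.payment_amount) payment_amount
from gmall.dws_user_action a
join gmall.dwd_user_info u
  on a.user_id = u.id and a.dt = u.dt
where a.dt = '2020-11-24'
group by u.gender;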
User-Behavior Wide-Table Load Script
Create the script dws_db_wide.sh:
#!/bin/bash
# Define a variable for easy modification
APP=gmall
# If a date is passed as an argument, use it; otherwise use the day before today
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
with
tmp_order as
(
select
user_id,
count(*) order_count,
sum(oi.total_amount) order_amount
from "$APP".dwd_order_info oi
where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
group by user_id
) ,
tmp_payment as
(
select
user_id,
sum(pi.total_amount) payment_amount,
count(*) payment_count
from "$APP".dwd_payment_info pi
where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
group by user_id
)
insert overwrite table "$APP".dws_user_action partition(dt='$do_date')
select
user_actions.user_id,
sum(user_actions.order_count),
sum(user_actions.order_amount),
sum(user_actions.payment_count),
sum(user_actions.payment_amount)
from
(
select
user_id,
order_count,
order_amount,
0 payment_count,
0 payment_amount
from tmp_order
union all
select
user_id,
0 order_count,
0 order_amount,
payment_count,
payment_amount
from tmp_payment
) user_actions
group by user_id;
"
hive -e "$sql"
Run the script:
dws_db_wide.sh 2020-11-24
INFO : Status: Running (Executing on YARN cluster with App id application_1606125802436_0017)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 4 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 5 ...... container SUCCEEDED 2 2 0 0 0 0
Map 6 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 7 ...... container SUCCEEDED 2 2 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 06/06 [==========================>>] 100% ELAPSED TIME: 11.15 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 10.07 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.72s
INFO : Prepare Plan 0.18s
INFO : Get Query Coordinator (AM) 0.01s
INFO : Submit Plan 4.30s
INFO : Start DAG 1.22s
INFO : Run DAG 10.07s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 4047.00 7,860 82 1,000 200
INFO : Map 6 4551.00 9,360 115 518 188
INFO : Reducer 2 1011.00 1,730 26 200 307
INFO : Reducer 4 1453.00 3,040 0 388 2
INFO : Reducer 5 443.00 830 8 2 0
INFO : Reducer 7 1011.00 1,970 14 188 188
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 4 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 5 ...... container SUCCEEDED 2 2 0 0 0 0
Map 6 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 7 ...... container SUCCEEDED 2 2 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 06/06 [==========================>>] 100% ELAPSED TIME: 11.17 s
----------------------------------------------------------------------------------------------
200 rows affected (17.604 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
Check in Hue: the data has been generated.
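The same check can also be done from Beeline instead of Hue (a sketch):
select * from gmall.dws_user_action where dt='2020-11-24' limit 10;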
ADS Layer (Requirement: Total GMV)
Table creation statement
drop table if exists ads_gmv_sum_day;
create external table ads_gmv_sum_day(
`dt` string COMMENT '统计日期',
`gmv_count` bigint COMMENT '当日gmv订单个数',
`gmv_amount` decimal(16,2) COMMENT '当日gmv订单总金额',
`gmv_payment` decimal(16,2) COMMENT '当日支付金额'
) COMMENT 'GMV'
row format delimited fields terminated by '\t';
Data Load Script
Create the script ads_db_gmv.sh:
#!/bin/bash
# Define a variable for easy modification
APP=gmall
# If a date is passed as an argument, use it; otherwise use the day before today
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert into table "$APP".ads_gmv_sum_day
select
'$do_date' dt,
sum(order_count) gmv_count,
sum(order_amount) gmv_amount,
sum(payment_amount) payment_amount
from "$APP".dws_user_action
where dt ='$do_date'
group by dt;
"
hive -e "$sql"
Run the script:
ads_db_gmv.sh 2020-11-24
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
20/11/25 14:51:24 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 14:51:24 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 14:51:24 [main]: INFO jdbc.HiveConnection: Connected to cdh2.macro.com:10000
Connected to: Apache Hive (version 3.1.3000.7.1.3.0-100)
Driver: Hive JDBC (version 3.1.3000.7.1.3.0-100)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Compiling command(queryId=hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949): insert into table gmall.ads_gmv_sum_day
select
'2020-11-24' dt,
sum(order_count) gmv_count,
sum(order_amount) gmv_amount,
sum(payment_amount) payment_amount
from gmall.dws_user_action
where dt ='2020-11-24'
group by dt
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:string, comment:null), FieldSchema(name:_col1, type:bigint, comment:null), FieldSchema(name:_col2, type:decimal(16,2), comment:null), FieldSchema(name:_col3, type:decimal(16,2), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949); Time taken: 0.217 seconds
INFO : Executing command(queryId=hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949): insert into table gmall.ads_gmv_sum_day
select
'2020-11-24' dt,
sum(order_count) gmv_count,
sum(order_amount) gmv_amount,
sum(payment_amount) payment_amount
from gmall.dws_user_action
where dt ='2020-11-24'
group by dt
INFO : Query ID = hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949
INFO : Session is already open
INFO : Dag name: insert into table gmall.ads_gmv_sum_day...dt (Stage-1)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO : Session re-established.
INFO : Status: Running (Executing on YARN cluster with App id application_1606125802436_0018)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 3 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 10.00 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 9.45 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.22s
INFO : Prepare Plan 0.06s
INFO : Get Query Coordinator (AM) 0.01s
INFO : Submit Plan 4.22s
INFO : Start DAG 1.31s
INFO : Run DAG 9.45s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 3527.00 7,280 102 200 1
INFO : Reducer 2 4397.00 7,540 98 1 2
INFO : Reducer 3 3389.00 630 0 2 0
INFO : ----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 2 2 0 0 0 0
Reducer 3 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 10.01 s
----------------------------------------------------------------------------------------------
1 row affected (15.842 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
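Before exporting to MySQL, the ADS result can be spot-checked in Hive (a sketch; one row is expected per loaded date):
select * from gmall.ads_gmv_sum_day;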
Data Export Script
Create the ads_gmv_sum_day table in MySQL:
DROP TABLE IF EXISTS ads_gmv_sum_day;
CREATE TABLE ads_gmv_sum_day(
`dt` varchar(200) DEFAULT NULL COMMENT '统计日期',
`gmv_count` bigint(20) DEFAULT NULL COMMENT '当日gmv订单个数',
`gmv_amount` decimal(16, 2) DEFAULT NULL COMMENT '当日gmv订单总金额',
`gmv_payment` decimal(16, 2) DEFAULT NULL COMMENT '当日支付金额'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = '每日GMV成交总额' ROW_FORMAT = Dynamic;
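One note on this DDL: the Sqoop export below uses --update-key dt with --update-mode allowinsert, and as the Sqoop warning in the export log points out, the upsert relies on MySQL's INSERT ... ON DUPLICATE KEY UPDATE, which only matches existing rows through a unique key. The DDL above does not define one, so re-exporting the same date would insert a duplicate row. A sketch of adding one (an assumption, not part of the original DDL):
-- assumed addition: a unique key on dt so ON DUPLICATE KEY UPDATE can match existing rows
ALTER TABLE ads_gmv_sum_day ADD UNIQUE KEY uk_dt (dt);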
Create the script sqoop_export.sh:
vim sqoop_export.sh
#!/bin/bash
export HADOOP_USER_NAME=hive
db_name=gmall
export_data() {
sqoop export \
--connect "jdbc:mysql://192.168.0.208:3306/${db_name}?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password password \
--table $1 \
--num-mappers 1 \
--export-dir /warehouse/tablespace/external/hive/gmall.db/$1 \
--input-fields-terminated-by "\t" \
--update-mode allowinsert \
--update-key $2 \
--input-null-string '\N' \
--input-null-non-string '\N'
}
case $1 in
"ads_gmv_sum_day")
export_data "ads_gmv_sum_day" "dt"
;;
"all")
export_data "ads_gmv_sum_day" "dt"
;;
esac
Run the script to export the data:
sqoop_export.sh all
Warning: /opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/25 15:49:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.7.1.3.0-100
20/11/25 15:49:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/11/25 15:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/25 15:49:35 INFO tool.CodeGenTool: Beginning code generation
20/11/25 15:49:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `ads_gmv_sum_day` AS t LIMIT 1
20/11/25 15:49:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `ads_gmv_sum_day` AS t LIMIT 1
20/11/25 15:49:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
warning: No SupportedSourceVersion annotation found on org.apache.hadoop.hdds.conf.ConfigFileGenerator, returning RELEASE_6.
warning: Supported source version 'RELEASE_6' from annotation processor 'org.apache.hadoop.hdds.conf.ConfigFileGenerator' less than -source '1.8'
2 warnings
20/11/25 15:49:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f6653e4db17d3a1223b93c16acc9d822/ads_gmv_sum_day.jar
20/11/25 15:49:38 WARN manager.MySQLManager: MySQL Connector upsert functionality is using INSERT ON
20/11/25 15:49:38 WARN manager.MySQLManager: DUPLICATE KEY UPDATE clause that relies on table's unique key.
20/11/25 15:49:38 WARN manager.MySQLManager: Insert/update distinction is therefore independent on column
20/11/25 15:49:38 WARN manager.MySQLManager: names specified in --update-key parameter. Please see MySQL
20/11/25 15:49:38 WARN manager.MySQLManager: documentation for additional limitations.
20/11/25 15:49:38 INFO mapreduce.ExportJobBase: Beginning export of ads_gmv_sum_day
20/11/25 15:49:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/25 15:49:39 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
20/11/25 15:49:39 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
20/11/25 15:49:39 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/25 15:49:40 INFO client.RMProxy: Connecting to ResourceManager at cdh2.macro.com/192.168.0.207:8032
20/11/25 15:49:41 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hive/.staging/job_1606125802436_0019
20/11/25 15:49:45 INFO input.FileInputFormat: Total input files to process : 1
20/11/25 15:49:45 INFO input.FileInputFormat: Total input files to process : 1
20/11/25 15:49:45 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
20/11/25 15:49:45 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 19cb887b751ca6813bec478cf3bf20459deed68b]
20/11/25 15:49:45 INFO mapreduce.JobSubmitter: number of splits:1
20/11/25 15:49:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606125802436_0019
20/11/25 15:49:45 INFO mapreduce.JobSubmitter: Executing with tokens: []
20/11/25 15:49:45 INFO conf.Configuration: resource-types.xml not found
20/11/25 15:49:45 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/11/25 15:49:46 INFO impl.YarnClientImpl: Submitted application application_1606125802436_0019
20/11/25 15:49:46 INFO mapreduce.Job: The url to track the job: http://cdh2.macro.com:8088/proxy/application_1606125802436_0019/
20/11/25 15:49:46 INFO mapreduce.Job: Running job: job_1606125802436_0019
20/11/25 15:49:54 INFO mapreduce.Job: Job job_1606125802436_0019 running in uber mode : false
20/11/25 15:49:54 INFO mapreduce.Job: map 0% reduce 0%
20/11/25 15:50:02 INFO mapreduce.Job: map 100% reduce 0%
20/11/25 15:50:02 INFO mapreduce.Job: Job job_1606125802436_0019 completed successfully
20/11/25 15:50:02 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=253343
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=215
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5709
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=5709
Total vcore-milliseconds taken by all map tasks=5709
Total megabyte-milliseconds taken by all map tasks=11692032
Map-Reduce Framework
Map input records=1
Map output records=1
Input split bytes=176
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=74
CPU time spent (ms)=1320
Physical memory (bytes) snapshot=285306880
Virtual memory (bytes) snapshot=2846998528
Total committed heap usage (bytes)=320864256
Peak Map Physical memory (bytes)=285306880
Peak Map Virtual memory (bytes)=2846998528
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
20/11/25 15:50:02 INFO mapreduce.ExportJobBase: Transferred 215 bytes in 22.3977 seconds (9.5992 bytes/sec)
20/11/25 15:50:02 INFO mapreduce.ExportJobBase: Exported 1 records.
Check the data in MySQL:
MariaDB [gmall]> select * from ads_gmv_sum_day;
+------------+-----------+------------+-------------+
| dt         | gmv_count | gmv_amount | gmv_payment |
+------------+-----------+------------+-------------+
| 2020-11-24 |      1000 |  527431.00 |   268152.00 |
+------------+-----------+------------+-------------+
1 row in set (0.00 sec)