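Clean the raw data with MapReduce and load it into Hive, run a Hive query whose result set is written to a second Hive table, then export that table to MySQL with Sqoop.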
1. Create a table in Hive
create external table mydb.access(ip string,day string,url string,upflow string) row format delimited fields terminated by ',';
2. Load the cleaned data into the table just created
load data inpath '/hive/output/' into table mydb.access;
3. Create a second table to hold the result set
create external table mydb.upflow (ip string,sum string) row format delimited fields terminated by ',';
4. Write the query result into the result table
insert into mydb.upflow select ip, sum(upflow) as sum from mydb.access group by ip order by sum desc;
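To make it easier to check what the query in step 4 computes, here is a minimal shell sketch that reproduces the same aggregation (total upflow per ip, largest first) with awk and sort. The sample rows and the function name aggregate_upflow are made up for illustration; only the column layout (ip,day,url,upflow) comes from the table definition above. Hive may format the totals differently from awk, so compare the values rather than the exact text.

```shell
#!/bin/sh
# Hypothetical sample rows in the same layout as mydb.access: ip,day,url,upflow
sample='10.0.0.1,2020-01-01,/a,100
10.0.0.2,2020-01-01,/b,300
10.0.0.1,2020-01-02,/c,50
10.0.0.3,2020-01-02,/a,20'

# Sum the fourth field (upflow) per first field (ip),
# then sort by the total in descending numeric order.
aggregate_upflow() {
  awk -F',' '{ total[$1] += $4 } END { for (ip in total) print ip "," total[ip] }' \
    | sort -t',' -k2,2nr
}

result=$(printf '%s\n' "$sample" | aggregate_upflow)
printf '%s\n' "$result"
```

Running the same function over a small slice of the cleaned files from /hive/output/ gives a quick reference to compare against the rows in mydb.upflow.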
5. Create a table in MySQL to hold the result set
create table upflow (
ip varchar(200),
sum varchar(200)
);
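6. Export the result set from Hive to the MySQL table with Sqoop
The --table value must be the MySQL table created in step 5, and --export-dir must be the HDFS directory that backs mydb.upflow. The path below assumes the default Hive warehouse location; adjust it if your warehouse directory or table location differs.
sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password admin --table upflow --export-dir /user/hive/warehouse/mydb.db/upflow --input-fields-terminated-by ','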
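Passing the password with --password leaves it in the shell history and process list. As a hedged sketch, the wrapper below prints a sqoop export command that uses -P (prompt for the password on the console) so the command can be reviewed before it is run. The variable names, the function build_export_cmd, and the export directory are assumptions for illustration; the directory in particular assumes the default warehouse layout for database mydb.

```shell
#!/bin/sh
# Assumed values; adjust to the actual environment.
JDBC_URL='jdbc:mysql://localhost:3306/test'
DB_USER='root'
TARGET_TABLE='upflow'
# Assumption: default Hive warehouse layout for database mydb.
EXPORT_DIR='/user/hive/warehouse/mydb.db/upflow'

# Print the sqoop export command instead of running it.
# -P makes sqoop prompt for the password rather than taking it as an argument.
build_export_cmd() {
  printf 'sqoop export --connect %s --username %s -P --table %s --export-dir %s --input-fields-terminated-by ,\n' \
    "$JDBC_URL" "$DB_USER" "$TARGET_TABLE" "$EXPORT_DIR"
}

cmd=$(build_export_cmd)
printf '%s\n' "$cmd"
```

Once the printed command looks right, it can be pasted into a terminal on a node where sqoop is installed.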