(五)进阶技术 11. 间接数据源 本篇讨论如何处理间接数据源。间接数据源与维度表具有不同的粒度,因此不能直接装载进数据仓库。在这里通过修改(五)进阶技术7. “多路径和参差不齐的层次”里的推广源数据说明怎样处理间接数据源。 CAMPAIGN SESSION,MONTH,YEAR 2014 First Campaign,1,2014 2014 First Campaign,2,2014 2014 First Campaign,3,2014 2014 First Campaign,4,2014 2014 Second Campaign,5,2014 2014 Second Campaign,6,2014 2014 Second Campaign,7,2014 2014 Third Campaign,8,2014 2014 Last Campaign,9,2014 2014 Last Campaign,10,2014 2014 Last Campaign,11,2014 2014 Last Campaign,12,2014 如上所示,推广期数据源的粒度是月,因为每行都有一个月份元素。而且一个推广期可能延续多个月,正如上面显示的2014年第一个推广期有四个月。这意味着推广期信息重复了四次,也就是四行。比方说希望简化推广期源数据的准备工作,每个推广期不管有多长,只准备一行数据。新的数据格式可以改成下面所示。 CAMPAIGN_SESSION, START_MONTH, START_YEAR, END_MONTH, END_YEAR 2014 First Campaign, 1, 2014, 4, 2014 2014 Second Campaign, 5, 2014, 7, 2014 2014 Third Campaign, 8, 2014, 8, 2014 2014 Last Campaign, 9, 2014, 12, 2014 修改推广导入脚本 需要一个不同的过渡表。使用清单(五)-11-1里的脚本创建它。
代码语言:javascript复制USE dw;
CREATE TABLE non_straight_campaign_stg (
campaign_session CHAR(30),
start_month CHAR(9),
start_year INT(4),
end_month CHAR(9),
end_year INT(4)
);
清单(五)-11-1
注意新的过渡表既有开始年月列也有结束年月列。 清单清单(五)-11-2给出了修改后的推广期导入脚本。
代码语言:javascript复制USE dw;
TRUNCATE non_straight_campaign_stg;
LOAD DATA INFILE '/root/data-integration/non_straight_campaign.csv'
INTO TABLE non_straight_campaign_stg
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY 'n'
IGNORE 1 LINES
(
campaign_session
, start_month
, start_year
, end_month
, end_year
);
-- 更新月维度的推广期,假设推广期中的月份没有跨年的情况
update month_dim p,
month_dim q,
(select
campaign_session,
start_year,
start_month,
end_year,
end_month
from
non_straight_campaign_stg) r
set
p.campaign_session = r.campaign_session
where
p.year = r.start_year
and p.month >= r.start_month
and q.year = r.end_year
and q.month <= r.end_month
and p.year = q.year
and p.month = q.month;
COMMIT ;
清单(五)-11-2
图(五)- 11-1到图(五)- 11-6显示了相应的Kettle转换。
图(五)- 11-1
图(五)- 11-2
图(五)- 11-3
图(五)- 11-4
图(五)- 11-5
图(五)- 11-6
现在测试新脚本。推广期数据在/root/data-integration/non_straight_campaign.csv文件中,如下所示。 CAMPAIGN_SESSION,START_MONTH,START_YEAR,END_MONTH,END_YEAR 2014 First Campaign,1,2014,4,2014 2014 Second Campaign,5,2014,7,2014 2014 Third Campaign,8,2014,8,2014 2014 Last Campaign,9,2014,12,2014 执行修改后的推广期装载脚本或相应的Kettle转换之前,要执行下面的命令删除已装载的推广期数据(假设推广期中的月份没有跨年的情况)。 USE dw; UPDATE month_dim SET campaign_session = NULL; COMMIT; 现在查询month_dim表,确认它被正确地装载,查询语句和结果应该如下所示。 mysql> select month_sk m_sk, month_name, month m, campaign_session,quarter q, year -> from month_dim -> where year = 2014; ------ ------------ ------ ---------------------- ------ ------ | m_sk | month_name | m | campaign_session | q | year | ------ ------------ ------ ---------------------- ------ ------ | 169 | January | 1 | 2014 First Campaign | 1 | 2014 | | 170 | February | 2 | 2014 First Campaign | 1 | 2014 | | 171 | March | 3 | 2014 First Campaign | 1 | 2014 | | 172 | April | 4 | 2014 First Campaign | 2 | 2014 | | 173 | May | 5 | 2014 Second Campaign | 2 | 2014 | | 174 | June | 6 | 2014 Second Campaign | 2 | 2014 | | 175 | July | 7 | 2014 Second Campaign | 3 | 2014 | | 176 | August | 8 | 2014 Third Campaign | 3 | 2014 | | 177 | September | 9 | 2014 Last Campaign | 3 | 2014 | | 178 | October | 10 | 2014 Last Campaign | 4 | 2014 | | 179 | November | 11 | 2014 Last Campaign | 4 | 2014 | | 180 | December | 12 | 2014 Last Campaign | 4 | 2014 | ------ ------------ ------ ---------------------- ------ ------ 12 rows in set (0.00 sec)