(五)进阶技术 16. 分段维度 本篇说明分段维度的实现技术。分段维度包含连续值的分段。例如,年度销售订单分段维度可能包含有叫做“低”、“中”、“高”的三档;各档定义分别为0.01到15000、15000.01到30000.00、30000.01到99999999.99。如果一个客户的年度销售订单金额为10000,则被归为“低”档。 分段维度可以存储多个分段集合。例如,可能有一个用于促销分析的分段集合,另一个用于市场细分,可能还有一个用于销售区域计划。分段一般由用户定义,而且很少能从交易源数据直接获得。本篇要使用(五)进阶技术 10. 多重星型模式的开发经验实现分段维度。 年度销售订单星型模式 本节说明如何实现一个年度订单分段维度。你需要两个新的星型模式,如图(五)- 16-1所示。星型模式的事实表使用(关联到)已有的customer_dim和一个新的year_dim表。年维度是日期维度的子集。annual_customer_segment_fact是唯一用到annual_order_segment_dim表的表。annual_order_segement_dim是分段维度。
图(五)- 16-1
annual_order_segment_dim表存储多个分段集合。在下面的例子里将两个分段集合“PROJECT ALPHA”和“Grid”导入annual_order_segment_dim表。这两种分段集合都是按照用户的年度销售订单金额将其分类。PROJECT ALPHA分六段,Grid分三段。表(五)- 16-1显示了这个分段的例子。
Segment Name | Band Name | Start Value | End Value |
---|---|---|---|
PROJECT ALPHA | Bottom | 0.01 | 2500.00 |
PROJECT ALPHA | Low | 2500.01 | 3000.00 |
PROJECT ALPHA | Mid-low | 3000.01 | 4000.00 |
PROJECT ALPHA | Mid | 4000.01 | 5500.00 |
PROJECT ALPHA | Mid-high | 5500.01 | 6500.00 |
PROJECT ALPHA | Top | 6500.01 | 99999999.99 |
Grid | LOW | 0.01 | 3000.00 |
Grid | MED | 3000.01 | 6000.00 |
Grid | HIGH | 6000.01 | 99999999.99 |
表(五)- 16-1
每一分段有一个开始值和一个结束值。 分段的粒度就是本段和下段之间的间隙。粒度必须是度量的最小可能值,在销售订单金额的示例中是0.01。最后一个分段的结束值是销售订单金额可能的最大值。 清单(五)-16-1里的脚本用于建立分段维度数据仓库模式。
代码语言:javascript复制USE dw;
CREATE TABLE annual_order_segment_dim (
segment_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
segment_name VARCHAR(30),
band_name VARCHAR(50),
band_start_amount DEC(10 , 2 ),
band_end_amount DEC(10 , 2 ),
effective_date DATE,
expiry_date DATE
);
INSERT INTO annual_order_segment_dim VALUES
(NULL, 'PROJECT ALPHA', 'Bottom', 0.01, 2500.00, '1900-01-01',
'2200-01-01')
, (NULL, 'PROJECT ALPHA', 'Low', 2500.01, 3000.00, '1900-01-01',
'2200-01-01')
, (NULL, 'PROJECT ALPHA', 'Mid-Low', 3000.01, 4000.00, '1900-01-01',
'2200-01-01')
, (NULL, 'PROJECT ALPHA', 'Mid', 4000.01, 5500.00, '1900-01-01',
'2200-01-01')
, (NULL, 'PROJECT ALPHA', 'Mid_High', 5500.01, 6500.00, '1900-01-01',
'2200-01-01')
, (NULL, 'PROJECT ALPHA', 'Top', 6500.01, 99999999.99, ' 1900-01-01',
'2200-01-01')
, (NULL, 'Grid', 'LOW', 0.01, 3000, '1900-01-01', '2200-01-01')
, (NULL, 'Grid', 'MED', 3000.01, 6000.00, ' 1900-01-01', '2200-01-01')
, (NULL, 'Grid', 'HIGH', 6000.01, 99999999.99, '1900-01-01', '2200-01-01');
commit;
CREATE TABLE year_dim (
year_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
year INT(4),
effective_date DATE,
expiry_date DATE
);
CREATE TABLE annual_sales_order_fact (
customer_sk INT,
year_sk INT,
annual_order_amount DEC(10 , 2 )
);
alter table annual_sales_order_fact
add foreign key (customer_sk) references customer_dim(customer_sk),
add foreign key (year_sk) references year_dim(year_sk);
CREATE TABLE annual_customer_segment_fact (
segment_sk INT,
customer_sk INT,
year_sk INT
);
alter table annual_customer_segment_fact
add foreign key (segment_sk) references annual_order_segment_dim(segment_sk),
add foreign key (customer_sk) references customer_dim(customer_sk),
add foreign key (year_sk) references year_dim(year_sk);
清单(五)-16-1
初始装载 本节说明初始装载并进行测试。清单(五)-16-2里的初始装载脚本将order_date维度表(date_dim表的一个视图)里的数据导入year_dim表,将sales_order_fact表里的数据导入annual_sales_order_fact表,将annual_sales_order_fact表里的数据导入annual_customer_segment_fact表。此脚本装载所有历史数据。
代码语言:javascript复制use dw;
INSERT INTO year_dim
SELECT DISTINCT
NULL
, year
, effective_date
, expiry_date
FROM order_date_dim;
INSERT INTO annual_sales_order_fact
SELECT
a.customer_sk
, year_sk
, SUM(order_amount)
FROM
sales_order_fact a
, year_dim c
, order_date_dim d
WHERE
a.order_date_sk = d.order_date_sk
AND c.year = d.year
AND d.year < YEAR(CURRENT_DATE)
GROUP BY a.customer_sk, c.year_sk;
INSERT INTO annual_customer_segment_fact
SELECT
d.segment_sk
, a.customer_sk
, a.year_sk
FROM
annual_sales_order_fact a
, annual_order_segment_dim d
WHERE
annual_order_amount >= band_start_amount
AND annual_order_amount <= band_end_amount;
commit;
清单(五)-16-2
为了测试初始装载脚本,先设置系统日期设置为2014年的任何日期以装载2013年的数据。在后面的定期测试小结里将导入2014年的销售订单。 执行完清单(五)-16-2里的脚本,查询annual_customer_segment_fact表确认初始装载是成功的。查询语句和结果如下所示。 mysql> select -> a.customer_sk csk, -> a.year_sk ysk, -> annual_order_amount amt, -> segment_name sn, -> band_name bn -> from -> annual_customer_segment_fact a, -> annual_order_segment_dim b, -> year_dim c, -> annual_sales_order_fact d -> where -> a.segment_sk = b.segment_sk -> AND a.year_sk = c.year_sk -> AND a.customer_sk = d.customer_sk -> AND a.year_sk = d.year_sk -> order BY a.customer_sk , year , segment_name , band_name; ------ ------ --------- --------------- ---------- | csk | ysk | amt | sn | bn | ------ ------ --------- --------------- ---------- | 1 | 14 | 8000.00 | Grid | HIGH | | 1 | 14 | 8000.00 | PROJECT ALPHA | Top | | 3 | 14 | 4000.00 | Grid | MED | | 3 | 14 | 4000.00 | PROJECT ALPHA | Mid-Low | | 4 | 14 | 4000.00 | Grid | MED | | 4 | 14 | 4000.00 | PROJECT ALPHA | Mid-Low | | 5 | 14 | 6000.00 | Grid | MED | | 5 | 14 | 6000.00 | PROJECT ALPHA | Mid_High | | 6 | 14 | 6000.00 | Grid | MED | | 6 | 14 | 6000.00 | PROJECT ALPHA | Mid_High | | 7 | 14 | 8000.00 | Grid | HIGH | | 7 | 14 | 8000.00 | PROJECT ALPHA | Top | ------ ------ --------- --------------- ---------- 12 rows in set (0.01 sec) 查询结果表明每个在2013年有订单的客户都被赋予了两个分段集合中的值。可以验证年度销售金额分段赋值是否正确。 定期装载 本节说明定期装载脚本和如何测试它。除了无需装载year_dim表以外,定期装载与初始装载类似。annual_sales_order_fact表里的数据被导入annual_customer_segment_fact表。 每年调度执行清单(五)-16-3里的定期装载,此脚本装载前一年的销售数据。
代码语言:javascript复制use dw;
INSERT INTO annual_sales_order_fact
SELECT
a.customer_sk
, year_sk
, SUM(order_amount)
FROM
sales_order_fact a
, year_dim c
, order_date_dim d
WHERE
a.order_date_sk = d.order_date_sk
AND c.year = d.year
AND c.year = YEAR(CURRENT_DATE) - 1
GROUP BY a.customer_sk, c.year_sk;
INSERT INTO annual_customer_segment_fact
SELECT
d.segment_sk
, a.customer_sk
, c.year_sk
FROM
annual_sales_order_fact a
, year_dim c
, annual_order_segment_dim d
WHERE
a.year_sk = c.year_sk
AND c.year = YEAR(CURRENT_DATE) - 1
AND annual_order_amount >= band_start_amount
AND annual_order_amount <= band_end_amount;
commit;
清单(五)-16-3
使用Kettle转换进行定期装载的步骤如图(五)- 16-2到图(五)- 16-14所示。
图(五)- 16-2
图(五)- 16-3
图(五)- 16-4
图(五)- 16-5
图(五)- 16-6
图(五)- 16-7
图(五)- 16-8
图(五)- 16-9
图(五)- 16-10
图(五)- 16-11
图(五)- 16-12
图(五)- 16-13
图(五)- 16-14
测试
为了测试定期,设置系统日期为2015年的日期并执行清单(五)-16-3里的脚本会对应的Kettle转换。 查询customer_order_segment_fact表确认定期装载是否正确。查询语句和结果如下所示。 mysql> select -> a.customer_sk csk, -> a.year_sk ysk, -> annual_order_amount amt, -> segment_name sn, -> band_name bn -> from -> annual_customer_segment_fact a, -> annual_order_segment_dim b, -> year_dim c, -> annual_sales_order_fact d -> where -> a.segment_sk = b.segment_sk -> AND a.year_sk = c.year_sk -> AND a.customer_sk = d.customer_sk -> AND a.year_sk = d.year_sk -> order BY a.customer_sk , year , segment_name , band_name; ------ ------ --------- --------------- ---------- | csk | ysk | amt | sn | bn | ------ ------ --------- --------------- ---------- | 1 | 14 | 8000.00 | Grid | HIGH | | 1 | 14 | 8000.00 | PROJECT ALPHA | Top | | 1 | 15 | 4000.00 | Grid | MED | | 1 | 15 | 4000.00 | PROJECT ALPHA | Mid-Low | | 2 | 15 | 5500.00 | Grid | MED | | 2 | 15 | 5500.00 | PROJECT ALPHA | Mid | | 3 | 14 | 4000.00 | Grid | MED | | 3 | 14 | 4000.00 | PROJECT ALPHA | Mid-Low | | 3 | 15 | 2000.00 | Grid | LOW | | 3 | 15 | 2000.00 | PROJECT ALPHA | Bottom | | 4 | 14 | 4000.00 | Grid | MED | | 4 | 14 | 4000.00 | PROJECT ALPHA | Mid-Low | | 4 | 15 | 3000.00 | Grid | LOW | | 4 | 15 | 3000.00 | PROJECT ALPHA | Low | | 5 | 14 | 6000.00 | Grid | MED | | 5 | 14 | 6000.00 | PROJECT ALPHA | Mid_High | | 5 | 15 | 2500.00 | Grid | LOW | | 5 | 15 | 2500.00 | PROJECT ALPHA | Bottom | | 6 | 14 | 6000.00 | Grid | MED | | 6 | 14 | 6000.00 | PROJECT ALPHA | Mid_High | | 6 | 15 | 3000.00 | Grid | LOW | | 6 | 15 | 3000.00 | PROJECT ALPHA | Low | | 7 | 14 | 8000.00 | Grid | HIGH | | 7 | 14 | 8000.00 | PROJECT ALPHA | Top | | 7 | 15 | 3500.00 | Grid | MED | | 7 | 15 | 3500.00 | PROJECT ALPHA | Mid-Low | ------ ------ --------- --------------- ---------- 26 rows in set (0.00 sec)