Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://cloud.tencent.com/developer/article/1433134
XVII. Band Dimensions
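This section describes techniques for implementing band dimensions. A band dimension contains bands (ranges) of continuous values. For example, an annual sales order band dimension might contain three bands named "low", "mid", and "high", defined as 0.01 to 15000.00, 15000.01 to 30000.00, and 30000.01 to 99999999.99 respectively. A customer whose annual sales order amount is 10000 falls in the "low" band.
A band dimension can store multiple band sets. For example, there may be one band set for promotion analysis, another for market segmentation, and perhaps another for sales territory planning. Bands are usually defined by users and can rarely be obtained directly from transactional source data.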
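To make the lookup concrete, here is a minimal Python sketch (not part of the article's Hive scripts) that classifies an amount against the three example bands. The names `BANDS` and `classify` are assumptions made for this illustration only.

```python
from decimal import Decimal

# Example bands from the text: (band name, inclusive start, inclusive end).
BANDS = [
    ("low", Decimal("0.01"), Decimal("15000.00")),
    ("mid", Decimal("15000.01"), Decimal("30000.00")),
    ("high", Decimal("30000.01"), Decimal("99999999.99")),
]


def classify(amount):
    """Return the name of the band whose inclusive range contains amount."""
    amount = Decimal(str(amount))
    for name, start, end in BANDS:
        if start <= amount <= end:
            return name
    return None  # amount lies outside every band, e.g. zero or negative


print(classify(10000))  # low
```

Both ends of each range are inclusive, which matches the `>=` and `<=` comparisons used by the load scripts later in this section.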
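1. Annual Sales Order Star Schema
This subsection shows how to implement an annual order band dimension. Two new star schemas are needed, as shown in the figure below. Their fact tables use (join to) the existing customer_dim table and a new year_dim table. The year dimension is a subset of the date dimension. annual_customer_segment_fact is the only table that uses annual_order_segment_dim, which is the band dimension table.
The annual_order_segment_dim table stores multiple band sets. The example below loads two band sets, "project alpha" and "grid", into annual_order_segment_dim. Both classify customers by their annual sales order amount: project alpha has six bands and grid has three. The following table shows the bands.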
Segment Name | Band Name | Start Value | End Value |
---|---|---|---|
PROJECT ALPHA | Bottom | 0.01 | 2500.00 |
PROJECT ALPHA | Low | 2500.01 | 3000.00 |
PROJECT ALPHA | Mid-low | 3000.01 | 4000.00 |
PROJECT ALPHA | Mid | 4000.01 | 5500.00 |
PROJECT ALPHA | Mid-high | 5500.01 | 6500.00 |
PROJECT ALPHA | Top | 6500.01 | 99999999.99 |
Grid | LOW | 0.01 | 3000.00 |
Grid | MED | 3000.01 | 6000.00 |
Grid | HIGH | 6000.01 | 99999999.99 |
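Each band has a start value and an end value. The granularity of the bands is the gap between the end of one band and the start of the next. It must be the smallest possible step of the measure, which is 0.01 for sales order amounts. The end value of the last band is the largest possible sales order amount. The following script creates the data warehouse schema for the band dimension.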
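Because bands are user-defined, it can be worth checking that consecutive bands in each set are exactly one grain apart before loading them. The following Python sketch is one way to do that, under the assumption that the band rows are available as plain tuples; `GRAIN`, `ROWS`, and `find_gaps` are hypothetical names, and the values are copied from the table above.

```python
from decimal import Decimal

GRAIN = Decimal("0.01")  # smallest possible step of the measure

# (segment name, band name, start value, end value)
ROWS = [
    ("project alpha", "bottom", "0.01", "2500.00"),
    ("project alpha", "low", "2500.01", "3000.00"),
    ("project alpha", "mid-low", "3000.01", "4000.00"),
    ("project alpha", "mid", "4000.01", "5500.00"),
    ("project alpha", "mid-high", "5500.01", "6500.00"),
    ("project alpha", "top", "6500.01", "99999999.99"),
    ("grid", "low", "0.01", "3000.00"),
    ("grid", "med", "3000.01", "6000.00"),
    ("grid", "high", "6000.01", "99999999.99"),
]


def find_gaps(rows, grain=GRAIN):
    """Return (segment, band, next_band) wherever two consecutive bands
    are not exactly one grain apart (a gap or an overlap)."""
    by_segment = {}
    for segment, band, start, end in rows:
        by_segment.setdefault(segment, []).append(
            (Decimal(start), Decimal(end), band)
        )
    problems = []
    for segment, bands in by_segment.items():
        bands.sort()  # order bands by start value within each set
        for (_, prev_end, prev_band), (next_start, _, next_band) in zip(
            bands, bands[1:]
        ):
            if next_start - prev_end != grain:
                problems.append((segment, prev_band, next_band))
    return problems


print(find_gaps(ROWS))  # []
```

Using `Decimal` rather than floating point keeps the 0.01 comparison exact.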
use dw;
create table annual_order_segment_dim (
segment_sk int,
segment_name varchar(30),
band_name varchar(50),
band_start_amount decimal(10,2),
band_end_amount decimal(10,2),
version int,
effective_date date,
expiry_date date
)
clustered by (segment_sk) into 8 buckets
stored as orc tblproperties ('transactional'='true');
insert into annual_order_segment_dim values (1, 'project alpha', 'bottom', 0.01, 2500.00, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (2, 'project alpha', 'low', 2500.01, 3000.00, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (3, 'project alpha', 'mid-low', 3000.01, 4000.00, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (4, 'project alpha', 'mid', 4000.01, 5500.00, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (5, 'project alpha', 'mid-high', 5500.01, 6500.00, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (6, 'project alpha', 'top', 6500.01, 99999999.99, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (7, 'grid', 'low', 0.01, 3000.00, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (8, 'grid', 'med', 3000.01, 6000.00, 1, '1900-01-01', '2200-01-01');
insert into annual_order_segment_dim values (9, 'grid', 'high', 6000.01, 99999999.99, 1, '1900-01-01', '2200-01-01');
create table year_dim (
year_sk int,
year int
);
create table annual_sales_order_fact (
customer_sk int,
year_sk int,
annual_order_amount decimal(10, 2)
);
create table annual_customer_segment_fact (
segment_sk int,
customer_sk int,
year_sk int
);
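2. Initial Load
This subsection describes the initial load and how to test it. The initial load script below loads data from the order_date dimension (order_date_dim in the script, a view on date_dim) into year_dim, from sales_order_fact into annual_sales_order_fact, and from annual_sales_order_fact into annual_customer_segment_fact. The script loads all historical data.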
use dw;
insert into year_dim
select row_number() over (order by t1.year) + t2.sk_max, year
from (select distinct year year from order_date_dim) t1
cross join (select coalesce(max(year_sk),0) sk_max from year_dim) t2;
insert into annual_sales_order_fact
select a.customer_sk,
c.year_sk,
sum(order_amount)
from sales_order_fact a,
year_dim c,
order_date_dim d
where a.order_date_sk = d.order_date_sk
and c.year = d.year
and d.year < 2017
group by a.customer_sk, c.year_sk;
insert into annual_customer_segment_fact
select d.segment_sk,
a.customer_sk,
a.year_sk
from annual_sales_order_fact a,
annual_order_segment_dim d
where annual_order_amount >= band_start_amount
and annual_order_amount <= band_end_amount;
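The last statement in the script is a range join: each annual total is matched against every band whose inclusive range contains it, so one customer-year yields one row per band set. A minimal Python sketch of that join follows. The dimension rows mirror the inserts shown earlier, while the customer keys, year keys, and amounts in `ANNUAL_TOTALS` are made-up sample data, and `segment_facts` is a hypothetical helper name.

```python
from decimal import Decimal

# (segment_sk, segment_name, band_start_amount, band_end_amount)
SEGMENT_DIM = [
    (1, "project alpha", Decimal("0.01"), Decimal("2500.00")),
    (2, "project alpha", Decimal("2500.01"), Decimal("3000.00")),
    (3, "project alpha", Decimal("3000.01"), Decimal("4000.00")),
    (4, "project alpha", Decimal("4000.01"), Decimal("5500.00")),
    (5, "project alpha", Decimal("5500.01"), Decimal("6500.00")),
    (6, "project alpha", Decimal("6500.01"), Decimal("99999999.99")),
    (7, "grid", Decimal("0.01"), Decimal("3000.00")),
    (8, "grid", Decimal("3000.01"), Decimal("6000.00")),
    (9, "grid", Decimal("6000.01"), Decimal("99999999.99")),
]

# Hypothetical annual totals: (customer_sk, year_sk, annual_order_amount)
ANNUAL_TOTALS = [
    (1, 1, Decimal("2800.00")),
    (2, 1, Decimal("6200.00")),
]


def segment_facts(totals, segment_dim):
    """Range join: emit (segment_sk, customer_sk, year_sk) per matching band."""
    return [
        (segment_sk, customer_sk, year_sk)
        for customer_sk, year_sk, amount in totals
        for segment_sk, _, start, end in segment_dim
        if start <= amount <= end
    ]


for row in segment_facts(ANNUAL_TOTALS, SEGMENT_DIM):
    print(row)
# (2, 1, 1)
# (7, 1, 1)
# (5, 2, 1)
# (9, 2, 1)
```

Each sample customer appears twice, once for its project alpha band and once for its grid band.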
Run the initial load script, then query the annual_customer_segment_fact table to confirm that the initial load succeeded.
select a.customer_sk csk,
a.year_sk ysk,
annual_order_amount amt,
segment_name sn,
band_name bn
from annual_customer_segment_fact a,
annual_order_segment_dim b,
year_dim c,
annual_sales_order_fact d
where a.segment_sk = b.segment_sk
and a.year_sk = c.year_sk
and a.customer_sk = d.customer_sk
and a.year_sk = d.year_sk
cluster by csk, ysk, sn, bn;
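The query result is shown in the figure below.
Note that the band is determined here by grouping and summing on the customer surrogate key customer_sk. In practice the grouping may be on customer_number instead, because however a customer's SCD attributes change, it is generally still regarded as the same customer.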
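The difference between the two groupings can be seen in a small Python sketch. All data here is hypothetical: it assumes a customer with customer_number 100 that has two SCD versions, and therefore two surrogate keys, within the same year. `totals_by` is an illustrative helper, not something from the article's scripts.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical customer_dim excerpt: customer_sk -> customer_number.
# Customer 100 has two SCD versions, hence two surrogate keys.
CUSTOMER_SK_TO_NUMBER = {1: 100, 2: 100, 3: 200}

# Hypothetical order rows for one year: (customer_sk, order_amount)
ORDERS = [
    (1, Decimal("2000.00")),
    (2, Decimal("1500.00")),
    (3, Decimal("4000.00")),
]


def totals_by(orders, key):
    """Sum order amounts per group, where key maps customer_sk to a group."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for customer_sk, amount in orders:
        totals[key(customer_sk)] += amount
    return dict(totals)


by_sk = totals_by(ORDERS, key=lambda sk: sk)
by_number = totals_by(ORDERS, key=CUSTOMER_SK_TO_NUMBER.get)

print(by_sk)      # three totals: 2000.00, 1500.00, 4000.00
print(by_number)  # two totals: 3500.00 for customer 100, 4000.00 for 200
```

With the grid band set, grouping by customer_sk would put both versions of customer 100 in the low band (2000.00 and 1500.00), whereas grouping by customer_number gives a single total of 3500.00 in the med band.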
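3. Regular Load
This subsection describes the regular load script and how to test it. The regular load is similar to the initial load, except that year_dim does not need to be loaded. Data in annual_sales_order_fact is loaded into annual_customer_segment_fact. Schedule the regular load script below to run once a year; it loads the previous year's sales data.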
use dw;
insert into annual_sales_order_fact
select a.customer_sk,
c.year_sk,
sum(order_amount)
from sales_order_fact a,
year_dim c,
order_date_dim d
where a.order_date_sk = d.order_date_sk
and c.year = d.year
and d.year = year(current_date) - 1
group by a.customer_sk, c.year_sk;
insert into annual_customer_segment_fact
select d.segment_sk,
a.customer_sk,
c.year_sk
from annual_sales_order_fact a,
year_dim c,
annual_order_segment_dim d
where a.year_sk = c.year_sk
and c.year = year(current_date) - 1
and annual_order_amount >= band_start_amount
and annual_order_amount <= band_end_amount;
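The only filter that distinguishes the regular load from the initial load is `year(current_date) - 1`, which selects the calendar year before the run date. The Python sketch below illustrates that selection with made-up data; `YEAR_DIM`, `ANNUAL_TOTALS`, and `previous_year_rows` are hypothetical names, and the run date is passed in explicitly so the behavior is easy to check.

```python
from datetime import date

# Hypothetical year_dim contents: year_sk -> year
YEAR_DIM = {1: 2015, 2: 2016, 3: 2017}

# Hypothetical annual totals: (customer_sk, year_sk, annual_order_amount)
ANNUAL_TOTALS = [(1, 2, 2800.0), (1, 3, 6200.0), (2, 3, 900.0)]


def previous_year_rows(totals, year_dim, run_date):
    """Keep only the rows whose year is the calendar year before run_date."""
    target = run_date.year - 1  # mirrors year(current_date) - 1
    return [row for row in totals if year_dim[row[1]] == target]


print(previous_year_rows(ANNUAL_TOTALS, YEAR_DIM, date(2018, 1, 1)))
# [(1, 3, 6200.0), (2, 3, 900.0)]
```

Since the script uses plain `insert into`, running it more than once in the same year would append the same rows again, so it is presumably meant to run exactly once per year.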