一、题目
支出表: t2_spending
------------- ---------
| Column Name | Type |
------------- ---------
| user_id | int |
| spend_date | string |
| platform | string |
| amount | int |
------------- ---------
- 这张表记录了用户在一个在线购物网站的支出历史,该在线购物平台同时拥有桌面端('desktop')和手机端('mobile')的应用程序。
- 这张表的主键是 (user_id, spend_date, platform)。
写一段 SQL 来查找每天 仅 使用手机端用户、仅 使用桌面端用户和 同时 使用桌面端和手机端的用户人数和总支出金额。
查询结果格式如下例所示: t2_spending table:
代码语言:javascript复制 --------- ------------ ---------- --------
| user_id | spend_date | platform | amount |
--------- ------------ ---------- --------
| 1 | 2019-07-01 | mobile | 100 |
| 1 | 2019-07-01 | desktop | 100 |
| 2 | 2019-07-01 | mobile | 100 |
| 2 | 2019-07-02 | mobile | 100 |
| 3 | 2019-07-01 | desktop | 100 |
| 3 | 2019-07-02 | desktop | 100 |
--------- ------------ ---------- --------
Result table:
代码语言:javascript复制 ------------ ---------- -------------- -------------
| spend_date | platform | total_amount | total_users |
------------ ---------- -------------- -------------
| 2019-07-01 | desktop | 100 | 1 |
| 2019-07-01 | mobile | 100 | 1 |
| 2019-07-01 | both | 200 | 1 |
| 2019-07-02 | desktop | 100 | 1 |
| 2019-07-02 | mobile | 100 | 1 |
| 2019-07-02 | both | 0 | 0 |
------------ ---------- -------------- -------------
在 2019-07-01, 用户1 同时 使用桌面端和手机端购买, 用户2 仅 使用了手机端购买,而用户3 仅 使用了桌面端购买。 在 2019-07-02, 用户2 仅 使用了手机端购买, 用户3 仅 使用了桌面端购买,且没有用户 同时 使用桌面端和手机端购买。
二、分析
本题难点在于:1。对于仅使用一个平台的列出其平台,而对于使用两个平台的需要标记为both;2.对于07-02没有both的,需要进行显示为0的处理。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️ |
三、SQL
1.区分desktop、mobile、both
由于spark不支持count(distinct )开窗,所以我们先计算一下每个用户每天使用的平台个数。 然后原表与新表关联,计算出每个用户每天使用的new_platform
平台类型。
执行SQL
代码语言:javascript复制select t1.spend_date,
t1.user_id,
t1.platform,
t1.amount,
t2.plat_cnt,
if(t2.plat_cnt = 1, t1.platform, 'both') as new_platform
from t2_spending t1
left join
(select spend_date,
user_id,
count(distinct platform) as plat_cnt
from t2_spending
group by spend_date,
user_id) t2
on t1.spend_date = t2.spend_date
and t1.user_id = t2.user_id
SQL结果
代码语言:javascript复制 ------------- ---------- ----------- --------- ----------- ---------------
| spend_date | user_id | platform | amount | plat_cnt | new_platform |
------------- ---------- ----------- --------- ----------- ---------------
| 2019-01-01 | 1 | mobile | 100 | 2 | both |
| 2019-01-01 | 1 | desktop | 100 | 2 | both |
| 2019-01-01 | 2 | mobile | 100 | 1 | mobile |
| 2019-01-02 | 2 | mobile | 100 | 1 | mobile |
| 2019-01-01 | 3 | desktop | 100 | 1 | desktop |
| 2019-01-02 | 3 | desktop | 100 | 1 | desktop |
------------- ---------- ----------- --------- ----------- ---------------
2.使用new_plateform 计算各项指标
我们使用新的new_platform来计算各项结果
执行SQL
代码语言:javascript复制select spend_date,
new_platform,
sum(amount) as total_amount,
count(distinct user_id) as total_users
from (select t1.spend_date,
t1.user_id,
t1.platform,
t1.amount,
t2.plat_cnt,
if(t2.plat_cnt = 1, t1.platform, 'both') as new_platform
from t2_spending t1
left join
(select spend_date,
user_id,
count(distinct platform) as plat_cnt
from t2_spending
group by spend_date,
user_id) t2
on t1.spend_date = t2.spend_date
and t1.user_id = t2.user_id) tt
group by spend_date,
new_platform
order by 1 asc
SQL结果
代码语言:javascript复制 ------------- --------------- --------------- --------------
| spend_date | new_platform | total_amount | total_users |
------------- --------------- --------------- --------------
| 2019-01-01 | mobile | 100 | 1 |
| 2019-01-01 | desktop | 100 | 1 |
| 2019-01-01 | both | 200 | 1 |
| 2019-01-02 | desktop | 100 | 1 |
| 2019-01-02 | mobile | 100 | 1 |
------------- --------------- --------------- --------------
3.补充维表得到最终结果
需要每天展示不同平台的全部数据,所以我们先创建一张包含mobile
,desktop
,both
类型与全量日期组合的数据表
执行SQL
代码语言:javascript复制select ttt1.spend_date,
ttt1.platform,
coalesce(total_amount, 0) as total_amount,
coalesce(total_users, 0) as total_users
from (
--生成日期 平台类型的全量的数据
select spend_date, 'desktop' as platform
from t2_spending
group by spend_date
union all
select spend_date, 'mobile' as platform
from t2_spending
group by spend_date
union all
select spend_date, 'both' as platform
from t2_spending
group by spend_date) ttt1
left join
(select spend_date,
new_platform,
sum(amount) as total_amount,
count(distinct user_id) as total_users
from (select t1.spend_date,
t1.user_id,
t1.platform,
t1.amount,
t2.plat_cnt,
if(t2.plat_cnt = 1, t1.platform, 'both') as new_platform
from t2_spending t1
left join
(select spend_date,
user_id,
count(distinct platform) as plat_cnt
from t2_spending
group by spend_date,
user_id) t2
on t1.spend_date = t2.spend_date
and t1.user_id = t2.user_id) tt
group by spend_date,
new_platform) ttt2
on ttt1.platform = ttt2.new_platform
and ttt1.spend_date = ttt2.spend_date
order by 1 asc
SQL结果
代码语言:javascript复制 ------------- ----------- --------------- --------------
| spend_date | platform | total_amount | total_users |
------------- ----------- --------------- --------------
| 2019-01-01 | desktop | 100 | 1 |
| 2019-01-01 | mobile | 100 | 1 |
| 2019-01-01 | both | 200 | 1 |
| 2019-01-02 | desktop | 100 | 1 |
| 2019-01-02 | mobile | 100 | 1 |
| 2019-01-02 | both | 0 | 0 |
------------- ----------- --------------- --------------
四、建表语句和数据插入
代码语言:javascript复制--建表语句
CREATE TABLE t2_spending(
user_id int,
spend_date string,
platform string,
amount int
) COMMENT '支出表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
;
-- 插入数据
insert into t2_spending(user_id,spend_date,platform,amount)
values
(1,'2019-01-01','mobile',100),
(1,'2019-01-01','desktop',100),
(2,'2019-01-01','mobile',100),
(2,'2019-01-02','mobile',100),
(3,'2019-01-01','desktop',100),
(3,'2019-01-02','desktop',100);