一、题目
有如下数据记录直播平台主播上播及下播时间,根据该数据计算出平台最高峰同时直播人数。
代码语言:javascript复制 ---------- ---------------------- ----------------------
| user_id | start_time | end_time |
---------- ---------------------- ----------------------
| 1 | 2024-04-29 01:00:00 | 2024-04-29 02:01:05 |
| 2 | 2024-04-29 01:05:00 | 2024-04-29 02:03:18 |
| 3 | 2024-04-29 02:00:00 | 2024-04-29 04:03:22 |
| 4 | 2024-04-29 03:15:07 | 2024-04-29 04:33:21 |
| 5 | 2024-04-29 03:34:16 | 2024-04-29 06:10:45 |
| 6 | 2024-04-29 05:22:00 | 2024-04-29 07:01:08 |
| 7 | 2024-04-29 06:11:03 | 2024-04-29 09:26:05 |
| 3 | 2024-04-29 08:00:00 | 2024-04-29 12:34:27 |
| 1 | 2024-04-29 11:00:00 | 2024-04-29 16:03:18 |
| 8 | 2024-04-29 15:00:00 | 2024-04-29 17:01:05 |
---------- ---------------------- ----------------------
二、分析
查询同时最大人数,考察的是对拉链转化为日志的处理方式以及聚合函数的累积计算。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
三、SQL
平台最高峰同时直播人数
1.首先对原始数据进行处理,生成主播上下播的日志数据,同时增加人数变化字段,主播上播为1,主播下播-1。新数据包含 user_id,action_time,change_cnt
代码语言:javascript复制--开播记录
select
user_id,
start_time as action_time,
1 as change_cnt
from t_livestream_log
union all
--下播记录
select
user_id,
end_time as action_time,
-1 as change_cnt
from t_livestream_log
查询结果
代码语言:javascript复制 -------------- ---------------------- -----------------
| _u1.user_id | _u1.action_time | _u1.change_cnt |
-------------- ---------------------- -----------------
| 1 | 2024-04-29 01:00:00 | 1 |
| 1 | 2024-04-29 02:01:05 | -1 |
| 2 | 2024-04-29 01:05:00 | 1 |
| 2 | 2024-04-29 02:03:18 | -1 |
| 3 | 2024-04-29 02:00:00 | 1 |
| 3 | 2024-04-29 04:03:22 | -1 |
| 4 | 2024-04-29 03:15:07 | 1 |
| 4 | 2024-04-29 04:33:21 | -1 |
| 5 | 2024-04-29 03:34:16 | 1 |
| 5 | 2024-04-29 06:10:45 | -1 |
| 6 | 2024-04-29 05:22:00 | 1 |
| 6 | 2024-04-29 07:01:08 | -1 |
| 7 | 2024-04-29 06:11:03 | 1 |
| 7 | 2024-04-29 09:26:05 | -1 |
| 3 | 2024-04-29 08:00:00 | 1 |
| 3 | 2024-04-29 12:34:27 | -1 |
| 1 | 2024-04-29 11:00:00 | 1 |
| 1 | 2024-04-29 16:03:18 | -1 |
| 8 | 2024-04-29 15:00:00 | 1 |
| 8 | 2024-04-29 17:01:05 | -1 |
-------------- ---------------------- -----------------
2.对操作日志按照操作时间进行累积求和
代码语言:javascript复制with t as(
--开播记录
select
user_id,
start_time as action_time,
1 as change_cnt
from t_livestream_log
union all
--下播记录
select
user_id,
end_time as action_time,
-1 as change_cnt
from t_livestream_log
)
select
user_id,
action_time,
change_cnt,
sum(change_cnt)over(order by action_time asc) as online_cnt
from t
查询结果
代码语言:javascript复制 ---------- ---------------------- ------------- -------------
| user_id | action_time | change_cnt | online_cnt |
---------- ---------------------- ------------- -------------
| 1 | 2024-04-29 01:00:00 | 1 | 1 |
| 2 | 2024-04-29 01:05:00 | 1 | 2 |
| 3 | 2024-04-29 02:00:00 | 1 | 3 |
| 1 | 2024-04-29 02:01:05 | -1 | 2 |
| 2 | 2024-04-29 02:03:18 | -1 | 1 |
| 4 | 2024-04-29 03:15:07 | 1 | 2 |
| 5 | 2024-04-29 03:34:16 | 1 | 3 |
| 3 | 2024-04-29 04:03:22 | -1 | 2 |
| 4 | 2024-04-29 04:33:21 | -1 | 1 |
| 6 | 2024-04-29 05:22:00 | 1 | 2 |
| 5 | 2024-04-29 06:10:45 | -1 | 1 |
| 7 | 2024-04-29 06:11:03 | 1 | 2 |
| 6 | 2024-04-29 07:01:08 | -1 | 1 |
| 3 | 2024-04-29 08:00:00 | 1 | 2 |
| 7 | 2024-04-29 09:26:05 | -1 | 1 |
| 1 | 2024-04-29 11:00:00 | 1 | 2 |
| 3 | 2024-04-29 12:34:27 | -1 | 1 |
| 8 | 2024-04-29 15:00:00 | 1 | 2 |
| 1 | 2024-04-29 16:03:18 | -1 | 1 |
| 8 | 2024-04-29 17:01:05 | -1 | 0 |
---------- ---------------------- ------------- -------------
3.求取累计求和中的最大值,即为当天最高峰同时直播人数 查询语句
代码语言:javascript复制with t as(
--开播记录
select
user_id,
start_time as action_time,
1 as change_cnt
from t_livestream_log
union all
--下播记录
select
user_id,
end_time as action_time,
-1 as change_cnt
from t_livestream_log
)
select
max(online_cnt) as max_online_cnt
from
(
select
user_id,
action_time,
change_cnt,
sum(change_cnt)over(order by action_time asc) as online_cnt
from t
) tt
查询结果
代码语言:javascript复制 -----------------
| max_online_cnt |
-----------------
| 3 |
-----------------
四、建表语句和数据插入
代码语言:javascript复制CREATE TABLE IF NOT EXISTS t_livestream_log (
user_id INT, -- 主播ID
start_time STRING, -- 开始时间
end_time STRING -- 结束时间
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' -- 假设字段使用逗号分隔
STORED AS ORC;
insert into t_livestream_log(user_id, start_time, end_time) values
(1,'2024-04-29 01:00:00','2024-04-29 02:01:05'),
(2,'2024-04-29 01:05:00','2024-04-29 02:03:18'),
(3,'2024-04-29 02:00:00','2024-04-29 04:03:22'),
(4,'2024-04-29 03:15:07','2024-04-29 04:33:21'),
(5,'2024-04-29 03:34:16','2024-04-29 06:10:45'),
(6,'2024-04-29 05:22:00','2024-04-29 07:01:08'),
(7,'2024-04-29 06:11:03','2024-04-29 09:26:05'),
(3,'2024-04-29 08:00:00','2024-04-29 12:34:27'),
(1,'2024-04-29 11:00:00','2024-04-29 16:03:18'),
(8,'2024-04-29 15:00:00','2024-04-29 17:01:05');