美团字节大数据面试SQL-每分钟最大直播人数

2024-05-18 09:18:53 浏览数 (1)

一、题目

有如下数据记录直播平台主播上播及下播时间,根据该数据计算出平台每分钟最大直播人数。

代码语言:javascript复制
 ---------- ---------------------- ---------------------- 
| user_id  |      start_time      |       end_time       |
 ---------- ---------------------- ---------------------- 
| 1        | 2024-04-29 01:00:00  | 2024-04-29 02:01:05  |
| 2        | 2024-04-29 01:05:00  | 2024-04-29 02:03:18  |
| 3        | 2024-04-29 02:00:00  | 2024-04-29 04:03:22  |
| 4        | 2024-04-29 03:15:07  | 2024-04-29 04:33:21  |
| 5        | 2024-04-29 03:34:16  | 2024-04-29 06:10:45  |
| 6        | 2024-04-29 05:22:00  | 2024-04-29 07:01:08  |
| 7        | 2024-04-29 06:11:03  | 2024-04-29 09:26:05  |
| 3        | 2024-04-29 08:00:00  | 2024-04-29 12:34:27  |
| 1        | 2024-04-29 11:00:00  | 2024-04-29 16:03:18  |
| 8        | 2024-04-29 15:00:00  | 2024-04-29 17:01:05  |
 ---------- ---------------------- ---------------------- 

二、分析

查询每分钟最大在线人数,这里处理逻辑基本和最高峰在线人数是一致的,但有一个问题,如果某一分钟内无任何操作记录,则不会出现该分钟的数据,我们就统计不到。所以这里额外考察了生成数据;

维度

评分

题目难度

⭐️⭐️⭐️⭐️

题目清晰度

⭐️⭐️⭐️⭐️⭐️

业务常见度

⭐️⭐️⭐️⭐️⭐️

三、SQL

每分钟最大直播人数

1.首先对原始数据进行处理,生成主播上下播的日志数据,同时增加人数变化字段,主播上播为1,主播下播-1。新数据包含 user_id,action_time,change_cnt

代码语言:javascript复制
--开播记录
select
user_id,
start_time as action_time,
1 as change_cnt
from t_livestream_log
union all 
--下播记录
select
user_id,
end_time as action_time,
-1 as change_cnt
from t_livestream_log

查询结果

代码语言:javascript复制
 -------------- ---------------------- ----------------- 
| _u1.user_id  |   _u1.action_time    | _u1.change_cnt  |
 -------------- ---------------------- ----------------- 
| 1            | 2024-04-29 01:00:00  | 1               |
| 1            | 2024-04-29 02:01:05  | -1              |
| 2            | 2024-04-29 01:05:00  | 1               |
| 2            | 2024-04-29 02:03:18  | -1              |
| 3            | 2024-04-29 02:00:00  | 1               |
| 3            | 2024-04-29 04:03:22  | -1              |
| 4            | 2024-04-29 03:15:07  | 1               |
| 4            | 2024-04-29 04:33:21  | -1              |
| 5            | 2024-04-29 03:34:16  | 1               |
| 5            | 2024-04-29 06:10:45  | -1              |
| 6            | 2024-04-29 05:22:00  | 1               |
| 6            | 2024-04-29 07:01:08  | -1              |
| 7            | 2024-04-29 06:11:03  | 1               |
| 7            | 2024-04-29 09:26:05  | -1              |
| 3            | 2024-04-29 08:00:00  | 1               |
| 3            | 2024-04-29 12:34:27  | -1              |
| 1            | 2024-04-29 11:00:00  | 1               |
| 1            | 2024-04-29 16:03:18  | -1              |
| 8            | 2024-04-29 15:00:00  | 1               |
| 8            | 2024-04-29 17:01:05  | -1              |
 -------------- ---------------------- ----------------- 

2.生成0~24*60-1条记录数据

代码语言:javascript复制
select idx from (select posexplode(split(space(24*60),' ')) as (idx,value)) t

查询结果,这里仅显示前10行数据

代码语言:javascript复制
 ------ 
| idx  |
 ------ 
| 0    |
| 1    |
| 2    |
| 3    |
| 4    |
| 5    |
| 6    |
| 7    |
| 8    |
| 9    |
 ------ 

3.根据2生成每分钟一条记录的心跳记录,心跳记录change_cnt= 0 ,代表没有主播上播,也没有主播下播。

代码语言:javascript复制
SELECT
    0 as user_id,
    from_unixtime(unix_timestamp('2024-04-29','yyyy-MM-dd') item*60,'yyyy-MM-dd HH:mm:ss') as action_time,
    0 as change_cnt
from (select posexplode(split(space(24*60),' ')) as (item,value)) t
limit 10;

查询结果,这里仅显示前10行数据

代码语言:javascript复制
 ---------- ---------------------- ------------- 
| user_id  |     action_time      | change_cnt  |
 ---------- ---------------------- ------------- 
| 0        | 2024-04-29 00:00:00  | 0           |
| 0        | 2024-04-29 00:01:00  | 0           |
| 0        | 2024-04-29 00:02:00  | 0           |
| 0        | 2024-04-29 00:03:00  | 0           |
| 0        | 2024-04-29 00:04:00  | 0           |
| 0        | 2024-04-29 00:05:00  | 0           |
| 0        | 2024-04-29 00:06:00  | 0           |
| 0        | 2024-04-29 00:07:00  | 0           |
| 0        | 2024-04-29 00:08:00  | 0           |
| 0        | 2024-04-29 00:09:00  | 0           |
 ---------- ---------------------- ------------- 

4.汇总所有数据之后,对change_cnt累积求和,然后求出每分钟的最大值即可

代码语言:javascript复制
with t_all as(
--开播记录
select
    user_id,
    start_time as action_time,
    1 as change_cnt
from t_livestream_log
union all
--下播记录
select
    user_id,
    end_time as action_time,
    -1 as change_cnt
from t_livestream_log
--心跳数据
union all
SELECT
    0 as user_id,
    from_unixtime(unix_timestamp('2024-04-29','yyyy-MM-dd') item*60,'yyyy-MM-dd HH:mm:ss') as action_time,
    0 as change_cnt
from (select posexplode(split(space(24*60),' ')) as (item,value)) t
)
select
    date_format(action_time,'yyyy-MM-dd HH:mm') as act_minute,
    max(online_cnt) as minute_max_cnt
from (select user_id,
             action_time,
             change_cnt,
             sum(change_cnt) over (order by action_time asc) online_cnt
      from t_all
      )t1
group by date_format(action_time,'yyyy-MM-dd HH:mm')
;

查询结果,截取了部分有直播数据进行展示

代码语言:javascript复制
|    act_minute     | minute_max_cnt  |
 ------------------- ----------------- 
| 2024-04-29 16:40  | 1               |
| 2024-04-29 16:41  | 1               |
| 2024-04-29 16:42  | 1               |
| 2024-04-29 16:43  | 1               |
| 2024-04-29 16:44  | 1               |
| 2024-04-29 16:45  | 1               |
| 2024-04-29 16:46  | 1               |
| 2024-04-29 16:47  | 1               |
| 2024-04-29 16:48  | 1               |
| 2024-04-29 16:49  | 1               |
| 2024-04-29 16:50  | 1               |
| 2024-04-29 16:51  | 1               |
| 2024-04-29 16:52  | 1               |
| 2024-04-29 16:53  | 1               |
| 2024-04-29 16:54  | 1               |
| 2024-04-29 16:55  | 1               |
| 2024-04-29 16:56  | 1               |
| 2024-04-29 16:57  | 1               |
| 2024-04-29 16:58  | 1               |

四、建表语句和数据插入

代码语言:javascript复制
CREATE TABLE IF NOT EXISTS t_livestream_log (
    user_id INT, -- 主播ID
    start_time STRING, -- 开始时间
    end_time STRING -- 结束时间
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','  -- 字段使用逗号分隔
STORED AS ORC;

insert into t_livestream_log(user_id, start_time, end_time) values 
(1,'2024-04-29 01:00:00','2024-04-29 02:01:05'),
(2,'2024-04-29 01:05:00','2024-04-29 02:03:18'),
(3,'2024-04-29 02:00:00','2024-04-29 04:03:22'),
(4,'2024-04-29 03:15:07','2024-04-29 04:33:21'),
(5,'2024-04-29 03:34:16','2024-04-29 06:10:45'),
(6,'2024-04-29 05:22:00','2024-04-29 07:01:08'),
(7,'2024-04-29 06:11:03','2024-04-29 09:26:05'),
(3,'2024-04-29 08:00:00','2024-04-29 12:34:27'),
(1,'2024-04-29 11:00:00','2024-04-29 16:03:18'),
(8,'2024-04-29 15:00:00','2024-04-29 17:01:05');

0 人点赞