腾讯大数据面试SQL-连续登陆超过N天的用户

2024-06-27 20:33:02 浏览数 (1)

一、题目

现有用户登录日志表 t_login_log,包含用户ID(user_id),登录日期(login_date)。数据已经按照用户日期去重,请查出连续登录超过4天的用户ID

样例数据

代码语言:javascript复制
 ---------- ------------- 
| user_id  | login_date  |
 ---------- ------------- 
| 0001     | 20220101    |
| 0001     | 20220102    |
| 0001     | 20220103    |
| 0001     | 20220104    |
| 0001     | 20220105    |
| 0001     | 20220107    |
| 0001     | 20220108    |
| 0001     | 20220109    |
| 0002     | 20220101    |
| 0002     | 20220102    |
| 0002     | 20220103    |
| 0002     | 20220107    |
| 0002     | 20220108    |
| 0003     | 20220107    |
| 0003     | 20220108    |
| 0003     | 20220109    |
 ---------- ------------- 

二、分析

row_number()函数、datediff()函数、日期格式处理 对于连续登录问题处理逻辑,对于同一个用户,如果连续登录,row_number排序和 登录日期与月初的日期差是一个恒定值,如果存在非连续,则值不同。

维度

评分

题目难度

⭐️⭐️⭐️

题目清晰度

⭐️⭐️⭐️⭐️⭐️

业务常见度

⭐️⭐️⭐️⭐️⭐️

三、SQL

1.处理日期格式,计算登录日期与月初日期('2022-01-01')差date_diff

执行SQL

代码语言:javascript复制
select user_id,
       login_date,
       datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as date_diff
from t_login_log;

执行结果

代码语言:javascript复制
 ---------- ------------- ------------ 
| user_id  | login_date  | date_diff  |
 ---------- ------------- ------------ 
| 0001     | 20220101    | 0          |
| 0001     | 20220102    | 1          |
| 0001     | 20220103    | 2          |
| 0001     | 20220104    | 3          |
| 0001     | 20220105    | 4          |
| 0001     | 20220107    | 6          |
| 0001     | 20220108    | 7          |
| 0001     | 20220109    | 8          |
| 0002     | 20220101    | 0          |
| 0002     | 20220102    | 1          |
| 0002     | 20220103    | 2          |
| 0002     | 20220107    | 6          |
| 0002     | 20220108    | 7          |
| 0003     | 20220107    | 6          |
| 0003     | 20220108    | 7          |
| 0003     | 20220109    | 8          |
 ---------- ------------- ------------ 

2.row_number()开窗,计算每个用户每个登录日期的排序 row_num

执行SQL

代码语言:javascript复制
select user_id,
       login_date,
       datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as date_diff,
       row_number() over (partition by user_id order by login_date asc)                            as row_num
from t_login_log;

执行结果

代码语言:javascript复制
 ---------- ------------- ------------ ---------- 
| user_id  | login_date  | date_diff  | row_num  |
 ---------- ------------- ------------ ---------- 
| 0001     | 20220101    | 0          | 1        |
| 0001     | 20220102    | 1          | 2        |
| 0001     | 20220103    | 2          | 3        |
| 0001     | 20220104    | 3          | 4        |
| 0001     | 20220105    | 4          | 5        |
| 0001     | 20220107    | 6          | 6        |
| 0001     | 20220108    | 7          | 7        |
| 0001     | 20220109    | 8          | 8        |
| 0002     | 20220101    | 0          | 1        |
| 0002     | 20220102    | 1          | 2        |
| 0002     | 20220103    | 2          | 3        |
| 0002     | 20220107    | 6          | 4        |
| 0002     | 20220108    | 7          | 5        |
| 0003     | 20220107    | 6          | 1        |
| 0003     | 20220108    | 7          | 2        |
| 0003     | 20220109    | 8          | 3        |
 ---------- ------------- ------------ ---------- 

3.计算date_diff和row_num的差值

执行SQL

代码语言:javascript复制
select user_id,
       login_date,
       datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as date_diff,
       row_number() over (partition by user_id order by login_date asc)                            as row_num,
       row_number() over (partition by user_id order by login_date asc) -
       datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as diff
from t_login_log;

查询结果

代码语言:javascript复制
 ---------- ------------- ------------ ---------- ------- 
| user_id  | login_date  | date_diff  | row_num  | diff  |
 ---------- ------------- ------------ ---------- ------- 
| 0001     | 20220101    | 0          | 1        | 1     |
| 0001     | 20220102    | 1          | 2        | 1     |
| 0001     | 20220103    | 2          | 3        | 1     |
| 0001     | 20220104    | 3          | 4        | 1     |
| 0001     | 20220105    | 4          | 5        | 1     |
| 0001     | 20220107    | 6          | 6        | 0     |
| 0001     | 20220108    | 7          | 7        | 0     |
| 0001     | 20220109    | 8          | 8        | 0     |
| 0002     | 20220101    | 0          | 1        | 1     |
| 0002     | 20220102    | 1          | 2        | 1     |
| 0002     | 20220103    | 2          | 3        | 1     |
| 0002     | 20220107    | 6          | 4        | -2    |
| 0002     | 20220108    | 7          | 5        | -2    |
| 0003     | 20220107    | 6          | 1        | -5    |
| 0003     | 20220108    | 7          | 2        | -5    |
| 0003     | 20220109    | 8          | 3        | -5    |
 ---------- ------------- ------------ ---------- ------- 

4.统计连续登录大于等于4天的用户

执行SQL

代码语言:javascript复制
select user_id
from (select user_id,
             diff,
             count(1) as login_days
      from (select user_id,
                   login_date,
                   row_number() over (partition by user_id order by login_date asc) -
                   datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as diff
            from t_login_log) t
      group by user_id, diff) tt
where login_days >= 4
group by user_id

查询结果

代码语言:javascript复制
 ---------- 
| user_id  |
 ---------- 
| 0001     |
 ---------- 

四、建表语句和数据插入

代码语言:javascript复制
-- 建表语句
create table t_login_log
(
user_id string comment '用户ID',
login_date string comment '登录日期'
);

--数据插入语句
insert into t_login_log values
('0001','20220101'),
('0001','20220102'),
('0001','20220103'),
('0001','20220104'),
('0001','20220105'),
('0001','20220107'),
('0001','20220108'),
('0001','20220109'),
('0002','20220101'),
('0002','20220102'),
('0002','20220103'),
('0002','20220107'),
('0002','20220108'),
('0003','20220107'),
('0003','20220108'),
('0003','20220109');

0 人点赞