一、题目
现有用户登录日志表 t_login_log,包含用户ID(user_id),登录日期(login_date)。数据已经按照用户日期去重,请查出连续登录超过4天的用户ID
样例数据
代码语言:javascript复制 ---------- -------------
| user_id | login_date |
---------- -------------
| 0001 | 20220101 |
| 0001 | 20220102 |
| 0001 | 20220103 |
| 0001 | 20220104 |
| 0001 | 20220105 |
| 0001 | 20220107 |
| 0001 | 20220108 |
| 0001 | 20220109 |
| 0002 | 20220101 |
| 0002 | 20220102 |
| 0002 | 20220103 |
| 0002 | 20220107 |
| 0002 | 20220108 |
| 0003 | 20220107 |
| 0003 | 20220108 |
| 0003 | 20220109 |
---------- -------------
二、分析
row_number()函数、datediff()函数、日期格式处理 对于连续登录问题处理逻辑,对于同一个用户,如果连续登录,row_number排序和 登录日期与月初的日期差是一个恒定值,如果存在非连续,则值不同。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
三、SQL
1.处理日期格式,计算登录日期与月初日期('2022-01-01')差date_diff
执行SQL
代码语言:javascript复制select user_id,
login_date,
datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as date_diff
from t_login_log;
执行结果
代码语言:javascript复制 ---------- ------------- ------------
| user_id | login_date | date_diff |
---------- ------------- ------------
| 0001 | 20220101 | 0 |
| 0001 | 20220102 | 1 |
| 0001 | 20220103 | 2 |
| 0001 | 20220104 | 3 |
| 0001 | 20220105 | 4 |
| 0001 | 20220107 | 6 |
| 0001 | 20220108 | 7 |
| 0001 | 20220109 | 8 |
| 0002 | 20220101 | 0 |
| 0002 | 20220102 | 1 |
| 0002 | 20220103 | 2 |
| 0002 | 20220107 | 6 |
| 0002 | 20220108 | 7 |
| 0003 | 20220107 | 6 |
| 0003 | 20220108 | 7 |
| 0003 | 20220109 | 8 |
---------- ------------- ------------
2.row_number()开窗,计算每个用户每个登录日期的排序 row_num
执行SQL
代码语言:javascript复制select user_id,
login_date,
datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as date_diff,
row_number() over (partition by user_id order by login_date asc) as row_num
from t_login_log;
执行结果
代码语言:javascript复制 ---------- ------------- ------------ ----------
| user_id | login_date | date_diff | row_num |
---------- ------------- ------------ ----------
| 0001 | 20220101 | 0 | 1 |
| 0001 | 20220102 | 1 | 2 |
| 0001 | 20220103 | 2 | 3 |
| 0001 | 20220104 | 3 | 4 |
| 0001 | 20220105 | 4 | 5 |
| 0001 | 20220107 | 6 | 6 |
| 0001 | 20220108 | 7 | 7 |
| 0001 | 20220109 | 8 | 8 |
| 0002 | 20220101 | 0 | 1 |
| 0002 | 20220102 | 1 | 2 |
| 0002 | 20220103 | 2 | 3 |
| 0002 | 20220107 | 6 | 4 |
| 0002 | 20220108 | 7 | 5 |
| 0003 | 20220107 | 6 | 1 |
| 0003 | 20220108 | 7 | 2 |
| 0003 | 20220109 | 8 | 3 |
---------- ------------- ------------ ----------
3.计算date_diff和row_num的差值
执行SQL
代码语言:javascript复制select user_id,
login_date,
datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as date_diff,
row_number() over (partition by user_id order by login_date asc) as row_num,
row_number() over (partition by user_id order by login_date asc) -
datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as diff
from t_login_log;
查询结果
代码语言:javascript复制 ---------- ------------- ------------ ---------- -------
| user_id | login_date | date_diff | row_num | diff |
---------- ------------- ------------ ---------- -------
| 0001 | 20220101 | 0 | 1 | 1 |
| 0001 | 20220102 | 1 | 2 | 1 |
| 0001 | 20220103 | 2 | 3 | 1 |
| 0001 | 20220104 | 3 | 4 | 1 |
| 0001 | 20220105 | 4 | 5 | 1 |
| 0001 | 20220107 | 6 | 6 | 0 |
| 0001 | 20220108 | 7 | 7 | 0 |
| 0001 | 20220109 | 8 | 8 | 0 |
| 0002 | 20220101 | 0 | 1 | 1 |
| 0002 | 20220102 | 1 | 2 | 1 |
| 0002 | 20220103 | 2 | 3 | 1 |
| 0002 | 20220107 | 6 | 4 | -2 |
| 0002 | 20220108 | 7 | 5 | -2 |
| 0003 | 20220107 | 6 | 1 | -5 |
| 0003 | 20220108 | 7 | 2 | -5 |
| 0003 | 20220109 | 8 | 3 | -5 |
---------- ------------- ------------ ---------- -------
4.统计连续登录大于等于4天的用户
执行SQL
代码语言:javascript复制select user_id
from (select user_id,
diff,
count(1) as login_days
from (select user_id,
login_date,
row_number() over (partition by user_id order by login_date asc) -
datediff(from_unixtime(unix_timestamp(login_date, 'yyyyMMdd'), 'yyyy-MM-dd'), '2022-01-01') as diff
from t_login_log) t
group by user_id, diff) tt
where login_days >= 4
group by user_id
查询结果
代码语言:javascript复制 ----------
| user_id |
----------
| 0001 |
----------
四、建表语句和数据插入
代码语言:javascript复制-- 建表语句
create table t_login_log
(
user_id string comment '用户ID',
login_date string comment '登录日期'
);
--数据插入语句
insert into t_login_log values
('0001','20220101'),
('0001','20220102'),
('0001','20220103'),
('0001','20220104'),
('0001','20220105'),
('0001','20220107'),
('0001','20220108'),
('0001','20220109'),
('0002','20220101'),
('0002','20220102'),
('0002','20220103'),
('0002','20220107'),
('0002','20220108'),
('0003','20220107'),
('0003','20220108'),
('0003','20220109');