一、题目
有某城市网吧上网记录表,包含字段:网吧id,访客id(身份证号),上线时间,下线时间。
- 规则1:如果两个用户在同一个网吧上线时间或者下线时间间隔在10分钟以内,则两个用户可能认识;
- 规则2:如果两个用户在三家以上的网吧出现过【规则1】可能认识的情况,则两人一定认识;
请计算该市中两人一定认识的组合数
代码语言:javascript复制 --------- ---------- ---------------------- ----------------------
| bar_id | user_id | login_time | logoff_time |
--------- ---------- ---------------------- ----------------------
| 1 | 001 | 2023-08-01 09:00:00 | 2023-08-01 10:00:00 |
| 1 | 003 | 2023-08-01 09:04:00 | 2023-08-01 11:00:00 |
| 2 | 004 | 2023-08-01 10:00:00 | 2023-08-01 12:02:00 |
| 1 | 006 | 2023-08-01 10:00:00 | 2023-08-01 12:00:00 |
| 2 | 005 | 2023-08-01 10:10:00 | 2023-08-01 11:00:00 |
| 2 | 001 | 2023-08-01 11:01:00 | 2023-08-01 12:00:00 |
| 2 | 002 | 2023-08-01 11:03:00 | 2023-08-01 14:00:00 |
| 3 | 002 | 2023-08-02 15:00:00 | 2023-08-02 17:06:00 |
| 3 | 001 | 2023-08-02 16:01:00 | 2023-08-02 17:07:00 |
| 3 | 004 | 2023-08-02 16:02:00 | 2023-08-02 18:00:00 |
| 3 | 003 | 2023-08-02 20:00:00 | 2023-08-02 22:00:00 |
| 4 | 001 | 2023-08-03 17:00:00 | 2023-08-03 19:00:00 |
| 4 | 002 | 2023-08-03 18:00:00 | 2023-08-03 21:00:00 |
| 4 | 003 | 2023-08-03 18:05:00 | 2023-08-03 22:00:00 |
| 4 | 004 | 2023-08-03 19:00:00 | 2023-08-03 18:58:00 |
--------- ---------- ---------------------- ----------------------
二、分析
- 首先计算可能认识的人,由于所有可能认识的条件必须发生在同一个网吧内,以bar_id进行自关联,然后id要求t1>t2来保证同一个用户和其他的用户只进行一次关联,限定上线时间或者下线时间在10分钟内;
- 计算出可能认识的用户组中,出现的网吧个数;
- 计算网吧个数>=3的组合数;
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️ |
三、SQL
1.可能认识的记录
查询SQL
代码语言:javascript复制select t1.bar_id,
t1.user_id,
t1.login_time,
t1.logoff_time,
t2.bar_id,
t2.user_id,
t2.login_time,
t2.logoff_time
from (select bar_id, user_id, login_time, logoff_time
from t_netbar) t1
join
(select bar_id, user_id, login_time, logoff_time
from t_netbar) t2
on t1.bar_id = t2.bar_id
where t1.user_id < t2.user_id --限定t1的user_id <t2的user_id,防止重复关联和自关联
and (
abs(unix_timestamp(t1.login_time, 'yyyy-MM-dd HH:mm:ss') - unix_timestamp(t2.login_time, 'yyyy-MM-dd HH:mm:ss')) <
600-- 上线时间在10min内
or
abs(unix_timestamp(t1.logoff_time, 'yyyy-MM-dd HH:mm:ss') - unix_timestamp(t2.logoff_time, 'yyyy-MM-dd HH:mm:ss')) <
600 -- 下线时间在10min内
)
查询结果
代码语言:javascript复制 ------------ ------------- ---------------------- ---------------------- ------------ ------------- ---------------------- ----------------------
| t1.bar_id | t1.user_id | t1.login_time | t1.logoff_time | t2.bar_id | t2.user_id | t2.login_time | t2.logoff_time |
------------ ------------- ---------------------- ---------------------- ------------ ------------- ---------------------- ----------------------
| 2 | 001 | 2023-08-01 11:01:00 | 2023-08-01 12:00:00 | 2 | 002 | 2023-08-01 11:03:00 | 2023-08-01 14:00:00 |
| 3 | 001 | 2023-08-02 16:01:00 | 2023-08-02 17:07:00 | 3 | 002 | 2023-08-02 15:00:00 | 2023-08-02 17:06:00 |
| 1 | 001 | 2023-08-01 09:00:00 | 2023-08-01 10:00:00 | 1 | 003 | 2023-08-01 09:04:00 | 2023-08-01 11:00:00 |
| 4 | 002 | 2023-08-03 18:00:00 | 2023-08-03 21:00:00 | 4 | 003 | 2023-08-03 18:05:00 | 2023-08-03 22:00:00 |
| 2 | 001 | 2023-08-01 11:01:00 | 2023-08-01 12:00:00 | 2 | 004 | 2023-08-01 10:00:00 | 2023-08-01 12:02:00 |
| 3 | 001 | 2023-08-02 16:01:00 | 2023-08-02 17:07:00 | 3 | 004 | 2023-08-02 16:02:00 | 2023-08-02 18:00:00 |
| 4 | 001 | 2023-08-03 17:00:00 | 2023-08-03 19:00:00 | 4 | 004 | 2023-08-03 19:00:00 | 2023-08-03 18:58:00 |
------------ ------------- ---------------------- ---------------------- ------------ ------------- ---------------------- ----------------------
2.按照t1的user_id、t2的user_id 进行分组,统计网吧ID个数
执行SQL
代码语言:javascript复制select user_id1,
user_id2,
count(distinct bar_id) as bar_num
from (select t1.bar_id,
t1.user_id as user_id1,
t2.user_id as user_id2
from (select bar_id, user_id, login_time, logoff_time
from t_netbar) t1
join
(select bar_id, user_id, login_time, logoff_time
from t_netbar) t2
on t1.bar_id = t2.bar_id
where t1.user_id < t2.user_id --限定t1的user_id <t2的user_id,防止重复关联和自关联
and (
abs(unix_timestamp(t1.login_time, 'yyyy-MM-dd HH:mm:ss') -
unix_timestamp(t2.login_time, 'yyyy-MM-dd HH:mm:ss')) < 600-- 上线时间在10min内
or
abs(unix_timestamp(t1.logoff_time, 'yyyy-MM-dd HH:mm:ss') -
unix_timestamp(t2.logoff_time, 'yyyy-MM-dd HH:mm:ss')) < 600 -- 下线时间在10min内
)) t
group by user_id1, user_id2
查询结果
代码语言:javascript复制 ----------- ----------- ----------
| user_id1 | user_id2 | bar_num |
----------- ----------- ----------
| 001 | 002 | 2 |
| 001 | 003 | 1 |
| 001 | 004 | 3 |
| 002 | 003 | 1 |
----------- ----------- ----------
3.统计最终结果
执行SQL
代码语言:javascript复制select count(1) as group_nums
from (select user_id1,
user_id2,
count(distinct bar_id) as bar_num
from (select t1.bar_id,
t1.user_id as user_id1,
t2.user_id as user_id2
from (select bar_id, user_id, login_time, logoff_time
from t_netbar) t1
join
(select bar_id, user_id, login_time, logoff_time
from t_netbar) t2
on t1.bar_id = t2.bar_id
where t1.user_id < t2.user_id --限定t1的user_id <t2的user_id,防止重复关联和自关联
and (
abs(unix_timestamp(t1.login_time, 'yyyy-MM-dd HH:mm:ss') -
unix_timestamp(t2.login_time, 'yyyy-MM-dd HH:mm:ss')) < 600-- 上线时间在10min内
or
abs(unix_timestamp(t1.logoff_time, 'yyyy-MM-dd HH:mm:ss') -
unix_timestamp(t2.logoff_time, 'yyyy-MM-dd HH:mm:ss')) < 600 -- 下线时间在10min内
)) t
group by user_id1, user_id2) tt
where bar_num >= 3
查询结果
代码语言:javascript复制 -------------
| group_nums |
-------------
| 1 |
-------------
四、建表语句和数据插入
代码语言:javascript复制--建表语句
CREATE TABLE t_netbar (
bar_id bigint COMMENT '网吧ID',
user_id string COMMENT '用户ID',
login_time string COMMENT '上线时间',
logoff_time string COMMENT '下线时间'
) COMMENT '网吧上下线记录表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
;
-- 插入数据
insert into t_netbar(bar_id,user_id,login_time,logoff_time)
values
(1,'001','2023-08-01 09:00:00','2023-08-01 10:00:00'),
(2,'001','2023-08-01 11:01:00','2023-08-01 12:00:00'),
(2,'002','2023-08-01 11:03:00','2023-08-01 14:00:00'),
(3,'001','2023-08-02 16:01:00','2023-08-02 17:07:00'),
(3,'002','2023-08-02 15:00:00','2023-08-02 17:06:00'),
(4,'001','2023-08-03 17:00:00','2023-08-03 19:00:00'),
(4,'002','2023-08-03 18:00:00','2023-08-03 21:00:00'),
(1,'003','2023-08-01 09:04:00','2023-08-01 11:00:00'),
(3,'003','2023-08-02 20:00:00','2023-08-02 22:00:00'),
(4,'003','2023-08-03 18:05:00','2023-08-03 22:00:00'),
(2,'004','2023-08-01 10:00:00','2023-08-01 12:02:00'),
(3,'004','2023-08-02 16:02:00','2023-08-02 18:00:00'),
(4,'004','2023-08-03 19:00:00','2023-08-03 18:58:00'),
(2,'005','2023-08-01 10:10:00','2023-08-01 11:00:00'),
(1,'006','2023-08-01 10:00:00','2023-08-01 12:00:00')