常见大数据面试SQL-分组连续累积计数

2024-09-10 21:04:33 浏览数 (2)

一、题目

有temp表包含A,B两列,请使用SQL对该B列进行处理,形成C列,按A列顺序,B列值不变,则C列累计计数,C列值变化,则C列重新开始计数,期望结果如下

样例数据

代码语言:javascript复制
 ------- ---- 
|   a   | b  |
 ------- ---- 
| 2010  | 1  |
| 2011  | 1  |
| 2012  | 1  |
| 2013  | 0  |
| 2014  | 0  |
| 2015  | 1  |
| 2016  | 1  |
| 2017  | 1  |
| 2018  | 0  |
| 2019  | 0  |
 ------- ---- 

期望结果

代码语言:javascript复制
 ------- ---- ---- 
|   a   | b  | c  |
 ------- ---- ---- 
| 2010  | 1  | 1  |
| 2011  | 1  | 2  |
| 2012  | 1  | 3  |
| 2013  | 0  | 1  |
| 2014  | 0  | 2  |
| 2015  | 1  | 1  |
| 2016  | 1  | 2  |
| 2017  | 1  | 3  |
| 2018  | 0  | 1  |
| 2019  | 0  | 2  |
 ------- ---- ---- 

二、分析

  • 本题是连续问题的变种,在判断连续的基础上进行累积计数。连续问题解决方案参考一文搞懂连续问题
  • 本题考点相对较多,连续问题本身已经较难,会涉及到lag函数、sum()over(order by) 进行累积求和、连续数据处理的技巧,本题在连续的基础上又考察了count(*)over(order by )的累积计数。

维度

评分

题目难度

⭐️⭐️⭐️⭐️

题目清晰度

⭐️⭐️⭐️⭐️⭐️

业务常见度

⭐️⭐️⭐️

三、SQL

1.lag()判断是否连续

使用lag判断是否连续,并对连续打标为0,不连续打标为1

执行SQL

代码语言:javascript复制
select a,
       b,
       if(b = lag(b,1) over (order by a asc), 0, 1) as is_conn
from t19_temp

SQL结果

代码语言:javascript复制
 ------- ---- ---------- 
|   a   | b  | is_conn  |
 ------- ---- ---------- 
| 2010  | 1  | 1        |
| 2011  | 1  | 0        |
| 2012  | 1  | 0        |
| 2013  | 0  | 1        |
| 2014  | 0  | 0        |
| 2015  | 1  | 1        |
| 2016  | 1  | 0        |
| 2017  | 1  | 0        |
| 2018  | 0  | 1        |
| 2019  | 0  | 0        |
 ------- ---- ---------- 

2.计算分组id

使sum()over(order by )方式计算出连续的分组id:conn_group_id

执行SQL

代码语言:javascript复制
select a,
       b, 
       sum(is_conn) over (order by a asc) as conn_group_id
from (select a, 
             b,
             if(b = lag(b, 1) over (order by a asc), 0, 1) as is_conn
      from t19_temp
      ) t

SQL结果

代码语言:javascript复制
 ------- ---- ---------------- 
|   a   | b  | conn_group_id  |
 ------- ---- ---------------- 
| 2010  | 1  | 1              |
| 2011  | 1  | 1              |
| 2012  | 1  | 1              |
| 2013  | 0  | 2              |
| 2014  | 0  | 2              |
| 2015  | 1  | 3              |
| 2016  | 1  | 3              |
| 2017  | 1  | 3              |
| 2018  | 0  | 4              |
| 2019  | 0  | 4              |
 ------- ---- ---------------- 

3.按照分组id分组,count(*)over(order by) 计数,得出最后结果

执行SQL

代码语言:javascript复制
select a,
       b,
       count(1) over (partition by conn_group_id order by a asc) as c
from (select a,
             b,
             sum(is_conn) over (order by a asc) as conn_group_id
      from (select
                a,
                b,
                if(b = lag(b, 1) over (order by a asc), 0, 1) as is_conn
            from db_interview_cj.t19_temp
            ) t
      ) tt

SQL结果

代码语言:javascript复制
 ------- ---- ---- 
|   a   | b  | c  |
 ------- ---- ---- 
| 2010  | 1  | 1  |
| 2011  | 1  | 2  |
| 2012  | 1  | 3  |
| 2013  | 0  | 1  |
| 2014  | 0  | 2  |
| 2015  | 1  | 1  |
| 2016  | 1  | 2  |
| 2017  | 1  | 3  |
| 2018  | 0  | 1  |
| 2019  | 0  | 2  |
 ------- ---- ---- 

四、建表语句和数据插入

代码语言:javascript复制
--建表语句
CREATE TABLE IF NOT EXISTS t19_temp
(
    a    string, -- 用户id
    b    bigint -- 登陆日期
)
    COMMENT 'temp';
--插入数据
INSERT INTO t19_temp VALUES
('2010',1),
('2011',1),
('2012',1),
('2013',0),
('2014',0),
('2015',1),
('2016',1),
('2017',1),
('2018',0),
('2019',0);

0 人点赞