Xiaohongshu Big Data Interview SQL: Query Each User's First and Last Record

2024-06-27 20:38:39

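I. Problem

Given an order table t_order with the columns order ID, user ID, product ID, purchase quantity, and purchase time, query the first record and the last record of each user. Sample data: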

+-----------+----------+-------------+-----------+------------------------+
| order_id  | user_id  | product_id  | quantity  |     purchase_time      |
+-----------+----------+-------------+-----------+------------------------+
| 1         | 1        | 1001        | 1         | 2023-03-13 08:30:00.0  |
| 2         | 1        | 1002        | 1         | 2023-03-13 10:45:00.0  |
| 3         | 1        | 1001        | 1         | 2023-03-13 10:45:01.0  |
| 4         | 2        | 1001        | 3         | 2023-03-13 14:20:00.0  |
| 5         | 3        | 1003        | 1         | 2023-03-13 16:15:00.0  |
| 6         | 3        | 1002        | 1         | 2023-03-13 12:10:00.0  |
| 7         | 3        | 1001        | 1         | 2023-03-13 12:10:01.0  |
| 8         | 4        | 1002        | 2         | 2023-03-13 09:00:00.0  |
| 9         | 4        | 1003        | 1         | 2023-03-13 11:30:00.0  |
| 10        | 4        | 1004        | 3         | 2023-03-13 13:40:00.0  |
| 11        | 4        | 1001        | 1         | 2023-03-13 17:25:00.0  |
| 12        | 4        | 1002        | 2         | 2023-03-13 15:05:00.0  |
| 13        | 4        | 1004        | 1         | 2023-03-13 11:55:00.0  |
+-----------+----------+-------------+-----------+------------------------+

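II. Analysis

To return the complete matching rows, use the window function row_number() to assign a row number to each record, then filter on that row number. Because both the first and the last record are needed, and the number of rows per user cannot be known in advance, apply row_number() twice over the same partition: once ordered by time ascending and once ordered by time descending. The rows whose row number is 1 in either ordering form the result.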
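The remark about not knowing each user's row count can be illustrated with a different technique, shown here only for comparison and not the approach this article uses: compute the count with count(*) as a window function and compare it against a single ascending row number. This is a minimal sketch; the aliases rn and cnt are illustrative names that do not appear in the original.

```sql
-- Alternative sketch (not the article's approach): one ascending row number
-- plus the per-user row count. The last record is the one where rn = cnt.
-- The aliases rn and cnt are hypothetical names chosen for illustration.
select order_id,
       user_id,
       product_id,
       quantity,
       purchase_time
from (select order_id,
             user_id,
             product_id,
             quantity,
             purchase_time,
             row_number() over (partition by user_id order by purchase_time asc) as rn,
             count(*) over (partition by user_id)                                as cnt
      from t_order) t
where t.rn = 1
   or t.rn = t.cnt;
```

Both forms filter on values computed per user, so neither needs the row count to be known up front. The two-row-number form used in the rest of this article states the intent more directly.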

| Dimension                       | Rating     |
| Problem difficulty              | ⭐️⭐️⭐️     |
| Problem clarity                 | ⭐️⭐️⭐️⭐️⭐️ |
| Frequency in business scenarios | ⭐️⭐️⭐️⭐️⭐️ |

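III. SQL

1. Add row numbers

Use row_number() partitioned by user and ordered by purchase time, once ascending and once descending, to add two row-number columns named asc_rn and desc_rn.

SQL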

select order_id,
       user_id,
       product_id,
       quantity,
       purchase_time,
       row_number() over (partition by user_id order by purchase_time asc)  as asc_rn,
       row_number() over (partition by user_id order by purchase_time desc) as desc_rn
from t_order;

Result

+-----------+----------+-------------+-----------+------------------------+---------+----------+
| order_id  | user_id  | product_id  | quantity  |     purchase_time      | asc_rn  | desc_rn  |
+-----------+----------+-------------+-----------+------------------------+---------+----------+
| 3         | 1        | 1001        | 1         | 2023-03-13 10:45:01.0  | 3       | 1        |
| 2         | 1        | 1002        | 1         | 2023-03-13 10:45:00.0  | 2       | 2        |
| 1         | 1        | 1001        | 1         | 2023-03-13 08:30:00.0  | 1       | 3        |
| 4         | 2        | 1001        | 3         | 2023-03-13 14:20:00.0  | 1       | 1        |
| 5         | 3        | 1003        | 1         | 2023-03-13 16:15:00.0  | 3       | 1        |
| 7         | 3        | 1001        | 1         | 2023-03-13 12:10:01.0  | 2       | 2        |
| 6         | 3        | 1002        | 1         | 2023-03-13 12:10:00.0  | 1       | 3        |
| 11        | 4        | 1001        | 1         | 2023-03-13 17:25:00.0  | 6       | 1        |
| 12        | 4        | 1002        | 2         | 2023-03-13 15:05:00.0  | 5       | 2        |
| 10        | 4        | 1004        | 3         | 2023-03-13 13:40:00.0  | 4       | 3        |
| 13        | 4        | 1004        | 1         | 2023-03-13 11:55:00.0  | 3       | 4        |
| 9         | 4        | 1003        | 1         | 2023-03-13 11:30:00.0  | 2       | 5        |
| 8         | 4        | 1002        | 2         | 2023-03-13 09:00:00.0  | 1       | 6        |
+-----------+----------+-------------+-----------+------------------------+---------+----------+

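2. Extract the first and last records

Filter on asc_rn = 1 to get the first record and on desc_rn = 1 to get the last record. A user with a single order, such as user 2, satisfies both conditions on the same row and therefore appears only once in the result.

SQL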

select order_id,
       user_id,
       product_id,
       quantity,
       purchase_time
from (select order_id,
             user_id,
             product_id,
             quantity,
             purchase_time,
             row_number() over (partition by user_id order by purchase_time asc)  as asc_rn,
             row_number() over (partition by user_id order by purchase_time desc) as desc_rn
      from t_order) t1
where t1.asc_rn = 1
   or t1.desc_rn = 1;

Result

+-----------+----------+-------------+-----------+------------------------+
| order_id  | user_id  | product_id  | quantity  |     purchase_time      |
+-----------+----------+-------------+-----------+------------------------+
| 3         | 1        | 1001        | 1         | 2023-03-13 10:45:01.0  |
| 1         | 1        | 1001        | 1         | 2023-03-13 08:30:00.0  |
| 4         | 2        | 1001        | 3         | 2023-03-13 14:20:00.0  |
| 5         | 3        | 1003        | 1         | 2023-03-13 16:15:00.0  |
| 6         | 3        | 1002        | 1         | 2023-03-13 12:10:00.0  |
| 11        | 4        | 1001        | 1         | 2023-03-13 17:25:00.0  |
| 8         | 4        | 1002        | 2         | 2023-03-13 09:00:00.0  |
+-----------+----------+-------------+-----------+------------------------+
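The sample data has no two orders with the same purchase_time for the same user. If such ties occurred, row_number() could number the tied rows in either order, so the row chosen as first or last might vary between runs. The sketch below adds order_id as a secondary sort key to make the choice deterministic. It assumes order_id is unique, which the sample data suggests but the problem statement does not guarantee.

```sql
-- Sketch with a tie-breaker. Assumption: order_id is unique per row,
-- so (purchase_time, order_id) gives a total order within each user.
select order_id,
       user_id,
       product_id,
       quantity,
       purchase_time
from (select order_id,
             user_id,
             product_id,
             quantity,
             purchase_time,
             row_number() over (partition by user_id
                                order by purchase_time asc, order_id asc)   as asc_rn,
             row_number() over (partition by user_id
                                order by purchase_time desc, order_id desc) as desc_rn
      from t_order) t1
where t1.asc_rn = 1
   or t1.desc_rn = 1;
```

On the sample data this selects the same seven rows as the query above, because no ties exist there.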

IV. Data preparation

-- Table creation statement
CREATE TABLE t_order (
    order_id INT,
    user_id INT,
    product_id INT,
    quantity INT,
    purchase_time TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Data insert statement
INSERT INTO t_order VALUES
(1, 1, 1001, 1, '2023-03-13 08:30:00'),
(2, 1, 1002, 1, '2023-03-13 10:45:00'),
(3, 1, 1001, 1, '2023-03-13 10:45:01'),
(4, 2, 1001, 3, '2023-03-13 14:20:00'),
(5, 3, 1003, 1, '2023-03-13 16:15:00'),
(6, 3, 1002, 1, '2023-03-13 12:10:00'),
(7, 3, 1001, 1, '2023-03-13 12:10:01'),
(8, 4, 1002, 2, '2023-03-13 09:00:00'),
(9, 4, 1003, 1, '2023-03-13 11:30:00'),
(10, 4, 1004, 3, '2023-03-13 13:40:00'),
(11, 4, 1001, 1, '2023-03-13 17:25:00'),
(12, 4, 1002, 2, '2023-03-13 15:05:00'),
(13, 4, 1004, 1, '2023-03-13 11:55:00');
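After loading the data, a quick sanity check can confirm that the rows match the sample table before running the main query. The sketch below counts orders per user; the alias order_cnt is an illustrative name that does not appear in the original.

```sql
-- Sanity check: number of orders per user after the insert.
-- With the 13 sample rows, users 1, 2, 3 and 4 have 3, 1, 3 and 6 orders.
-- The alias order_cnt is a hypothetical name chosen for illustration.
select user_id,
       count(*) as order_cnt
from t_order
group by user_id
order by user_id;
```

Users with two or more orders contribute two rows to the final result and users with one order contribute one, which accounts for the seven rows returned in section III.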
