Doris之ROLLUP

2021-09-08 15:55:27 浏览数 (1)

1、基本概念

  • Base表(基表):也就是通过create table命令创建出来的表,包括明细表(DUPLICATE )、聚合表(Aggregate )、更新表(UNIQUE )
  • 物化视图(Materialized Views):简称 MVs,将预先计算(根据定义好的 SELECT 语句)好的数据集,存储在 Doris 中的一个特殊的表。
  • ROLLUP是物化视图的一个子集。ALTER TABLE ADD ROLLUP 语法功能可以通过 CREATE MATERIALIZED VIEW 语法实现。
  • ROLLUP本意是上卷,属于多维分析中的概念,将数据按照某种指定粒度进行聚合。
  • 在 Base 表之上,可以创建多个 ROLLUP表,在物理上是独立存储。
  • Base 表中的更新表属于特殊的聚合表,可以建立ROLLUP; Duplicate表不能聚合,建立ROLLUP仅仅是作为调整列顺序,以命中前缀索引的作用。

2、创建ROLLUP

(1)准备数据

代码语言:javascript复制
MySQL [test]> insert into test.user_visit values("2021-09-01",1,"2021-08-31 11:30:22",12,1,2);
Query OK, 1 row affected (0.14 sec)
{'label':'insert_842e8f6967df4032-925ba0c7db659b4c', 'status':'VISIBLE', 'txnId':'240'}

MySQL [test]> insert into test.user_visit values("2021-09-01",2,"2021-08-31 10:31:32",9,3,3);
Query OK, 1 row affected (0.05 sec)
{'label':'insert_d671f0dc68144843-bb6f94cc499c1b0e', 'status':'VISIBLE', 'txnId':'241'}

MySQL [test]> insert into test.user_visit values("2021-08-31",2,"2021-08-30 11:32:32",11,2,3);
Query OK, 1 row affected (0.06 sec)
{'label':'insert_d2937852f1804868-8ba323356c9f772c', 'status':'VISIBLE', 'txnId':'242'}

MySQL [test]> select * from test.user_visit;
 ------------ --------- --------------------- ---------------- ---------------- ------ 
| dt         | user_id | last_visit          | max_dwell_time | min_dwell_time | pv   |
 ------------ --------- --------------------- ---------------- ---------------- ------ 
| 2021-08-31 |       1 | 2021-08-30 11:30:22 |             10 |              0 |    5 |
| 2021-09-01 |       1 | 2021-08-31 11:30:22 |             12 |              1 |    2 |
| 2021-08-31 |       2 | 2021-08-30 11:32:32 |             11 |              2 |    3 |
| 2021-09-01 |       2 | 2021-08-31 10:31:32 |              9 |              3 |    3 |
 ------------ --------- --------------------- ---------------- ---------------- ------ 
4 rows in set (0.02 sec)

MySQL [test]> 

(2)创建ROLLUP

代码语言:javascript复制
MySQL [test]> alter table test.user_visit add rollup rollup_userid(user_id,pv);
Query OK, 0 rows affected (0.06 sec)

MySQL [test]> 

说明:rollup_userid是我们创建的ROLLUP名称

(3)查看查询计划

代码语言:javascript复制
MySQL [test]> select user_id,sum(pv) from test.user_visit group by user_id;
 --------- ----------- 
| user_id | sum(`pv`) |
 --------- ----------- 
|       2 |         6 |
|       1 |         7 |
 --------- ----------- 
2 rows in set (0.04 sec)

MySQL [test]> explain select user_id,sum(pv) from test.user_visit group by user_id;
 ------------------------------------------------------------------------------ 
| Explain String                                                               |
 ------------------------------------------------------------------------------ 
| PLAN FRAGMENT 0                                                              |
|  OUTPUT EXPRS:<slot 2> `user_id` | <slot 3> sum(`pv`)                        |
|   PARTITION: UNPARTITIONED                                                   |
|                                                                              |
|   RESULT SINK                                                                |
|                                                                              |
|   2:EXCHANGE                                                                 |
|                                                                              |
| PLAN FRAGMENT 1                                                              |
|  OUTPUT EXPRS:                                                               |
|   PARTITION: HASH_PARTITIONED: `default_cluster:test`.`user_visit`.`user_id` |
|                                                                              |
|   STREAM DATA SINK                                                           |
|     EXCHANGE ID: 02                                                          |
|     UNPARTITIONED                                                            |
|                                                                              |
|   1:AGGREGATE (update finalize)                                              |
|   |  output: sum(`pv`)                                                       |
|   |  group by: `user_id`                                                     |
|   |  cardinality=-1                                                          |
|   |                                                                          |
|   0:OlapScanNode                                                             |
|      TABLE: user_visit                                                       |
|      PREAGGREGATION: ON                                                      |
|      partitions=1/1                                                          |
|      rollup: rollup_userid                                                   |
|      tabletRatio=8/8                                                         |
|      tabletList=10455,10457,10459,10461,10463,10465,10467,10469              |
|      cardinality=2                                                           |
|      avgRowSize=12.0                                                         |
|      numNodes=1                                                              |
 ------------------------------------------------------------------------------ 
31 rows in set (0.01 sec)

MySQL [test]> 

可以看到rollup: rollup_userid,表示查询我们刚才创建的ROLLUP

(4)未命中ROLLUP的例子

代码语言:javascript复制
MySQL [test]> explain select dt,sum(pv) from test.user_visit group by dt;
 ------------------------------------------------------------------------------ 
| Explain String                                                               |
 ------------------------------------------------------------------------------ 
| PLAN FRAGMENT 0                                                              |
|  OUTPUT EXPRS:<slot 2> `dt` | <slot 3> sum(`pv`)                             |
|   PARTITION: UNPARTITIONED                                                   |
|                                                                              |
|   RESULT SINK                                                                |
|                                                                              |
|   4:EXCHANGE                                                                 |
|                                                                              |
| PLAN FRAGMENT 1                                                              |
|  OUTPUT EXPRS:                                                               |
|   PARTITION: HASH_PARTITIONED: <slot 2> `dt`                                 |
|                                                                              |
|   STREAM DATA SINK                                                           |
|     EXCHANGE ID: 04                                                          |
|     UNPARTITIONED                                                            |
|                                                                              |
|   3:AGGREGATE (merge finalize)                                               |
|   |  output: sum(<slot 3> sum(`pv`))                                         |
|   |  group by: <slot 2> `dt`                                                 |
|   |  cardinality=-1                                                          |
|   |                                                                          |
|   2:EXCHANGE                                                                 |
|                                                                              |
| PLAN FRAGMENT 2                                                              |
|  OUTPUT EXPRS:                                                               |
|   PARTITION: HASH_PARTITIONED: `default_cluster:test`.`user_visit`.`user_id` |
|                                                                              |
|   STREAM DATA SINK                                                           |
|     EXCHANGE ID: 02                                                          |
|     HASH_PARTITIONED: <slot 2> `dt`                                          |
|                                                                              |
|   1:AGGREGATE (update serialize)                                             |
|   |  STREAMING                                                               |
|   |  output: sum(`pv`)                                                       |
|   |  group by: `dt`                                                          |
|   |  cardinality=-1                                                          |
|   |                                                                          |
|   0:OlapScanNode                                                             |
|      TABLE: user_visit                                                       |
|      PREAGGREGATION: ON                                                      |
|      partitions=1/1                                                          |
|      rollup: user_visit                                                      |
|      tabletRatio=8/8                                                         |
|      tabletList=10308,10310,10312,10314,10316,10318,10320,10322              |
|      cardinality=3                                                           |
|      avgRowSize=24.0                                                         |
|      numNodes=1                                                              |
 ------------------------------------------------------------------------------ 
47 rows in set (0.01 sec)

MySQL [test]> 

(5)查询已创建的ROLLUP

代码语言:javascript复制
MySQL [test]> SHOW ALTER TABLE ROLLUP FROM test G
*************************** 1. row ***************************
          JobId: 10453
      TableName: user_visit
     CreateTime: 2021-09-01 15:38:16
     FinishTime: 2021-09-01 15:38:36
  BaseIndexName: user_visit
RollupIndexName: rollup_userid
       RollupId: 10454
  TransactionId: 243
          State: FINISHED
            Msg: 
       Progress: NULL
        Timeout: 86400
1 row in set (0.01 sec)

MySQL [test]> 

0 人点赞