Calcite系列(八):执行流程-计划树构建

2024-05-04 19:54:17 浏览数 (3)

构建方式

计划树构建是SQL处理的第三步,构建出可关系代数优化的逻辑计划树RelNode,是优化器执行优化的前提。

目前Calcite支持两种构建方式:

  1. RelBuilder构建:基于API调用,直接组装算子(例如,Filter,Join,Project等) 构建逻辑计划树,常用于测试及验证场景
  2. SqlToRelConverter构建:将AST抽象语法树(SqlNode)转换为逻辑计划树(RelNode),基于Visitor模式遍历各个SqlNode节点,生成对应的RelNode节点并维护在遍历空间中。完成遍历后,最终调用RelBuilder 组装成完整的RelNode计划树

RelBuilder 是 Calcite 的一个工具类,以编程方式构建关系代数表达式,即 RelNode 对象。它提供了一个流畅的API,允许开发者通过链式调用来构建复杂的查询计划。RelBuilder 的设计抽象了关系代数的操作,使得开发者无需直接操作底层的 RelNode 对象,从而简化了查询计划的构建过程。

SqlToRelConverter是Calcite的一个关键组件,实现SQL语句转换为RelNode逻辑计划树,基于Blackboard 用于DQL转换,遍历SqlNode节点并生成对应的RexNode表达式,基于Blackboard#setRoot维护最新的Root关系计划树。

构建流程

如图展示SqlToRelConverter构建流程:

1. 转换入口:convertQuery方法,基于SqlNode遍历执行convert转换操作

2. 遍历语法树:convertQueryRecursive方法,根据SqlKind类型,进行对应的convert转换操作,保证各类型SqlNode可转换为对应的RelNode,

  • SELECT:convertSelect 方法,基于convertSelectImpl 转换RelNode
  • WITH:convertWith 方法,基于with.body 回调convertQuery 方法
  • INSERT:convertInsert 方法,基于insert.source 回调convertQueryRecursive 方法,insert.target转换为RelOptTable,拼装为TableModify 对象
  • DELETE:convertDelete 方法,基于delete.source 回调 convertSelect 方法,拼装为TableModify 对象
  • UPDATE:convertUpdate 方法,基于update.source 回调 convertSelect 方法,拼装为TableModify 对象
  • MERGE:convertMerge 方法,基于merge.source 回调 convertSelect 方法,拼装为TableModify 对象
  • 集合操作:convertSetOp 方法,基于convertQueryRecursive 分别遍历集合左右节点,封装为集合对象

3. 以SELECT类型为例,遍历前生成Blackboard维护关系表达式,然后根据SELECT对应的SqlNode树结构进行遍历,依次遍历From节点、Where查询条件,SelectList查询字段等

如图展示从SqlNode转为RelNode的对象变换:select name, age from student where name = 'add'

构建示例

以下展示基于RelBuilder构建RelNode的相关示例。

RelBuilder准备

步骤1:手动初始化Schema定义,创建表test.trace、test.calllog

代码语言:java复制
AbstractTable trace = new AbstractTable() {
  public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("imp_date", SqlTypeName.VARCHAR)
        .add("job_time", SqlTypeName.VARCHAR)
        .add("f_level", SqlTypeName.VARCHAR)
        .add("f_caller", SqlTypeName.VARCHAR)
        .add("f_cid", SqlTypeName.VARCHAR)
        .add("f_message", SqlTypeName.VARCHAR)
        .add("f_slotid", SqlTypeName.VARCHAR)
        .add("f_uid", SqlTypeName.VARCHAR)
        .build();
  }
};

AbstractTable calllog = new AbstractTable() {
  public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("imp_date", SqlTypeName.VARCHAR)
        .add("job_time", SqlTypeName.VARCHAR)
        .add("f_exceedquota", SqlTypeName.VARCHAR)
        .add("f_bill_sequence", SqlTypeName.VARCHAR)
        .build();
  }
};

CalciteSchema rootSchema = CalciteSchema.createRootSchema(true);
CalciteSchema schema = rootSchema.add("test", new AbstractSchema());
schema.add("trace", trace);
schema.add("calllog", calllog);

步骤2:基于FrameworkConfig,生成Relbuilder对象

代码语言:java复制
Frameworks.newConfigBuilder()
        .defaultSchema(schema.plus())
        .build()
RelBuilder builder = RelBuilder.create(config());

算子构建

1. TableScan算子

构建出叶子节点,关系(表)对象

代码语言:java复制
RelNode node = builder.scan(TRACE_TABLE).build();

其中,TRACE_TABLE 基于列表(List)定义表的Schema名称

代码语言:java复制
List<String> TRACE_TABLE = ImmutableList.of("test", "trace");

2. Project算子

选择指定列的投影操作

代码语言:java复制
RelNode node = builder
        .scan(TRACE_TABLE)
        .project(demoProjects(builder))
        .build();

其中,Project投影字段如下,选择5个字段查询

代码语言:java复制
private static List<RexNode> demoProjects(RelBuilder builder) {
  return ImmutableList.of(
      builder.field("f_cid"),
      builder.field("f_message"),
      builder.field("f_slotid"),
      builder.field("f_uid"),
      builder.field("imp_date")
  );
}

3. Filter算子

带有过滤条件的选择操作

3.1:IN查询

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .filter(demoConditionIn(builder))
        .build();

其中,基于Filter的IN过滤条件如下所示:基于分区字段imp_date 按照小时分区过滤

代码语言:java复制
private static RexNode demoConditionIn(RelBuilder builder) {
  return builder.call(SqlStdOperatorTable.IN,
      builder.field("imp_date"),
      builder.literal("p_2024030400"),
      builder.literal("p_2024030401"),
      builder.literal("p_2024030402")
  );
}

3.2:OR查询,多个条件或运算

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .filter(demoConditionOr(builder))
        .build();

其中,基于Filter的OR过滤条件如下所示:基于分区字段imp_date 选择指定分区

代码语言:java复制
private static RexNode demoConditionOr(RelBuilder builder) {
  return builder.or(
      builder.equals(builder.field("imp_date"), builder.literal("p_2024030400")),
      builder.equals(builder.field("imp_date"), builder.literal("p_2024030401")),
      builder.equals(builder.field("imp_date"), builder.literal("p_2024030402"))
  );
}

3.3:AND查询,多个条件与运算

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .filter(demoConditionAnd(builder))
        .build();

其中,基于Filter的AND过滤条件如下所示:同时满足imp_date分区选择和job_time任务构建时间选择

代码语言:java复制
private static RexNode demoConditionAnd(RelBuilder builder) {
  return builder.and(
      builder.equals(builder.field("imp_date"), builder.literal("p_2024030400")),
      builder.equals(builder.field("job_time"), builder.literal("2024030400")),
  );
}

4. Filter Project算子

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .filter(demoConditionIn(builder))
        .project(demoProjects(builder))
        .build();

关系代数等价于,如下Project Filter算子

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .project(demoProjects(builder))
        .filter(demoConditionIn(builder))
        .build();

5. 多条件Filter算子

多条件Filter算子多条件且带有Function计算

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .filter(demoConditionIn(builder))
        .project(demoProjects(builder))
        .filter(demoMultiConditions(builder))
        .build();

其中,多条件过滤定义如下:

代码语言:java复制
private static RexNode demoMultiConditions(RelBuilder builder) {
// Filter condition is not null
RexNode isNotNull = builder.isNotNull(builder.field("f_slotid"));

// Defined of udf
RexNode substring = builder.call(SqlStdOperatorTable.SUBSTRING,
    builder.field("imp_date"),
    builder.literal(1),
    builder.literal(8)
);
// Filter condition udf
RexNode substringCondition = builder.call(SqlStdOperatorTable.IN,
    substring,
    builder.literal("20240304"),
    builder.literal("20240305"),
    builder.literal("20240306")
);
// Filter condition equals
RexNode eqCondition = builder.equals(builder.field("f_slotid"),
    builder.literal("4003158499171286"));
return builder.and(isNotNull, substringCondition, eqCondition);
}

6. 多条件Project算子

多条件Project且带有Function计算,且带有字段别名

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .filter(demoConditionIn(builder))
        .project(demoProjects(builder))
        .filter(demoMultiConditions(builder))
        .project(demoAliasAndUdfProjects(builder))
        .build();

其中,多条件映射定义如下:

代码语言:java复制
  private static List<RexNode> demoAliasAndUdfProjects(RelBuilder builder) {
    // Project field with function
    RexNode urlDecode = builder.call(
        new SqlFunction(
            "url_decode", // function name
            SqlKind.OTHER_FUNCTION, // function type
            ReturnTypes.VARCHAR_2000, // function return type
            null, // Infer type
            OperandTypes.STRING, // function parameter
            SqlFunctionCategory.USER_DEFINED_FUNCTION),
        builder.field("f_message"),
        builder.literal("utf-8")
    );
    // Function with Struct Json field
    SqlFunction jsonFunc = new SqlFunction(
        "get_json_object", // function name
        SqlKind.OTHER_FUNCTION, // function type
        ReturnTypes.INTEGER, // function return type
        null, // Infer type
        OperandTypes.NUMERIC, // function parameter
        SqlFunctionCategory.USER_DEFINED_FUNCTION);
    RexNode jsonField = builder.call(jsonFunc,
        urlDecode,
        builder.literal("$.billRsp"));
    return ImmutableList.of(
        builder.alias(builder.field("f_uid"), "advertiser_id"),
        builder.alias(builder.field("f_cid"), "cid"),
        builder.alias(jsonField, "billno")
    );
  }

7. Sort算子

排序和Limit处理

代码语言:java复制
 RelNode node = builder
        .scan(TRACE_TABLE)
        .filter(demoConditionIn(builder))
        .project(demoProjects(builder))
        .filter(demoMultiConditions(builder))
        .project(demoAliasAndUdfProjects(builder))
        .sort(demoFunctionSort(builder))
        .build();

其中,带有Function的Sort排序字段处理定义如下,定义两个排序字段:

代码语言:java复制
private static List<RexNode> demoFunctionSort(RelBuilder builder) {
  RexNode coalesce = builder.call(SqlStdOperatorTable.COALESCE,
      builder.field("billno"),
      builder.literal(0)
  );
  RexNode notNull = builder.isNotNull(builder.field("billno"));
  return ImmutableList.of(coalesce, builder.desc(notNull));
}

8. Join算子

定义关联查询

代码语言:java复制
// left node
builder
    .scan(TRACE_TABLE)
    .filter(demoConditionIn(builder))
    .project(demoProjects(builder))
    .filter(demoMultiConditions(builder))
    .project(demoAliasAndUdfProjects(builder))
    .sort(demoFunctionSort(builder));
// right node
builder.scan(CALL_TABLE)
// join condition
    .join(JoinRelType.INNER,  builder.equals(
            builder.field(2, 0, "billno"), 
            builder.field(2, 1, "f_bill_sequence")));

最终构建出的计划树如图所示:

基于RelNode转SqlNode,对应生成的SQL语句如下:

代码语言:java复制
SELECT *
FROM (SELECT `advertiser_id`, `cid`, `billno`
FROM (SELECT `advertiser_id`, `cid`, `billno`, `billno` `billno0`, TRUE `_f4`
FROM (SELECT `f_uid` `advertiser_id`, `f_cid` `cid`, GET_JSON_OBJECT(URL_DECODE(`f_message`, 'utf-8'), '$.billRsp') `billno`
FROM (SELECT `f_cid`, `f_message`, `f_slotid`, `f_uid`, `imp_date`
FROM `test`.`trace`
WHERE (`imp_date` IN ('p_2024030400', 'p_2024030401', 'p_2024030402'))) `t0`
WHERE (SUBSTRING(`imp_date`, 1, 8) IN ('20240304', '20240305', '20240306')) AND `f_slotid` = '4003158499171286') `t2`
ORDER BY `billno0` NULLS LAST, `_f4` DESC NULLS FIRST) `t4`) `t5`
INNER JOIN `test`.`calllog` ON `t5`.`billno` = `calllog`.`f_bill_sequence`

我正在参与2024腾讯技术创作特训营最新征文,快来和我瓜分大奖!

0 人点赞