构建方式
计划树构建是SQL处理的第三步,构建出可关系代数优化的逻辑计划树RelNode,是优化器执行优化的前提。
目前Calcite支持两种构建方式:
- RelBuilder构建:基于API调用,直接组装算子(例如,Filter,Join,Project等) 构建逻辑计划树,常用于测试及验证场景
- SqlToRelConverter构建:将AST抽象语法树(SqlNode)转换为逻辑计划树(RelNode),基于Visitor模式遍历各个SqlNode节点,生成对应的RelNode节点并维护在遍历空间中。完成遍历后,最终调用RelBuilder 组装成完整的RelNode计划树
RelBuilder 是 Calcite 的一个工具类,以编程方式构建关系代数表达式,即 RelNode 对象。它提供了一个流畅的API,允许开发者通过链式调用来构建复杂的查询计划。RelBuilder 的设计抽象了关系代数的操作,使得开发者无需直接操作底层的 RelNode 对象,从而简化了查询计划的构建过程。
SqlToRelConverter是Calcite的一个关键组件,实现SQL语句转换为RelNode逻辑计划树,基于Blackboard 用于DQL转换,遍历SqlNode节点并生成对应的RexNode表达式,基于Blackboard#setRoot维护最新的Root关系计划树。
构建流程
如图展示SqlToRelConverter构建流程:
1. 转换入口:convertQuery方法,基于SqlNode遍历执行convert转换操作
2. 遍历语法树:convertQueryRecursive方法,根据SqlKind类型,进行对应的convert转换操作,保证各类型SqlNode可转换为对应的RelNode,
- SELECT:convertSelect 方法,基于convertSelectImpl 转换RelNode
- WITH:convertWith 方法,基于with.body 回调convertQuery 方法
- INSERT:convertInsert 方法,基于insert.source 回调convertQueryRecursive 方法,insert.target转换为RelOptTable,拼装为TableModify 对象
- DELETE:convertDelete 方法,基于delete.source 回调 convertSelect 方法,拼装为TableModify 对象
- UPDATE:convertUpdate 方法,基于update.source 回调 convertSelect 方法,拼装为TableModify 对象
- MERGE:convertMerge 方法,基于merge.source 回调 convertSelect 方法,拼装为TableModify 对象
- 集合操作:convertSetOp 方法,基于convertQueryRecursive 分别遍历集合左右节点,封装为集合对象
3. 以SELECT类型为例,遍历前生成Blackboard维护关系表达式,然后根据SELECT对应的SqlNode树结构进行遍历,依次遍历From节点、Where查询条件,SelectList查询字段等
如图展示从SqlNode转为RelNode的对象变换:select name, age from student where name = 'add'
构建示例
以下展示基于RelBuilder构建RelNode的相关示例。
RelBuilder准备
步骤1:手动初始化Schema定义,创建表test.trace、test.calllog
代码语言:java复制AbstractTable trace = new AbstractTable() {
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
return typeFactory.builder()
.add("imp_date", SqlTypeName.VARCHAR)
.add("job_time", SqlTypeName.VARCHAR)
.add("f_level", SqlTypeName.VARCHAR)
.add("f_caller", SqlTypeName.VARCHAR)
.add("f_cid", SqlTypeName.VARCHAR)
.add("f_message", SqlTypeName.VARCHAR)
.add("f_slotid", SqlTypeName.VARCHAR)
.add("f_uid", SqlTypeName.VARCHAR)
.build();
}
};
AbstractTable calllog = new AbstractTable() {
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
return typeFactory.builder()
.add("imp_date", SqlTypeName.VARCHAR)
.add("job_time", SqlTypeName.VARCHAR)
.add("f_exceedquota", SqlTypeName.VARCHAR)
.add("f_bill_sequence", SqlTypeName.VARCHAR)
.build();
}
};
CalciteSchema rootSchema = CalciteSchema.createRootSchema(true);
CalciteSchema schema = rootSchema.add("test", new AbstractSchema());
schema.add("trace", trace);
schema.add("calllog", calllog);
步骤2:基于FrameworkConfig,生成Relbuilder对象
代码语言:java复制Frameworks.newConfigBuilder()
.defaultSchema(schema.plus())
.build()
RelBuilder builder = RelBuilder.create(config());
算子构建
1. TableScan算子
构建出叶子节点,关系(表)对象
代码语言:java复制RelNode node = builder.scan(TRACE_TABLE).build();
其中,TRACE_TABLE 基于列表(List)定义表的Schema名称
代码语言:java复制List<String> TRACE_TABLE = ImmutableList.of("test", "trace");
2. Project算子
选择指定列的投影操作
代码语言:java复制RelNode node = builder
.scan(TRACE_TABLE)
.project(demoProjects(builder))
.build();
其中,Project投影字段如下,选择5个字段查询
代码语言:java复制private static List<RexNode> demoProjects(RelBuilder builder) {
return ImmutableList.of(
builder.field("f_cid"),
builder.field("f_message"),
builder.field("f_slotid"),
builder.field("f_uid"),
builder.field("imp_date")
);
}
3. Filter算子
带有过滤条件的选择操作
3.1:IN查询
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.filter(demoConditionIn(builder))
.build();
其中,基于Filter的IN过滤条件如下所示:基于分区字段imp_date 按照小时分区过滤
代码语言:java复制private static RexNode demoConditionIn(RelBuilder builder) {
return builder.call(SqlStdOperatorTable.IN,
builder.field("imp_date"),
builder.literal("p_2024030400"),
builder.literal("p_2024030401"),
builder.literal("p_2024030402")
);
}
3.2:OR查询,多个条件或运算
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.filter(demoConditionOr(builder))
.build();
其中,基于Filter的OR过滤条件如下所示:基于分区字段imp_date 选择指定分区
代码语言:java复制private static RexNode demoConditionOr(RelBuilder builder) {
return builder.or(
builder.equals(builder.field("imp_date"), builder.literal("p_2024030400")),
builder.equals(builder.field("imp_date"), builder.literal("p_2024030401")),
builder.equals(builder.field("imp_date"), builder.literal("p_2024030402"))
);
}
3.3:AND查询,多个条件与运算
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.filter(demoConditionAnd(builder))
.build();
其中,基于Filter的AND过滤条件如下所示:同时满足imp_date分区选择和job_time任务构建时间选择
代码语言:java复制private static RexNode demoConditionAnd(RelBuilder builder) {
return builder.and(
builder.equals(builder.field("imp_date"), builder.literal("p_2024030400")),
builder.equals(builder.field("job_time"), builder.literal("2024030400")),
);
}
4. Filter Project算子
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.filter(demoConditionIn(builder))
.project(demoProjects(builder))
.build();
关系代数等价于,如下Project Filter算子
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.project(demoProjects(builder))
.filter(demoConditionIn(builder))
.build();
5. 多条件Filter算子
多条件Filter算子多条件且带有Function计算
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.filter(demoConditionIn(builder))
.project(demoProjects(builder))
.filter(demoMultiConditions(builder))
.build();
其中,多条件过滤定义如下:
代码语言:java复制private static RexNode demoMultiConditions(RelBuilder builder) {
// Filter condition is not null
RexNode isNotNull = builder.isNotNull(builder.field("f_slotid"));
// Defined of udf
RexNode substring = builder.call(SqlStdOperatorTable.SUBSTRING,
builder.field("imp_date"),
builder.literal(1),
builder.literal(8)
);
// Filter condition udf
RexNode substringCondition = builder.call(SqlStdOperatorTable.IN,
substring,
builder.literal("20240304"),
builder.literal("20240305"),
builder.literal("20240306")
);
// Filter condition equals
RexNode eqCondition = builder.equals(builder.field("f_slotid"),
builder.literal("4003158499171286"));
return builder.and(isNotNull, substringCondition, eqCondition);
}
6. 多条件Project算子
多条件Project且带有Function计算,且带有字段别名
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.filter(demoConditionIn(builder))
.project(demoProjects(builder))
.filter(demoMultiConditions(builder))
.project(demoAliasAndUdfProjects(builder))
.build();
其中,多条件映射定义如下:
代码语言:java复制 private static List<RexNode> demoAliasAndUdfProjects(RelBuilder builder) {
// Project field with function
RexNode urlDecode = builder.call(
new SqlFunction(
"url_decode", // function name
SqlKind.OTHER_FUNCTION, // function type
ReturnTypes.VARCHAR_2000, // function return type
null, // Infer type
OperandTypes.STRING, // function parameter
SqlFunctionCategory.USER_DEFINED_FUNCTION),
builder.field("f_message"),
builder.literal("utf-8")
);
// Function with Struct Json field
SqlFunction jsonFunc = new SqlFunction(
"get_json_object", // function name
SqlKind.OTHER_FUNCTION, // function type
ReturnTypes.INTEGER, // function return type
null, // Infer type
OperandTypes.NUMERIC, // function parameter
SqlFunctionCategory.USER_DEFINED_FUNCTION);
RexNode jsonField = builder.call(jsonFunc,
urlDecode,
builder.literal("$.billRsp"));
return ImmutableList.of(
builder.alias(builder.field("f_uid"), "advertiser_id"),
builder.alias(builder.field("f_cid"), "cid"),
builder.alias(jsonField, "billno")
);
}
7. Sort算子
排序和Limit处理
代码语言:java复制 RelNode node = builder
.scan(TRACE_TABLE)
.filter(demoConditionIn(builder))
.project(demoProjects(builder))
.filter(demoMultiConditions(builder))
.project(demoAliasAndUdfProjects(builder))
.sort(demoFunctionSort(builder))
.build();
其中,带有Function的Sort排序字段处理定义如下,定义两个排序字段:
代码语言:java复制private static List<RexNode> demoFunctionSort(RelBuilder builder) {
RexNode coalesce = builder.call(SqlStdOperatorTable.COALESCE,
builder.field("billno"),
builder.literal(0)
);
RexNode notNull = builder.isNotNull(builder.field("billno"));
return ImmutableList.of(coalesce, builder.desc(notNull));
}
8. Join算子
定义关联查询
代码语言:java复制// left node
builder
.scan(TRACE_TABLE)
.filter(demoConditionIn(builder))
.project(demoProjects(builder))
.filter(demoMultiConditions(builder))
.project(demoAliasAndUdfProjects(builder))
.sort(demoFunctionSort(builder));
// right node
builder.scan(CALL_TABLE)
// join condition
.join(JoinRelType.INNER, builder.equals(
builder.field(2, 0, "billno"),
builder.field(2, 1, "f_bill_sequence")));
最终构建出的计划树如图所示:
基于RelNode转SqlNode,对应生成的SQL语句如下:
代码语言:java复制SELECT *
FROM (SELECT `advertiser_id`, `cid`, `billno`
FROM (SELECT `advertiser_id`, `cid`, `billno`, `billno` `billno0`, TRUE `_f4`
FROM (SELECT `f_uid` `advertiser_id`, `f_cid` `cid`, GET_JSON_OBJECT(URL_DECODE(`f_message`, 'utf-8'), '$.billRsp') `billno`
FROM (SELECT `f_cid`, `f_message`, `f_slotid`, `f_uid`, `imp_date`
FROM `test`.`trace`
WHERE (`imp_date` IN ('p_2024030400', 'p_2024030401', 'p_2024030402'))) `t0`
WHERE (SUBSTRING(`imp_date`, 1, 8) IN ('20240304', '20240305', '20240306')) AND `f_slotid` = '4003158499171286') `t2`
ORDER BY `billno0` NULLS LAST, `_f4` DESC NULLS FIRST) `t4`) `t5`
INNER JOIN `test`.`calllog` ON `t5`.`billno` = `calllog`.`f_bill_sequence`
我正在参与2024腾讯技术创作特训营最新征文,快来和我瓜分大奖!