集合操作
大家好,我是老羊,今天我们来学习 Flink SQL 中的集合相关操作。
集合操作支持 BatchStreaming 任务。
- ⭐ UNION:将集合合并并且去重。
union
- ⭐ UNION ALL:将集合合并,不做去重。
Flink SQL> create view t1(s) as values ('c'), ('a'), ('b'), ('b'), ('c');
Flink SQL> create view t2(s) as values ('d'), ('e'), ('a'), ('b'), ('b');
Flink SQL> (SELECT s FROM t1) UNION (SELECT s FROM t2);
---
| s|
---
| c|
| a|
| b|
| d|
| e|
---
Flink SQL> (SELECT s FROM t1) UNION ALL (SELECT s FROM t2);
---
| c|
---
| c|
| a|
| b|
| b|
| c|
| d|
| e|
| a|
| b|
| b|
---
- ⭐ Intersect:交集并且去重
- ⭐ Intersect ALL:交集不做去重
Flink SQL> create view t1(s) as values ('c'), ('a'), ('b'), ('b'), ('c');
Flink SQL> create view t2(s) as values ('d'), ('e'), ('a'), ('b'), ('b');
Flink SQL> (SELECT s FROM t1) INTERSECT (SELECT s FROM t2);
---
| s|
---
| a|
| b|
---
Flink SQL> (SELECT s FROM t1) INTERSECT ALL (SELECT s FROM t2);
---
| s|
---
| a|
| b|
| b|
---
- ⭐ Except:差集并且去重
- ⭐ Except ALL:差集不做去重
Flink SQL> (SELECT s FROM t1) EXCEPT (SELECT s FROM t2);
---
| s |
---
| c |
---
Flink SQL> (SELECT s FROM t1) EXCEPT ALL (SELECT s FROM t2);
---
| s |
---
| c |
| c |
---
上述 SQL 在流式任务中,如果一条左流数据先来了,没有从右流集合数据中找到对应的数据时会直接输出,当右流对应数据后续来了之后,会下发回撤流将之前的数据給撤回。这也是一个回撤流。
- ⭐ In 子查询:这个大家比较熟悉了,但是注意,In 子查询的结果集只能有一列
SELECT user, amount
FROM Orders
WHERE product IN (
SELECT product FROM NewProducts
)
上述 SQL 的 In 子句其实就和之前介绍到的 Inner Join 类似。并且 In 子查询也会涉及到大状态问题,大家注意设置 State 的 TTL。