Hive
Hive 计算数组交并差函数
代码语言:javascript复制select
array_intersect(array(1, 2), array(2, 3)) i,
array_union(array(1, 2), array(2, 3)) u,
array_except(array(1, 2), array(2, 3)) e;
SQL 实例:
代码语言:javascript复制select
size(t.res) as cnt
from
(
select
array_intersect(
(
select
collect_set(user_id)
from
ecom_kunlun.crm_s_u
where
date = '20220315'
and shop_id = 7571122
and pay_90d > '0'
),
(
select
collect_set(user_id)
from
ecom_kunlun.crm_s_u
where
date = '20220315'
and shop_id = 7571122
and member_level in ('1', '2', '3', '4', '5', '6')
)
) as res
) t;
ClickHouse
代码语言:javascript复制select
length(
arrayDistinct(arrayFilter(x -> x is not null, t.res))
) as cnt
from
(
select
arrayIntersect(
(
select
groupUniqArray(user_id)
from
ecom_dmp_shop_user.crm_s_u
where
p_date = '2022-03-15'
and shop_id = 7571122
and pay_90d > '0'
),
(
select
groupUniqArray(user_id)
from
ecom_dmp_shop_user.crm_s_u
where
p_date = '2022-03-15'
and shop_id = 7571122
and member_level in ('1', '2', '3', '4', '5', '6')
)
) as res
) t;
ClickHouse 实现数组交并差计算
计算数组交集
函数:arrayIntersect(arr1,arr2,...) 功能说明:计算 arr1,arr2等数组元素交集。 返回值:交集元素子数组。 SQL 实例:
代码语言:javascript复制SELECT
arrayIntersect([1, 2, 3], [4, 5, 6]) AS noIntersect,
arrayIntersect([1, 2, 3], [2, 3, 4, 5, 6]) AS hasIntersect
FORMAT Vertical
输出: noIntersect: [] hasIntersect: [3,2]
计算数组并集
组合使用函数来实现arrayDistinct(arrayConcat(a, b))。 SQL 实例:
代码语言:javascript复制SELECT
[1, 2] AS a,
[2, 3] AS b,
arrayDistinct(arrayConcat(a, b)) AS res
FORMAT Vertical
输出: a: [1,2] b: [2,3] res: [1,2,3]
计算数组差集
差集的实现要有一些技术含量了(感觉 ClickHouse 后面应该内置数组差集计算函数,实现类似arrayExcept() 函数),需要使用数组交集函数arrayIntersect() 结合高阶函数 arrayMap()和 arrayFilter()来组合实现。 SQL 实例:
代码语言:javascript复制SELECT
arrayIntersect([1, 2, 3], [4, 5, 6]) AS noIntersect,
arrayIntersect([1, 2, 3], [2, 3, 4, 5, 6]) AS hasIntersect
FORMAT Vertical
SELECT
[1, 2] AS a,
[2, 3] AS b,
arrayFilter(x -> (x IS NOT NULL), arrayMap(x -> multiIf(x NOT IN arrayIntersect(a, b), x, NULL), a)) AS res
FORMAT Vertical
输出: a: [1,2] b: [2,3] res: [1]
使用 交(INTERSECT)、并(UNION)、差(EXCEPT)的SQL子句关键字 实现
另外, ClickHouse 中有集合交(INTERSECT)、并(UNION)、差(EXCEPT)的SQL子句关键字,可以实现数组的交并差运算。实例 SQL 如下。
交集SQL:
代码语言:javascript复制SELECT a.i
FROM
(
SELECT arrayJoin([1, 2]) AS i
) AS a
INTERSECT
SELECT b.i
FROM
(
SELECT arrayJoin([2, 3]) AS i
) AS b
输出:2
并集 SQL:
代码语言:javascript复制SET union_default_mode = 'ALL';
SELECT DISTINCT t.i
FROM
(
SELECT a.i
FROM
(
SELECT arrayJoin([1, 2]) AS i
) AS a
UNION
SELECT b.i
FROM
(
SELECT arrayJoin([2, 3]) AS i
) AS b
) AS t
输出: 1 2 3
差集 SQL:
代码语言:javascript复制SELECT a.i
FROM
(
SELECT arrayJoin([1, 2]) AS i
) AS a
EXCEPT
SELECT b.i
FROM
(
SELECT arrayJoin([2, 3]) AS i
) AS b
输出:1
上述[1,2]和[2,3]分别表示要求差集的数组
数据函数解释: arrayIntersect表示多个数组求交集 arrayConcat表示连接多个数组为一个数组 arrayDistinct表示将多个数组元素去重组合为一个数组 arrayMap表示使用lambda函数转换 arrayFilter表示使用lambda函数过滤
参考资料:
https://clickhouse.tech/docs/zh/sql-reference/functions/higher-order-functions/