关于留存的计算是我们分享业务走向一种非常重要的方式,所谓留存用户,是用户在某一段时间内开始使用某个功能,到后面某一段时间又使用这个应用的用户,我们称这批用户叫留存用户。后面我们定义留存其实更多是基于两个事件来,比如某一天完成a事件,然后又完成b事件,这个我们叫b的留存,当然我们这个可以基于用户也可以基于设备。
下面我们来看clickhouse实现留存,关于clickhouse有个专用的留存函数,来处理留存用户数。
代码语言:javascript复制语法
retention(cond1, cond2, ..., cond32);
参数
cond — 返回 UInt8结果(1或0)的表达式。
返回值
数组为1或0。
1 — 条件满足。
0 — 条件不满足。
下面我们来看例子: 创建表:
代码语言:javascript复制CREATE TABLE reg(date Date, uid Int32,channel Int32) ENGINE = Memory;
CREATE TABLE charge(date Date, uid Int32, money Int32) ENGINE = Memory;
# 插入数据:
INSERT INTO reg SELECT '2020-01-01', 1,10000;
INSERT INTO reg SELECT '2020-01-01', 2,10001;
INSERT INTO reg SELECT '2020-01-01', 3,10001;
INSERT INTO reg SELECT '2020-01-02', 4,10000;
INSERT INTO reg SELECT '2020-01-03', 5,10000;
INSERT INTO charge SELECT '2020-01-01', 1,10;
INSERT INTO charge SELECT '2020-01-01', 1,10;
INSERT INTO charge SELECT '2020-01-01', 2,200;
INSERT INTO charge SELECT '2020-01-01', 3,500;
INSERT INTO charge SELECT '2020-01-02', 4,10000;
INSERT INTO charge SELECT '2020-01-02', 2,300;
INSERT INTO charge SELECT '2020-01-02', 3,400;
INSERT INTO charge SELECT '2020-01-02', 3,400;
INSERT INTO charge SELECT '2020-01-03', 5,10000;
INSERT INTO charge SELECT '2020-01-03', 3,400;
假如查询2020-01-01日的0日,1日,2日注册到充值留存数,我们可以先看下uid的每个条件的实现结果:
代码语言:javascript复制SELECT
uid,
retention((date = '2020-01-01') AND (type = 1), (date = '2020-01-01') AND (type = 2), (date = '2020-01-02') AND (type = 2), (date = '2020-01-03') AND (type = 2)) AS r
FROM
(
SELECT
uid,
date,
1 AS type
FROM reg
WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')
UNION ALL
SELECT
uid,
date,
2 AS type
FROM charge
WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')
)
WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')
GROUP BY uid
ORDER BY uid ASC
┌─uid─┬─r─────────┐
│ 1 │ [1,1,0,0] │
│ 2 │ [1,1,1,0] │
│ 3 │ [1,1,1,1] │
│ 4 │ [0,0,0,0] │
│ 5 │ [0,0,0,0] │
└─────┴───────────┘
对一下上面我们的写入数据,符合我们的结果。 下面我们写个完整sql实现这个留存功能:
代码语言:javascript复制SELECT
sum(r[1]) AS a,
sum(r[2]) AS b,
sum(r[3]) AS c,
sum(r[4]) AS d
FROM
(
SELECT
uid,
retention((date = '2020-01-01') AND (type = 1), (date = '2020-01-01') AND (type = 2), (date = '2020-01-02') AND (type = 2), (date = '2020-01-03') AND (type = 2)) AS r
FROM
(
SELECT
uid,
date,
1 AS type
FROM reg
WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')
UNION ALL
SELECT
uid,
date,
2 AS type
FROM charge
WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')
)
WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')
GROUP BY uid
ORDER BY uid ASC
)
┌─a─┬─b─┬─c─┬─d─┐
│ 3 │ 3 │ 2 │ 1 │
└───┴───┴───┴───┘
上面分别是 a:2020-01-01 注册数 b:2020-01-01 注册并充值人数(新用户充值数) c:2020-01-02 注册并充值的1日留存数 d:2020-01-03 注册并充值的2日留存数
下面我们看下如何计算活跃,上面的例子需要多插入几条数据,我们需要计算3日活跃。
代码语言:javascript复制INSERT INTO charge SELECT '2020-01-04', 3,400;
.
计算活跃sql:
代码语言:javascript复制SELECT
sum(r[1]) AS a,
sum(r[2]) AS b,
sum(r[3]) AS c,
sum(r[4]) AS d
FROM
(
SELECT
uid,
retention((date = '2020-01-01') AND (type = 1), (date = '2020-01-01') AND (type = 2), (date >= '2020-01-01') AND (date <= '2020-01-02') AND (type = 2), (date >= '2020-01-02') AND (date <= '2020-01-04') AND (type = 2)) AS r
FROM
(
SELECT
uid,
date,
1 AS type
FROM reg
WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')
UNION ALL
SELECT
uid,
date,
2 AS type
FROM charge
WHERE (date >= '2020-01-01') AND (date <= '2020-01-04')
)
WHERE (date >= '2020-01-01') AND (date <= '2020-01-04')
GROUP BY uid
ORDER BY uid ASC
)
┌─a─┬─b─┬─c─┬─d─┐
│ 3 │ 3 │ 3 │ 2 │
└───┴───┴───┴───┘
上面分别是 a:2020-01-01 注册数 b:2020-01-01 注册并充值人数活跃数 c:2020-01-02 注册并充值的1日活跃数 d:2020-01-03 注册并充值的2日活跃数
关于留存率或者活跃率我没有写例子,其实就是上面的数据b,c,d除以a,a是基准数据。
留存计算一般其实是针对于一天来计算 活跃是算一段时间的累计数