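1. How is your data warehouse layered, and why is it layered that way?
2. How is data ingested, and how are traffic logs parsed into fact tables?
3. Kafka partitions and consumer groups: how do consumers map to partitions?
4. How do you handle a Kafka backlog (consumer lag)?
5. How do you optimize for data skew?
6. How do you handle small files on HDFS?
7. How are the data domains of the warehouse divided?
8. What is the difference between DWD and DWS?
9. How do you guarantee the committed data delivery time (SLA)?
10. How do you analyze JVM memory?
11. In a microservice, if the program throws no exception and writes no logs, how do you troubleshoot the problem?
12. A server receives a request but returns no result; how do you troubleshoot the problem?
13. How do you check system load on Linux?
14. Do you use Linux commands often?
15. How do you check disk usage on Linux?
16. How do you handle reverse dependencies among data warehouse task dependencies?
17. How is metadata lineage parsed?
18. In data governance, how do you determine that a task suffers from data skew?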
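19. Two tables with the following fields.
Table t1 holds the user id and user name:
id,name
1,小王
2,小张
Table t2 holds the user id and a city the user has visited:
id,city
1,bj
1,shanghai
2,shenzhen
Task: find the names of the people who have never been to Beijing (city = 'bj').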
select a.id,
       a.name
from (
    -- inline sample data standing in for t1 (id, name)
    select id,
           name
    from (
        select 1 as id, '小王' as name
        union all select 2 as id, '小张' as name
    ) t1
) a
left join (
    -- ids of users who have visited Beijing, one row per user
    select id
    from (
        select 1 as id, 'bj' as city
        union all select 1 as id, 'shanghai' as city
        union all select 2 as id, 'shenzhen' as city
    ) t2
    where city = 'bj'
    group by id
) b
on a.id = b.id
-- users with no match in b have never been to Beijing
where b.id is null
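The anti-join above can also be written with `not exists`. Below is a minimal sketch that loads the same sample rows into an in-memory SQLite database and runs that variant. SQLite stands in for the warehouse engine here, and the helper names `build_sample_db` and `names_not_visited` are hypothetical, introduced only for illustration.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database holding the sample rows of t1 and t2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (id integer, name text)")
    conn.execute("create table t2 (id integer, city text)")
    conn.executemany("insert into t1 values (?, ?)",
                     [(1, "小王"), (2, "小张")])
    conn.executemany("insert into t2 values (?, ?)",
                     [(1, "bj"), (1, "shanghai"), (2, "shenzhen")])
    return conn


def names_not_visited(conn, city):
    """Return the names of users in t1 that have no t2 row for `city`."""
    sql = """
        select t1.name
        from t1
        where not exists (
            select 1
            from t2
            where t2.id = t1.id
              and t2.city = ?
        )
        order by t1.id
    """
    return [row[0] for row in conn.execute(sql, (city,))]


conn = build_sample_db()
print(names_not_visited(conn, "bj"))  # ['小张']
```

Both forms return only users without a matching Beijing row. The `not exists` form needs no deduplication step, because it never multiplies rows when a user has several visits to the same city.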
20. Table a has two fields: dt, dau
Task: for each day, compute the average dau over the most recent three days.
select dt,
       dau,
       -- average over the current day and the two preceding days;
       -- assumes one row per day with no gaps, and the first two days
       -- are averaged over the one or two rows that exist
       avg(dau) over (
           order by dt
           rows between 2 preceding and current row
       ) as avg_dau_3d
from (
    -- inline sample data standing in for table a (dt, dau)
    select '2021-01-01' as dt, 1 as dau
    union all select '2021-01-02' as dt, 4 as dau
    union all select '2021-01-03' as dt, 2 as dau
    union all select '2021-01-04' as dt, 3 as dau
    union all select '2021-01-05' as dt, 6 as dau
    union all select '2021-01-06' as dt, 5 as dau
    union all select '2021-01-07' as dt, 7 as dau
) a
order by dt
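"The most recent three days" can mean the last three rows or the last three calendar days. The two differ when a date is missing from the table. The sketch below takes the calendar-day reading and uses a self-join on a date range, again with in-memory SQLite standing in for the warehouse engine. The function name `calendar_avg_dau` is hypothetical, and it assumes `dt` is stored as an ISO `YYYY-MM-DD` string.

```python
import sqlite3


def calendar_avg_dau(rows, days=3):
    """For each dt, average dau over the calendar window [dt - (days-1), dt].

    rows: iterable of (dt, dau) with dt as 'YYYY-MM-DD'.
    Returns a list of (dt, avg_dau) ordered by dt.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table a (dt text, dau integer)")
    conn.executemany("insert into a values (?, ?)", rows)
    sql = """
        select cur.dt,
               avg(prev.dau) as avg_dau
        from a as cur
        join a as prev
          on prev.dt between date(cur.dt, ?) and cur.dt
        group by cur.dt
        order by cur.dt
    """
    offset = f"-{int(days) - 1} days"  # e.g. '-2 days' for a 3-day window
    return conn.execute(sql, (offset,)).fetchall()


sample = [
    ("2021-01-01", 1), ("2021-01-02", 4), ("2021-01-03", 2),
    ("2021-01-04", 3), ("2021-01-05", 6), ("2021-01-06", 5),
    ("2021-01-07", 7),
]
for dt, avg_dau in calendar_avg_dau(sample):
    print(dt, round(avg_dau, 2))
```

On the sample data every date is present, so the row-based and calendar-based readings give the same result. If a day is missing, the self-join averages only the days that fall inside the calendar window, while a row-based window would reach further back in time.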