一、背景
完成了spark on k8s的部署和测试,现在需要一个能够查看spark任务执行情况的ui,原先采用yarn资源管理器ui链接到spark-web-ui,由于yarn集群下的机器ip固定,可以通过配置本地代理的方式访问它,现在去掉了yarn,自己需要搭建一个能够查看所有spark任务执行情况的页面。直接使用spark-web-ui不方便管理且部署的driver机器在线上且ip不固定,无法通过配置代理和服务名方式打通。
二、Spark History Server
1、原理
1、spark history server读取spark任务执行过程中产生的eventlog,来还原spark-web-ui
2、spark history server能够展示正在执行和执行完的spark任务的ui,通过eventlog日志文件后缀名.inprogress区分
3、spark history server解决了在不使用代理的情况下,能够查看线上正在执行任务的spark-web-ui,只要给部署spark history server服务配一个办公网的域名即可,原因是它只是通过eventlog近实时还原spark web ui。日志更新时间,参照该配置
代码语言:javascript复制spark.history.fs.update.interval 10s (默认10秒)
2、部署
由于打算把spark history server部署在k8s的容器上,需要一个在前台运行的程序来启动spark history server,spark提供的spark/sbin/start-history-server.sh是通过起一个后台进程去跑,所以我们要改造一下
改造完并使用configmap挂载配置的spark history server的yaml如下:
代码语言:javascript复制apiVersion: v1
kind: Service
metadata:
name: spark-history-service
labels:
run: spark-history-service
spec:
ports:
- port: 80
protocol: TCP
name: http
- port: 18080
protocol: TCP
name: spark-history
selector:
run: spark-history-service
---
apiVersion: v1
kind: ConfigMap
metadata:
name: spark-history-defaults
namespace: kyuubi
data:
sparkDefaults: |
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key=XXXXXXXXXXXXXXXXX
spark.hadoop.fs.s3a.secret.key=XXXXXXXXXXXXXXXXX
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.endpoint=http://s3.ap-northeast-1.amazonaws.com
spark.hadoop.fs.s3a.path.style.access=true
spark.eventLog.dir=s3a://mybucket/sparkOnK8s/eventLogDir
spark.history.fs.logDirectory=s3a://mybucket/sparkOnK8s/eventLogDir
spark.history.fs.cleaner.enabled=true
spark.eventLog.compress=true
spark.kubernetes.authenticate.driver.serviceAccountName=default
spark.kubernetes.file.upload.path=s3a://mybucket/sparkOnK8s/kubernetes/file/upload
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: spark-history-service
spec:
selector:
matchLabels:
run: spark-history-service
replicas: 1
template:
metadata:
labels:
run: spark-history-service
spec:
containers:
- name: spark-history-service
image: XXXXX/XXXXXX:spark_history3.2.1_v1.0.0
volumeMounts:
- name: settings
mountPath: /usr/local/spark/conf/spark-defaults.conf
subPath: sparkDefaults
command: ["/usr/local/spark/bin/spark-class"]
args: ["org.apache.spark.deploy.history.HistoryServer"]
volumes:
- name: settings
configMap:
name: spark-history-defaults
XXXXX/XXXXXX:spark_history3.2.1_v1.0.0 是我打包上传到仓库的镜像,Dockerfile如下:
代码语言:javascript复制FROM openjdk:8u332-slim-buster
RUN apt-get update
&& apt-get install -y procps vim net-tools iputils-ping
COPY spark-3.2.1-bin-hadoop3.3.1 /usr/local/spark
3、启动
启动spark history server pod,并提交一个spark on k8s任务,任务正在过程中,spark-history-ui并没有展示正在执行的任务,查看s3a://mybucket/sparkOnK8s/eventLogDir目录发现并没有后缀名.inprogress的文件,等执行完spark任务后才产生文件,只能看到执行完任务的历史。
4、分析
查看了一下driver pod的日志,发现了一个华点
S3ABlockOutputStream不支持使用Syncable API去写日志,打开源码,发现S3ABlockOutputStream实现了Syncable
代码语言:javascript复制class S3ABlockOutputStream extends OutputStream implements
StreamCapabilities, IOStatisticsSource, Syncable, Abortable
Syncable的方法
代码语言:javascript复制@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Syncable {
/** Flush out the data in client's user buffer. After the return of
* this call, new readers will see the data.
* @throws IOException if any error occurs
*/
void hflush() throws IOException;
/** Similar to posix fsync, flush out the data in client's user buffer
* all the way to the disk device (but the disk may have it in its cache).
* @throws IOException if error occurs
*/
void hsync() throws IOException;
}
看下S3ABlockOutputStream对这两个方法的实现,发现调用了一个降级的方法handleSyncableInvocation()
代码语言:javascript复制@Override
public void hflush() throws IOException {
statistics.hflushInvoked();
handleSyncableInvocation();
}
@Override
public void hsync() throws IOException {
statistics.hsyncInvoked();
handleSyncableInvocation();
}
查看handleSyncableInvocation方法
代码语言:javascript复制private void handleSyncableInvocation() {
final UnsupportedOperationException ex
= new UnsupportedOperationException(E_NOT_SYNCABLE);
if (!downgradeSyncableExceptions) {
throw ex;
}
// downgrading.
WARN_ON_SYNCABLE.warn("Application invoked the Syncable API against"
" stream writing to {}. This is unsupported",
key);
// and log at debug
LOG.debug("Downgrading Syncable call", ex);
}
饿。。。凉凉,s3a不支持Syncable的方法刷新(骂骂咧咧,不支持实现个啥啊,哈哈,开个玩笑),具体原因看下官网有详细描述s3aFileSystem:
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/troubleshooting_s3a.html
5、解决方案
最后只能在pod上挂载nfs目录,把日志放到该目录下了
代码语言:javascript复制apiVersion: v1
kind: Service
metadata:
name: spark-history-service
labels:
run: spark-history-service
spec:
ports:
- port: 80
protocol: TCP
name: http
- port: 18080
protocol: TCP
name: spark-history
selector:
run: spark-history-service
---
apiVersion: v1
kind: ConfigMap
metadata:
name: spark-history-defaults
namespace: kyuubi
data:
sparkDefaults: |
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key=XXXXXXXXXXX
spark.hadoop.fs.s3a.secret.key=XXXXXXXXXXX
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.endpoint=http://s3.ap-northeast-1.amazonaws.com
spark.hadoop.fs.s3a.path.style.access=true
spark.eventLog.dir=/nfs/sparkOnK8s/eventLogDir
spark.history.fs.logDirectory=/nfs/sparkOnK8s/eventLogDir
spark.history.fs.cleaner.enabled=true
spark.eventLog.compress=true
spark.kubernetes.authenticate.driver.serviceAccountName=default
spark.kubernetes.file.upload.path=s3a://mybucket/sparkOnK8s/kubernetes/file/upload
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: spark-history-service
spec:
selector:
matchLabels:
run: spark-history-service
replicas: 1
template:
metadata:
labels:
run: spark-history-service
spec:
containers:
- name: spark-history-service
image: XXXXXXXX/XXXXXXXXX:spark_history3.2.1_v1.0.0
volumeMounts:
- name: settings
mountPath: /usr/local/spark/conf/spark-defaults.conf
subPath: sparkDefaults
- name: nfs-path
mountPath: /nfs
command: ["/usr/local/spark/bin/spark-class"]
args: ["org.apache.spark.deploy.history.HistoryServer"]
volumes:
- name: settings
configMap:
name: spark-history-defaults
- name: nfs-path
nfs:
path: /home1/nfs
server: 172.XX.XX.XX