Spark on K8S

2021-01-26 14:48:28

Spark on K8S Timeline

KICKOFF

Spark Standalone on Kubernetes (via the k8s community)

SPIP: SPARK-18278

Fork: https://github.com/apache-spark-on-k8s/spark

Spark 2.3.0

First release with official native Kubernetes support (experimental)

Kubernetes 1.7

Spark 2.4.3

Latest release at the time of writing

PySpark/SparkR application support

Client mode support (for interactive applications and notebooks)

Support for mounting certain types of Kubernetes volumes

Spark 3.0

SPARK-25826: Kerberos HDFS support

Dynamic allocation support

Submitting and Running


bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///path/to/examples.jar

Problems

UI No Logs

When Spark runs on K8S, the Executors page of the Spark UI shows no logs.
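Because the UI offers no log links, executor logs have to be read from the pods directly. Below is a minimal sketch. It assumes `kubectl` is already configured for the namespace the job runs in, and it relies on the `spark-role` and `spark-app-selector` labels that Spark puts on the pods it creates. The function name `executor_logs` is hypothetical. Executor pods are normally deleted when the job ends. In Spark 3.0, setting `spark.kubernetes.executor.deleteOnTermination=false` keeps them around so their logs stay readable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the logs of every executor pod of one Spark app.
# Assumes kubectl already points at the cluster/namespace running the job.
executor_logs() {
  local app_id="$1"   # value of the spark-app-selector label on the pods
  local pod
  # "-o name" prints one "pod/<name>" per line, a form kubectl logs accepts.
  for pod in $(kubectl get pods \
      -l "spark-role=executor,spark-app-selector=${app_id}" -o name); do
    echo "===== ${pod} ====="
    kubectl logs "${pod}"
  done
}
```

Usage: `executor_logs <spark-app-id>`. The app id is the value of the `spark-app-selector` label on the driver pod.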

Fails to Exit on Error

SPARK-27927: driver pod hangs with pyspark 2.4.3 and master on kubernetes

SPARK-27812: kubernetes client import non-daemon thread which block jvm exit.

hostPath as LOCAL_DIRS

By default, Spark on k8s mounts an emptyDir volume for its local directories. On the physical host, this is a temporary path on a single disk.

Spark 3.0

hostPath support: SPARK-27499 Support mapping spark.local.dir to hostPath volume
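In Spark 3.0, Spark treats a volume as a local directory when the volume's name starts with `spark-local-dir-`. The sketch below assumes hypothetical host directories `/mnt/disk1/spark-local` and `/mnt/disk2/spark-local` on every node. It collects the `--conf` flags in a bash array, ready to append to the spark-submit command above. Mapping one volume per physical disk lets shuffle and spill I/O spread across several disks instead of the single emptyDir disk.

```shell
#!/usr/bin/env bash
# Build --conf flags that mount host disks as Spark local dirs (Spark 3.0+).
# HOST_DIRS are hypothetical paths; adjust them to the disks on your nodes.
set -euo pipefail

HOST_DIRS=(/mnt/disk1/spark-local /mnt/disk2/spark-local)
LOCAL_DIR_CONFS=()

i=1
for host_dir in "${HOST_DIRS[@]}"; do
  # The "spark-local-dir-" name prefix makes Spark use this hostPath volume
  # as a local dir instead of the default emptyDir.
  vol="spark-local-dir-${i}"
  prefix="spark.kubernetes.executor.volumes.hostPath.${vol}"
  LOCAL_DIR_CONFS+=(--conf "${prefix}.mount.path=/spark-local-${i}")
  LOCAL_DIR_CONFS+=(--conf "${prefix}.options.path=${host_dir}")
  i=$((i + 1))
done

printf '%s\n' "${LOCAL_DIR_CONFS[@]}"
```

Usage: `bin/spark-submit "${LOCAL_DIR_CONFS[@]}" <the remaining options shown earlier>`.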

External shuffle service: SPARK-25299 Use remote storage for persisting shuffle data

Dynamic resource allocation:

SPARK-24432 Add support for dynamic resource allocation

SPARK-27963 Allow dynamic allocation without an external shuffle service
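With SPARK-27963 (Spark 3.0), dynamic allocation can run on Kubernetes without an external shuffle service. Spark instead tracks which executors still hold shuffle data. The sketch below lists the relevant flags. The minimum and maximum executor counts are example values only.

```shell
#!/usr/bin/env bash
# --conf flags for dynamic allocation on Kubernetes without an external
# shuffle service (Spark 3.0+). Min/max executor counts are example values.
set -euo pipefail

MIN_EXECUTORS="${MIN_EXECUTORS:-1}"
MAX_EXECUTORS="${MAX_EXECUTORS:-10}"

# shuffleTracking keeps executors that store shuffle data for active jobs
# alive, which removes the need for an external shuffle service.
DYN_ALLOC_CONFS=(
  --conf spark.dynamicAllocation.enabled=true
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true
  --conf "spark.dynamicAllocation.minExecutors=${MIN_EXECUTORS}"
  --conf "spark.dynamicAllocation.maxExecutors=${MAX_EXECUTORS}"
)

printf '%s\n' "${DYN_ALLOC_CONFS[@]}"
```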

Access to secure (Kerberized) HDFS clusters: SPARK-25826 Kerberos HDFS support
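For Kerberos on Kubernetes, the Spark 3.0 security documentation describes submitting with a keytab and principal. It also ships a krb5.conf to the pods. `HADOOP_CONF_DIR` (or `spark.kubernetes.hadoop.configMapName`) must point at the Hadoop configuration. The sketch below uses hypothetical placeholders for the keytab path, the principal, and the Hadoop config directory.

```shell
#!/usr/bin/env bash
# --conf flags for reaching a Kerberized HDFS from Spark on K8s (Spark 3.0+).
# KEYTAB, PRINCIPAL and HADOOP_CONF_DIR defaults are hypothetical placeholders.
set -euo pipefail

KEYTAB="${KEYTAB:-/etc/security/keytabs/spark.keytab}"
PRINCIPAL="${PRINCIPAL:-spark@EXAMPLE.COM}"
# spark-submit picks up the Hadoop client configuration from this directory.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# krb5.path names the local krb5.conf to mount into the driver and executors.
KERBEROS_CONFS=(
  --conf "spark.kerberos.keytab=${KEYTAB}"
  --conf "spark.kerberos.principal=${PRINCIPAL}"
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf
)

printf '%s\n' "${KERBEROS_CONFS[@]}"
```

The KDC named in krb5.conf must be reachable from inside the containers.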

SPARK-28949 Kubernetes CGroup leaking leads to Spark Pods hang in Pending status

SPARK-28992 Support update dependencies from hdfs when task run on executor pods

SPARK-28947 Status logging occurs on every state change but not at an interval for liveness.

SPARK-28896 Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR
