Spark 覆盖写Hive分区表,只覆盖部分对应分区

2020-08-11 15:51:19 浏览数 (1)

要求Spark版本2.3以上,亲测2.2无效

配置 config("spark.sql.sources.partitionOverwriteMode","dynamic") 注意 1、saveAsTable方法无效,会全表覆盖写,需要用insertInto,详情见代码 2、insertInto需要主要DataFrame列的顺序要和Hive表里的顺序一致,不然会数据错误!

package com.dkl.blog.spark.hive

import org.apache.spark.sql.SparkSession

/**

  • Created by dongkelun on 2020/1/16 15:25
  • 博客:Spark 覆盖写Hive分区表,只覆盖部分对应分区
  • 要求Spark版本2.3以上 */ object SparkHivePartitionOverwrite { def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .appName("SparkHivePartitionOverwrite") .master("local") .config("spark.sql.parquet.writeLegacyFormat", true) .config("spark.sql.sources.partitionOverwriteMode","dynamic") .enableHiveSupport() .getOrCreate() import spark.sql val data = Array(("001", "张三", 21, "2018"), ("002", "李四", 18, "2017")) val df = spark.createDataFrame(data).toDF("id", "name", "age", "year") //创建临时表 df.createOrReplaceTempView("temp_table") val tableName="test_partition" //切换hive的数据库 sql("use test") // 1、创建分区表,并写入数据 df.write.mode("overwrite").partitionBy("year").saveAsTable(tableName) spark.table(tableName).show() val data1 = Array(("011", "Sam", 21, "2018")) val df1 = spark.createDataFrame(data1).toDF("id", "name", "age", "year") // df1.write.mode("overwrite").partitionBy("year").saveAsTable(tableName) //不成功,全表覆盖 // df1.write.mode("overwrite").format("Hive").partitionBy("year").saveAsTable(tableName) //不成功,全表覆盖 df1.write.mode("overwrite").insertInto(tableName) spark.table(tableName).show() spark.stop }

} 结果 --- ---- --- ---- | id|name|age|year| --- ---- --- ---- |002| 李四| 18|2017| |001| 张三| 21|2018| --- ---- --- ----

--- ---- --- ---- | id|name|age|year| --- ---- --- ---- |011| Sam| 21|2018| --- ---- --- ----

0 人点赞