Java版本:
JavaSparkContext sc = ...;
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read().json("hdfs://spark1:9000/students.json");
df.show();
Scala版本:
val sc: SparkContext = ...
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.json("hdfs://spark1:9000/students.json")
df.show()
案例 json数据源
{"id":1, "name":"leo", "age":18}
{"id":2, "name":"jack", "age":19}
{"id":3, "name":"marry", "age":17}
Java版本
代码语言:javascript复制public class DataFrameCreate {
public static void main(String[] args) {
SparkConf conf = newSparkConf().setAppName("DataFrameCreate").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read().json("C:\Users\zhang\Desktop\students.json")
df.show();
}
}
运行到linux集群上面
- 打包 文件路径改成hdfs://spark1:9000/students.json
Sh文件
代码语言:javascript复制spark-submit
--class sql.DataFrameCreate
--num-executors 3
--driver-memory 100m
--executor-memory 100m
--executor-cores 3
--files /usr/local/hive/conf/hive-site.xml
--driver-class-path /usr/local/hive/lib/mysql-connector-java-5.1.17.jar
/sql/worldcount.jar
Scala版本
代码语言:javascript复制object DataFrameCreate {
def main(args: Array[String]){
val conf = new SparkConf().setAppName("DataFrameCreate")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.json("hdfs://spark1:9000/students.json")
df.show()
}
}