Fayson的github: https://github.com/fayson/cdhproject
1. Purpose of this Document
In earlier posts, Fayson covered "How to Enable Spark Thrift in CDH" and "How to Deploy the Spark 1.6 Thrift Service and spark-sql Client on a Kerberized CDH Cluster". This article shows how to use Java JDBC to connect to the Spark ThriftServer service in both non-Kerberos and Kerberos environments.
- Outline
1. Environment setup
2. Connection examples for non-Kerberos and Kerberos environments
- Test environment
1. Kerberos and non-Kerberos clusters on CDH 5.12.1, OS Red Hat 7.2
- Prerequisites
1. The Spark 1.6 ThriftServer service is running normally
2. Environment Setup
1. Create a Java project named jdbcdemo
2. Add the Maven dependencies
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.5</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.1.0</version>
</dependency>
</dependencies>
3. Non-Kerberos Environment Example
1. Start the Spark ThriftServer service in the non-Kerberos environment
[root@cdh04 ~]# cd /opt/cloudera/parcels/CDH/lib/spark/sbin/
[root@cdh04 sbin]# ./stop-thriftserver.sh
[root@cdh04 sbin]# rm -rf ../logs/*
[root@cdh04 sbin]# export HADOOP_USER_NAME=hive
[root@cdh04 sbin]# ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001 \
> --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
> --hiveconf hive.server2.enable.doAs=true
2. Create a NoneKBSample.java file in the project directory with the following content:
package com.cloudera.spark1jdbc;
import com.cloudera.utils.JDBCUtils;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
/**
* package: com.cloudera.sparkjdbc
* describe: Access the Spark 1.6 Thrift Server in a non-Kerberos environment via JDBC
* create_user: Fayson
* email: htechinfo@163.com
* create_date: 2018/6/1
* create_time: 10:21 AM
* WeChat official account: Hadoop实操
*/
public class NoneKBSample {
private static String JDBC_DRIVER = "org.apache.hive.jdbc.HiveDriver";
private static String CONNECTION_URL ="jdbc:hive2://cdh04.fayson.com:10001/";
static {
try {
Class.forName(JDBC_DRIVER);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
System.out.println("Accessing the Spark 1.6 Thrift Server in a non-Kerberos environment via JDBC");
Connection connection = null;
ResultSet rs = null;
PreparedStatement ps = null;
try {
connection = DriverManager.getConnection(CONNECTION_URL, "hive", "");
ps = connection.prepareStatement("select * from test");
rs = ps.executeQuery();
while (rs.next()) {
System.out.println(rs.getInt(1) + "-------" + rs.getString(2));
}
} catch (Exception e) {
e.printStackTrace();
} finally {
JDBCUtils.disconnect(connection, rs, ps);
}
}
}
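Both samples call a JDBCUtils.disconnect helper from the com.cloudera.utils package that is not shown in this article. A minimal sketch of what such a helper might look like (hypothetical, not the author's actual class):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/**
 * Hypothetical sketch of the JDBCUtils helper referenced by the samples.
 */
public class JDBCUtils {
    // Close the ResultSet, PreparedStatement and Connection in reverse
    // order of acquisition, tolerating nulls, and swallowing per-resource
    // failures so one close() error does not prevent closing the rest.
    public static void disconnect(Connection connection, ResultSet rs, PreparedStatement ps) {
        if (rs != null) {
            try { rs.close(); } catch (Exception e) { e.printStackTrace(); }
        }
        if (ps != null) {
            try { ps.close(); } catch (Exception e) { e.printStackTrace(); }
        }
        if (connection != null) {
            try { connection.close(); } catch (Exception e) { e.printStackTrace(); }
        }
    }
}
```

On Java 7+ the same cleanup can be done with try-with-resources instead of a finally block, but a helper like this matches the structure of the samples above.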
3. Run the sample code
4. Kerberos Environment Example
Connecting to the Spark 1.6 ThriftServer in a Kerberos environment requires the krb5.conf file and a keytab file.
1. Copy the krb5.conf file from the CDH cluster to the local development environment
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
default_realm = FAYSON.COM
#default_ccache_name = KEYRING:persistent:%{uid}
[realms]
FAYSON.COM = {
kdc = cdh01.fayson.com
admin_server = cdh01.fayson.com
}
[domain_realm]
.fayson.com = FAYSON.COM
fayson.com = FAYSON.COM
On the host running the KDC and Kadmin services, export a keytab file for connecting to the Spark ThriftServer service
[root@cdh01 ~]# kadmin.local
kadmin.local: xst -norandkey -k fayson.keytab fayson@FAYSON.COM
kadmin.local: exit
Check the exported fayson.keytab file
2. Start the Spark 1.6 ThriftServer service
./start-thriftserver.sh --hiveconf hive.server2.authentication.kerberos.principal=hive/cdh04.fayson.com@FAYSON.COM \
--hiveconf hive.server2.authentication.kerberos.keytab=hive.keytab \
--principal hive/cdh04.fayson.com@FAYSON.COM --keytab hive.keytab \
--hiveconf hive.server2.thrift.port=10001 \
--hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
--hiveconf hive.server2.enable.doAs=true
Here the ThriftServer is started on cdh04.fayson.com with the hive/cdh04.fayson.com@FAYSON.COM account; this account is required for the JDBC connection below.
3. Create a KBSample.java file in the project directory with the following content:
package com.cloudera.spark1jdbc;
import com.cloudera.utils.JDBCUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
/**
* package: com.cloudera.sparkjdbc
* describe: Access the Spark 1.6 Thrift Server in a Kerberos environment via JDBC
* create_user: Fayson
* email: htechinfo@163.com
* create_date: 2018/6/1
* create_time: 10:21 AM
* WeChat official account: Hadoop实操
*/
public class KBSample {
private static String JDBC_DRIVER = "org.apache.hive.jdbc.HiveDriver";
private static String CONNECTION_URL ="jdbc:hive2://cdh04.fayson.com:10001/;principal=hive/cdh04.fayson.com@FAYSON.COM";
static {
try {
Class.forName(JDBC_DRIVER);
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws Exception {
System.out.println("Connecting to the Spark 1.6 Thrift Server in a Kerberos environment via JDBC");
// Log in with the Kerberos account
System.setProperty("java.security.krb5.conf", "/Users/fayson/Documents/develop/kerberos/krb5.conf");
Configuration configuration = new Configuration();
configuration.set("hadoop.security.authentication", "Kerberos");
UserGroupInformation.setConfiguration(configuration);
UserGroupInformation.loginUserFromKeytab("fayson@FAYSON.COM", "/Users/fayson/Documents/develop/kerberos/fayson.keytab");
System.out.println(UserGroupInformation.getLoginUser());
Connection connection = null;
ResultSet rs = null;
PreparedStatement ps = null;
try {
connection = DriverManager.getConnection(CONNECTION_URL);
ps = connection.prepareStatement("select * from test");
rs = ps.executeQuery();
while (rs.next()) {
System.out.println(rs.getInt(1) + "----" + rs.getString(2));
}
} catch (Exception e) {
e.printStackTrace();
} finally {
JDBCUtils.disconnect(connection, rs, ps);
}
}
}
4. Run the sample code
The data in the test table is successfully retrieved from the Hive database.
5. View the job on YARN
The SQL statement executed by Spark
5. Summary
- Accessing the Spark ThriftServer via JDBC only requires the Hive JDBC driver; no additional configuration is needed
- When starting the Spark ThriftServer service in a non-Kerberos environment, specify the user as hive; otherwise queries will fail with HDFS file permission errors
- Accessing the Spark ThriftServer in a Kerberos environment requires the Kerberos configuration in the client runtime environment
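As the two samples show, the only difference between the JDBC connection URLs is the principal parameter appended in the Kerberos case (host and principal below are taken from the examples above):

```java
public class UrlDemo {
    public static void main(String[] args) {
        // Base HiveServer2-style JDBC URL used by the non-Kerberos sample
        String base = "jdbc:hive2://cdh04.fayson.com:10001/";
        // Kerberos variant: append the ThriftServer's service principal
        // (the account the server was started with) as a URL parameter
        String kerberos = base + ";principal=hive/cdh04.fayson.com@FAYSON.COM";
        System.out.println(kerberos);
    }
}
```

With the principal present, the Hive JDBC driver negotiates Kerberos authentication using the ticket obtained via UserGroupInformation.loginUserFromKeytab, so no username or password is passed to DriverManager.getConnection.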