The Past and Present of HiveServer2 (HS2), a Core Hive Service, with Code Examples


Previously I shared a piece on Hive metastore tables, "Understand Hive Metastore Tables in One Article," something every data warehouse developer should be familiar with.

Today, let's look at one of Hive's core services: HiveServer2, HS2 for short.

Before getting into HiveServer2, we'll first look at HiveServer1 (also known simply as HiveServer). We'll then cover the HiveServer2 architecture, and finally provide Shell, Java, and Python code for connecting to HiveServer2.

1. HiveServer1

HiveServer is an optional built-in Hive service that allows remote clients, written in different programming languages, to submit requests to Hive and retrieve results.

HiveServer is built on Apache Thrift and is therefore sometimes called the Thrift Server. This can cause confusion, because the newer HiveServer2 is also built on Thrift. To avoid ambiguity, this article sticks to the names HiveServer1 and HiveServer2.

Since HiveServer2 was introduced, HiveServer has also been referred to as HiveServer1.

Why introduce HiveServer2 when HiveServer1 already existed? Mainly because HiveServer1 has the following limitations:

1. It supports remote client connections, but only one client at a time; it cannot handle concurrent requests from multiple clients. This is a limitation of the Thrift interface that HiveServer exposes, and it cannot be fixed by modifying the HiveServer source code.

2. No support for session management.

3. No support for authentication.

Note: HiveServer was removed from the Hive distribution as of version 1.0.0. You need to switch to HiveServer2.

2. HiveServer2

HiveServer2 provides query services to clients. Since HiveServer1 was removed in Hive 1.0.0, you can essentially ignore it; production deployments almost universally run HiveServer2. HiveServer2 supports concurrent queries from multiple clients, authentication, and both TCP and HTTP transport modes.

HiveServer2 implements a new Thrift-based RPC interface that can handle concurrent client requests. It supports Kerberos, LDAP, and custom pluggable authentication. The new RPC interface is also a better choice for JDBC and ODBC clients, especially for metadata access.

HiveServer2 is also the container for the Hive execution engine. For each client connection, a new execution context is created to serve that client's Hive SQL requests. The new RPC interface allows the server to associate a Hive execution context with the thread handling the client's requests.
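
To make those settings concrete, here is a minimal sketch of starting HiveServer2 with a few commonly tuned properties. It is illustrative only: the values shown are stock defaults rather than settings from this article, and in production they usually live in hive-site.xml.

# Minimal sketch: start HiveServer2 in the foreground
# (assumes $HIVE_HOME/bin is on the PATH).
# hive.server2.transport.mode is "binary" (TCP, default port 10000) or "http";
# hive.server2.authentication can be NONE, KERBEROS, LDAP, or CUSTOM.
hiveserver2 \
  --hiveconf hive.server2.thrift.port=10000 \
  --hiveconf hive.server2.transport.mode=binary \
  --hiveconf hive.server2.authentication=NONE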

The following are a few common ways for clients to connect:

beeline client

Running beeline -u jdbc:hive2://<host>:<port> -n <user> -p '<password>' opens an interactive shell in which you can write SQL queries.

Under the hood, the beeline command runs the Java class org.apache.hive.beeline.BeeLine. The parameters it accepts, along with the official examples, are listed below:

  -u <database url> the JDBC URL to connect to
   -r reconnect to last saved connect url (in conjunction with !save)
   -n <username> the username to connect as
   -p <password> the password to connect as
   -d <driver class> the driver class to use
   -i <init file> script file for initialization
   -e <query> query that should be executed
   -f <exec file> script file that should be executed
   -w (or) --password-file <password file> the password file to read password from
   --hiveconf property=value Use value for given property
   --hivevar name=value hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false] control whether color is used for display
   --showHeader=[true/false] show column names in query results
   --headerInterval=ROWS; the interval between which headers are displayed
   --fastConnect=[true/false] skip building table/column list for tab-completion
   --autoCommit=[true/false] enable/disable automatic transaction commit
   --verbose=[true/false] show verbose error messages and debug info
   --showWarnings=[true/false] display connection warnings
   --showDbInPrompt=[true/false] display the current database name in the prompt
   --showNestedErrs=[true/false] display nested errors
   --numberFormat=[pattern] format numbers using DecimalFormat pattern
   --force=[true/false] continue running script even after errors
   --maxWidth=MAXWIDTH the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH the maximum width to use when displaying columns
   --silent=[true/false] be more silent
   --autosave=[true/false] automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv] format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --incremental=[true/false] Defaults to false. When set to false, the entire result set
                                   is fetched and buffered before being displayed, yielding optimal
                                   display column sizing. When set to true, result rows are displayed
                                   immediately as they are fetched, yielding lower latency and
                                   memory usage at the price of extra display column padding.
                                   Setting --incremental=true is recommended if you encounter an OutOfMemory
                                   on the client side (due to the fetched result set size being large).
                                   Only applicable if --outputformat=table.
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false] truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL set the transaction isolation level
   --nullemptystring=[true/false] set to true to get historic behavior of printing null as empty string
   --showConnectedUrl=[true/false] Prompt HiveServer2s URI to which this beeline connected.
                                   Only works for HiveServer2 cluster mode.
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --convertBinaryArrayToString=[true/false] display binary column data as string or as byte array 
   --help display this message
 
   Example:
    1. Connect using simple authentication to HiveServer2 on localhost:10000
    $ beeline -u jdbc:hive2://localhost:10000 username password

    2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password
    $ beeline -n username -p password -u jdbc:hive2://hs2.local:10012

    3. Connect using Kerberos authentication with hive/localhost@mydomain.com as HiveServer2 principal
    $ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/localhost@mydomain.com"

    4. Connect using SSL connection to HiveServer2 on localhost at 10000
    $ beeline -u "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"

    5. Connect using LDAP authentication
    $ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>
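
Beyond the interactive shell, beeline also works well in scripts. Below is a small sketch built from the flags listed above; the host, credentials, table name, and file path are placeholders, not values from this article.

# Run a single query non-interactively and print the result as CSV
beeline -u jdbc:hive2://hs2.example.com:10000 -n hive -p 'secret' \
  --outputformat=csv2 --silent=true \
  -e "select count(1) from default.some_table"

# Execute a file of SQL statements
beeline -u jdbc:hive2://hs2.example.com:10000 -n hive -p 'secret' \
  -f /path/to/queries.sql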

Java client

Add the Maven dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.3</version>
    </dependency>
</dependencies>

Code:

import java.sql.*;

public class HiveTest {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Load the HiveServer2 JDBC driver
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        // Replace host, port, user, and password with your own values
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://host:port/default", "user", "password");
        Statement stmt = con.createStatement();
        String tableName = "HiveTestByJava";
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + " (key int, value string)");
        System.out.println("Create table success!");

        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }

        // describe table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // insert two rows (HiveQL string literals use single quotes)
        sql = "insert into " + tableName + " values (42, 'hello'), (48, 'world')";
        stmt.execute(sql);

        // select everything back
        sql = "select * from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getInt(1) + "\t" + res.getString(2));
        }

        // count the rows
        sql = "select count(1) from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }

        // release resources
        res.close();
        stmt.close();
        con.close();
    }
}

Python client

Install the dependency:

pip install pyhs2

Code:

# Note: pyhs2 is an older, Python 2-era HiveServer2 client library,
# so this example uses Python 2 syntax.
import pyhs2

# Replace host, port, user, and password with your own values
conn = pyhs2.connect(host='host',
                     port=10000,
                     authMechanism='PLAIN',
                     user='user',
                     password='password',
                     database='default')

tablename = 'HiveByPython'
cur = conn.cursor()
print 'show the databases: '
print cur.getDatabases()

print "\n"
print 'show the tables in default: '
cur.execute('show tables')
for i in cur.fetch():
    print i

cur.execute('drop table if exists ' + tablename)
cur.execute('create table ' + tablename + ' (key int, value string)')

print "\n"
print 'show the new table: '
cur.execute("show tables '" + tablename + "'")
for i in cur.fetch():
    print i

print "\n"
print 'contents from ' + tablename + ':'
cur.execute('insert into ' + tablename + ' values (42,"hello"),(48,"world")')
cur.execute('select * from ' + tablename)
for i in cur.fetch():
    print i

# release resources
cur.close()
conn.close()

Worth bookmarking for reference; I'll be sharing more practical content soon.
