Accessing Hive Metastore Metadata via the Java API


In a previous article, we noted that starting with Hive 3.0.0, a standalone metastore service is shipped separately so that it can serve as the central metadata store for processing engines such as Presto.

This article uses the Java API as an example to show how to retrieve catalog, database, table, and other metadata from the Hive standalone metastore.

First, add the following dependency to your Maven project (Hive 3.1.2 is used here):

Code language: xml
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-standalone-metastore</artifactId>
      <version>3.1.2</version>
    </dependency>

An IMetaStoreClient can then be created as follows to connect to the Hive Metastore Service (HMS):

Code language: java
    /**
     * Initialize the HMS connection
     * @param conf org.apache.hadoop.conf.Configuration carrying the HMS connection info
     * @return IMetaStoreClient
     * @throws MetaException on connection failure
     */
    public static IMetaStoreClient init(Configuration conf) throws MetaException {
        try {
            return RetryingMetaStoreClient.getProxy(conf, false);
        } catch (MetaException e) {
            LOGGER.error("Failed to connect to the HMS", e);
            throw e;
        }
    }

The HMS connection information can be supplied in two ways: through a hive-site.xml configuration file, or by setting the "hive.metastore.uris" parameter directly, as shown below:

Code language: java
        Configuration conf = new Configuration();
        // Provide the HMS connection info via the "hive.metastore.uris" parameter
        conf.set("hive.metastore.uris", "thrift://192.168.1.3:9083");

        // Alternatively, provide the HMS connection info via hive-site.xml
        // conf.addResource("hive-site.xml");
        IMetaStoreClient client = HMSClient.init(conf);
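
If the hive-site.xml route is used instead, the file must be on the application classpath (for example under src/main/resources) so that conf.addResource("hive-site.xml") can find it. Below is a minimal sketch of such a file, reusing the same illustrative thrift address as above; adjust the value to your own metastore endpoint.

Code language: xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Thrift endpoint of the Hive Metastore service -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.1.3:9083</value>
  </property>
</configuration>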

Once the client is connected to the HMS, the following calls retrieve catalogs and other metadata:

Code language: java
        System.out.println("----------------------------获取所有catalogs-------------------------------------");
        client.getCatalogs().forEach(System.out::println);

        System.out.println("------------------------获取catalog为hive的描述信息--------------------------------");
        System.out.println(client.getCatalog("hive").toString());

        System.out.println("--------------------获取catalog为hive的所有database-------------------------------");
        client.getAllDatabases("hive").forEach(System.out::println);

        System.out.println("---------------获取catalog为hive,database为hive的描述信息--------------------------");
        System.out.println(client.getDatabase("hive", "hive_storage"));

        System.out.println("-----------获取catalog为hive,database名为hive_storage下的所有表--------------------");
        client.getTables("hive", "hive_storage", "*").forEach(System.out::println);

        System.out.println("------获取catalog为hive,database名为hive_storage,表名为sample_table_1的描述信息-----");
        System.out.println(client.getTable("hive", "hive_storage", "sample_table_1").toString());Copy

For more methods, refer to the HiveMetaStoreClient.java class; a short sketch of a few additional calls is shown below.
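
As a taste of those additional methods, here is a minimal sketch (the class name HMSClientMoreCalls is made up for illustration, and it reuses the hive_storage database and sample_table_1 table from above) that prints a table's columns, checks that the table exists, lists its partitions, and finally closes the client.

Code language: java
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Partition;

public class HMSClientMoreCalls {

    public static void explore(IMetaStoreClient client) throws Exception {
        // Column names and types of the table (resolved against the default catalog)
        for (FieldSchema field : client.getFields("hive_storage", "sample_table_1")) {
            System.out.println(field.getName() + " : " + field.getType());
        }

        // Check whether the table exists
        System.out.println(client.tableExists("hive_storage", "sample_table_1"));

        // List up to 100 partitions (empty for the non-partitioned sample table)
        for (Partition partition : client.listPartitions("hive_storage", "sample_table_1", (short) 100)) {
            System.out.println(partition.getValues());
        }

        // Release the underlying thrift connection when done
        client.close();
    }
}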

The complete implementation follows.

pom.xml of the Maven project:

Code language: xml
<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.zh.ch.bigdata.hms</groupId>
  <artifactId>hms-client</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>hms-client</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-standalone-metastore</artifactId>
      <version>3.1.2</version>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>8</source>
          <target>8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

HMSClient.java test code:

Code language: java
package com.zh.ch.bigdata.hms;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.RetryingMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HMSClient {

    public static final Logger LOGGER = LoggerFactory.getLogger(HMSClient.class);

    /**
     * Initialize the HMS connection
     * @param conf org.apache.hadoop.conf.Configuration carrying the HMS connection info
     * @return IMetaStoreClient
     * @throws MetaException on connection failure
     */
    public static IMetaStoreClient init(Configuration conf) throws MetaException {
        try {
            return RetryingMetaStoreClient.getProxy(conf, false);
        } catch (MetaException e) {
            LOGGER.error("Failed to connect to the HMS", e);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        conf.set("hive.metastore.uris", "thrift://192.168.1.3:9083");

        // conf.addResource("hive-site.xml");
        IMetaStoreClient client = HMSClient.init(conf);

        System.out.println("----------------------------获取所有catalogs-------------------------------------");
        client.getCatalogs().forEach(System.out::println);

        System.out.println("------------------------获取catalog为hive的描述信息--------------------------------");
        System.out.println(client.getCatalog("hive").toString());

        System.out.println("--------------------获取catalog为hive的所有database-------------------------------");
        client.getAllDatabases("hive").forEach(System.out::println);

        System.out.println("---------------获取catalog为hive,database为hive的描述信息--------------------------");
        System.out.println(client.getDatabase("hive", "hive_storage"));

        System.out.println("-----------获取catalog为hive,database名为hive_storage下的所有表--------------------");
        client.getTables("hive", "hive_storage", "*").forEach(System.out::println);

        System.out.println("------获取catalog为hive,database名为hive_storage,表名为sample_table_1的描述信息-----");
        System.out.println(client.getTable("hive", "hive_storage", "sample_table_1").toString());
    }
}

Run output:

Code language: txt
---------------------------- List all catalogs ----------------------------
hive
---------------------- Describe the catalog named hive ----------------------
Catalog(name:hive, description:Default catalog for Hive, locationUri:file:/user/hive/warehouse)
------------------- List all databases in the catalog hive -------------------
default
hive
hive_storage
--------- Describe the database hive_storage in the catalog hive ---------
Database(name:hive_storage, description:null, locationUri:s3a://hive-storage/, parameters:{}, ownerName:root, ownerType:USER, catalogName:hive)
------- List all tables in the catalog hive, database hive_storage -------
sample_table_1
-- Describe the table sample_table_1 in catalog hive, database hive_storage --
Table(tableName:sample_table_1, dbName:hive_storage, owner:root, createTime:1641540923, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:string, comment:null)], location:s3a://hive-storage/sample_table_1, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:sample_table_1, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{presto_query_id=20220107_073521_00018_favj9, totalSize=366, numRows=1, rawDataSize=22, COLUMN_STATS_ACCURATE={"COLUMN_STATS":{"col1":"true","col2":"true"}}, numFiles=1, transient_lastDdlTime=1641540923, auto.purge=false, STATS_GENERATED_VIA_STATS_TASK=workaround for potential lack of HIVE-12730, presto_version=366}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)  

This is an original article by xiaozhch5, the blogger behind "From Big Data to Artificial Intelligence". It is distributed under the CC 4.0 BY-SA license; please include the original source link and this notice when reposting.

Original link: https://cloud.tencent.com/developer/article/1936569
