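In an earlier article, we noted that starting with version 3.0.0, Hive ships a standalone metastore service that can act as the metadata management center for processing engines such as Presto.
This article uses the Java API as an example to show how to retrieve catalog, database, and table information from a Hive standalone metastore (HMS).
First, add the following dependency to your Maven project (Hive 3.1.2 is used here):
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-standalone-metastore</artifactId>
    <version>3.1.2</version>
</dependency>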
You can then create an IMetaStoreClient and connect it to HMS as follows:
/**
 * Initializes the HMS connection.
 * @param conf org.apache.hadoop.conf.Configuration holding the HMS connection settings
 * @return IMetaStoreClient
 * @throws MetaException if the connection cannot be established
 */
public static IMetaStoreClient init(Configuration conf) throws MetaException {
    try {
        // false: do not allow an embedded (in-process) metastore
        return RetryingMetaStoreClient.getProxy(conf, false);
    } catch (MetaException e) {
        LOGGER.error("Failed to connect to HMS", e);
        throw e;
    }
}
The HMS connection settings can be supplied in two ways: through a hive-site.xml configuration file, or by setting the "hive.metastore.uris" parameter directly, as shown below:
Configuration conf = new Configuration();
// Option 1: supply the HMS address through the "hive.metastore.uris" parameter
conf.set("hive.metastore.uris", "thrift://192.168.1.3:9083");
// Option 2: load the HMS settings from a hive-site.xml on the classpath
// conf.addResource("hive-site.xml");
IMetaStoreClient client = HMSClient.init(conf);
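The "hive.metastore.uris" value is a Thrift URI, and the property accepts a comma-separated list of them. As a minimal sketch, the value could be checked before it is passed to `conf.set`. The class `MetastoreUris` and its `parse` method are hypothetical helpers written for this illustration, not part of the Hive API, and the rule that every entry needs an explicit port is an assumption of this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class MetastoreUris {

    /**
     * Splits a hive.metastore.uris value on commas and checks that every
     * entry looks like thrift://host:port. Hypothetical helper, not a Hive API.
     */
    public static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            URI uri;
            try {
                uri = new URI(trimmed);
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Malformed metastore URI: " + trimmed, e);
            }
            // Assumption of this sketch: scheme must be thrift and a port must be given.
            if (!"thrift".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() < 0) {
                throw new IllegalArgumentException("Expected thrift://host:port but got: " + trimmed);
            }
            uris.add(uri);
        }
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("No metastore URI given");
        }
        return uris;
    }

    public static void main(String[] args) {
        for (URI uri : parse("thrift://192.168.1.3:9083")) {
            System.out.println(uri.getHost() + ":" + uri.getPort());
        }
    }
}
```

Failing early with a clear message is usually easier to diagnose than a connection error raised later by the client.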
Once the client is connected to HMS, the following calls retrieve catalog, database, and table information:
System.out.println("---------- All catalogs ----------");
client.getCatalogs().forEach(System.out::println);
System.out.println("---------- Description of catalog hive ----------");
System.out.println(client.getCatalog("hive").toString());
System.out.println("---------- All databases in catalog hive ----------");
client.getAllDatabases("hive").forEach(System.out::println);
System.out.println("---------- Description of database hive_storage in catalog hive ----------");
System.out.println(client.getDatabase("hive", "hive_storage"));
System.out.println("---------- All tables in database hive_storage (catalog hive) ----------");
client.getTables("hive", "hive_storage", "*").forEach(System.out::println);
System.out.println("---------- Description of table sample_table_1 in database hive_storage (catalog hive) ----------");
System.out.println(client.getTable("hive", "hive_storage", "sample_table_1").toString());
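The calls above follow a three-level hierarchy: catalog, then database, then table. The sketch below shows one way to walk that hierarchy and collect fully qualified table names. So that it runs without a live HMS, it uses `MetadataSource`, a hypothetical stand-in interface that mirrors only the three listing methods used above, backed by in-memory sample data. The real IMetaStoreClient methods also declare checked exceptions, which this stand-in omits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataWalk {

    /** Hypothetical stand-in for the three listing calls used on IMetaStoreClient. */
    interface MetadataSource {
        List<String> getCatalogs();
        List<String> getAllDatabases(String catalog);
        List<String> getTables(String catalog, String database, String pattern);
    }

    /** Returns "catalog.database.table" for every table reachable from the source. */
    static List<String> listAllTables(MetadataSource source) {
        List<String> names = new ArrayList<>();
        for (String catalog : source.getCatalogs()) {
            for (String database : source.getAllDatabases(catalog)) {
                // "*" matches every table name, as in the snippet above.
                for (String table : source.getTables(catalog, database, "*")) {
                    names.add(catalog + "." + database + "." + table);
                }
            }
        }
        return names;
    }

    /** In-memory sample data standing in for a real metastore. */
    static MetadataSource sample() {
        final Map<String, List<String>> tablesByDb = new LinkedHashMap<>();
        tablesByDb.put("default", Collections.<String>emptyList());
        tablesByDb.put("hive_storage", Arrays.asList("sample_table_1"));
        return new MetadataSource() {
            @Override
            public List<String> getCatalogs() {
                return Arrays.asList("hive");
            }

            @Override
            public List<String> getAllDatabases(String catalog) {
                return new ArrayList<>(tablesByDb.keySet());
            }

            @Override
            public List<String> getTables(String catalog, String database, String pattern) {
                return tablesByDb.get(database); // the sample ignores the pattern
            }
        };
    }

    public static void main(String[] args) {
        listAllTables(sample()).forEach(System.out::println);
    }
}
```

With a real client, the same three nested loops would apply, with the loop body wrapped in whatever exception handling the application needs.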
For more usage, see the HiveMetaStoreClient.java class.
The complete implementation follows.
The pom.xml file of the Maven project:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.zh.ch.bigdata.hms</groupId>
    <artifactId>hms-client</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>hms-client</name>
    <!-- FIXME change it to the project's website -->
    <url>http://www.example.com</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <!-- Java 8 is required: the client code uses method references -->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-standalone-metastore</artifactId>
            <version>3.1.2</version>
        </dependency>
    </dependencies>
    <build>
        <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
            <plugins>
                <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.22.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.2</version>
                </plugin>
                <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
                <plugin>
                    <artifactId>maven-site-plugin</artifactId>
                    <version>3.7.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-project-info-reports-plugin</artifactId>
                    <version>3.0.0</version>
                </plugin>
            </plugins>
        </pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
HMSClient.java test code:
package com.zh.ch.bigdata.hms;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.RetryingMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HMSClient {

    public static final Logger LOGGER = LoggerFactory.getLogger(HMSClient.class);

    /**
     * Initializes the HMS connection.
     * @param conf org.apache.hadoop.conf.Configuration holding the HMS connection settings
     * @return IMetaStoreClient
     * @throws MetaException if the connection cannot be established
     */
    public static IMetaStoreClient init(Configuration conf) throws MetaException {
        try {
            // false: do not allow an embedded (in-process) metastore
            return RetryingMetaStoreClient.getProxy(conf, false);
        } catch (MetaException e) {
            LOGGER.error("Failed to connect to HMS", e);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Supply the HMS address directly, or load it from hive-site.xml instead
        conf.set("hive.metastore.uris", "thrift://192.168.1.3:9083");
        // conf.addResource("hive-site.xml");
        IMetaStoreClient client = HMSClient.init(conf);
        try {
            System.out.println("---------- All catalogs ----------");
            client.getCatalogs().forEach(System.out::println);
            System.out.println("---------- Description of catalog hive ----------");
            System.out.println(client.getCatalog("hive").toString());
            System.out.println("---------- All databases in catalog hive ----------");
            client.getAllDatabases("hive").forEach(System.out::println);
            System.out.println("---------- Description of database hive_storage in catalog hive ----------");
            System.out.println(client.getDatabase("hive", "hive_storage"));
            System.out.println("---------- All tables in database hive_storage (catalog hive) ----------");
            client.getTables("hive", "hive_storage", "*").forEach(System.out::println);
            System.out.println("---------- Description of table sample_table_1 in database hive_storage (catalog hive) ----------");
            System.out.println(client.getTable("hive", "hive_storage", "sample_table_1").toString());
        } finally {
            // Release the connection to HMS
            client.close();
        }
    }
}
Output:
---------- All catalogs ----------
hive
---------- Description of catalog hive ----------
Catalog(name:hive, description:Default catalog for Hive, locationUri:file:/user/hive/warehouse)
---------- All databases in catalog hive ----------
default
hive
hive_storage
---------- Description of database hive_storage in catalog hive ----------
Database(name:hive_storage, description:null, locationUri:s3a://hive-storage/, parameters:{}, ownerName:root, ownerType:USER, catalogName:hive)
---------- All tables in database hive_storage (catalog hive) ----------
sample_table_1
---------- Description of table sample_table_1 in database hive_storage (catalog hive) ----------
Table(tableName:sample_table_1, dbName:hive_storage, owner:root, createTime:1641540923, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:string, comment:null)], location:s3a://hive-storage/sample_table_1, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:sample_table_1, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{presto_query_id=20220107_073521_00018_favj9, totalSize=366, numRows=1, rawDataSize=22, COLUMN_STATS_ACCURATE={"COLUMN_STATS":{"col1":"true","col2":"true"}}, numFiles=1, transient_lastDdlTime=1641540923, auto.purge=false, STATS_GENERATED_VIA_STATS_TASK=workaround for potential lack of HIVE-12730, presto_version=366}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
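The createTime field in the table description (1641540923) is a raw integer. Assuming it counts seconds since the Unix epoch, which is consistent with the date embedded in the presto_query_id parameter (20220107_073521), it can be turned into a readable timestamp as sketched below. The class `CreateTime` and its `toInstant` method are hypothetical names used only for this illustration.

```java
import java.time.Instant;

public class CreateTime {

    /**
     * Converts a metastore createTime value to an Instant.
     * Assumption: the value is in seconds since the Unix epoch.
     */
    static Instant toInstant(int createTimeSeconds) {
        return Instant.ofEpochSecond(createTimeSeconds);
    }

    public static void main(String[] args) {
        // 1641540923 is the createTime of sample_table_1 in the output above.
        System.out.println(toInstant(1641540923)); // prints 2022-01-07T07:35:23Z
    }
}
```

The result is in UTC; converting to a local zone would be a separate step with `atZone`.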
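This is an original article by the blogger 「xiaozhch5」 of 从大数据到人工智能 (From Big Data to Artificial Intelligence), published under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://cloud.tencent.com/developer/article/1936569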